Bayesian coursework specification for 2021
Data Analytics ECS648U / ECS784U / ECS784P
Revised on 25/02/2021 by Dr Anthony Constantinou and Dr Neville Kenneth Kitson.

1. Important Dates
• Release date: Thursday 25th February 2021 at 10:00 AM.
• Submission deadline: Wednesday, 28th April 2021 at 10:00 AM.
• Late submission deadline (cumulative penalty applies): Within 7 days after deadline.

General information:
i. When submitting coursework online you receive an automated e-mail as proof of submission. Turnitin receipt does not constitute proof of submission. Some students will sometimes upload their coursework and not hit the submit button. Make sure you fully complete the submission process.
ii. A penalty will be applied automatically by the system for late submissions.
   a. Your lecturer cannot remove the penalty!
   b. Penalties can only be challenged via submission of an Extenuating Circumstances (EC) form which can be found on your Student Support page. All the information you need to know is on that page, including how to submit an EC claim along with the deadline dates and full guidelines.
   c. If you submit an EC form, your case will be reviewed by a panel and the panel will make a decision on the penalty and inform the Module Organiser.
iii. If you miss both the submission deadline and the late submission deadline, you will automatically receive a score of 0. Extensions can only be granted through approval of an EC claim.
iv. Submissions via e-mail are not accepted.
v. It is recommended by the School that we set the deadline at 10:00 AM. Do not wait until the very last moment to submit the coursework.
vi. Your submission should be a single PDF file.
vii. For more details on submission regulations, please refer to your relevant handbook.

2. Coursework overview
• The coursework is based on the Bayesian material and must be completed individually (group submissions will not be accepted).
• To complete the coursework, follow the tasks below and answer ALL questions enumerated in Section 3. It is recommended that you read the full document before you start completing the tasks enumerated below.
• What follows has been tested on Windows and MAC operating systems. There is a compatibility issue with MAC OS (and likely to extend to Linux) which is covered in the Bayesys manual (details below), but which does not influence the coursework submission requirements.

Task 1: Set up and reading
a) Visit http://bayesian-ai.eecs.qmul.ac.uk/bayesys/
b) Download the Bayesys user manual.
c) Set up the project by following the steps in Section 1 of the manual.
d) Read Section 2 of the manual.
e) Read Section 3.
f) Read Section 4.
g) Skip Section 5.
h) Read Section 6 and repeat the example.
   i. MAC and Linux users will not be able to view the PDF graphs shown in Fig 6.1; i.e., the compatibility issue involves the PDF file generator.
   ii. Skip subsections 6.3, 6.3.1, and 6.4.
i) Skip Section 7.
j) Skip Section 8.
k) Read Section 9.
l) Skip the appendices.

Task 2: Determine research area and collate data
You are free to choose or collate your own dataset. You should also determine the dataset size, both in terms of the number of variables and the sample size, relevant to the problem you are analysing. Some areas might require more data than others, and it is up to you to make this decision.
You should address a data-related problem in your professional field or a field you are interested in. If you are motivated by the subject matter, the project will be more fun for you and you will likely produce a better report. Section 5 provides a list of data sources you could consider.
You are allowed to reuse the dataset you prepared during the Python coursework, as long as a) your Python coursework submission was NOT a group submission, and b) you consider the dataset to be suitable for Bayesian network structure learning (refer to Q1 in Section 3).
Lastly, you are not allowed to reuse datasets from the Bayesys repository for this coursework.

Task 3: Prepare your dataset for structure learning
a) The Bayesys structure learning system assumes the input data are discrete; e.g., low/medium/high or Yellow/Blue/Green, rather than a continuous range of numbers. If you have a continuous variable in your dataset with integers ranging, for example, from 1 to 100, the algorithm will assume that this variable has 100 different states (and many more if the values are not integers). This will make the dimensionality of the model unmanageable, leading to poor accuracy and high runtime; if it is not clear why, refer to the Conditional Probability Tables (CPTs) in the lecture slides and relevant book material. You should discretise continuous variables to reduce the number of states to reasonable levels. For example, you could discretise the variable discussed above, with values ranging from 1 to 100, into the five states {“1to20”, “21to40”, “41to60”, “61to80”, “81to100”}. If a continuous variable takes only a small number of different values (e.g., fewer than 10), it may not need discretisation. It is up to you to determine whether a variable requires discretisation, as well as the level of discretisation. You are free to follow any approach you wish to discretise the variables, including discretising them manually as discussed in the above example.
The structure learning accuracy is not expected to be strongly influenced by this choice, as long as the dimensionality of the data is reasonable with respect to its sample size.
b) Your dataset must not have missing values (i.e., empty cells). Replace ALL empty cells with the value ‘missing’ (or use a different relevant name). This forces the algorithm to consider all missing values as an additional state. If missing data follows a pattern, this may or may not help the algorithm to produce a more accurate graph.
c) Rename your dataset to trainingData.csv and place it in folder Input.
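
The preparation steps above could be scripted rather than done by hand. What follows is a minimal sketch in Python using only the standard library, not a required or official method. It assumes a continuous column with values between 1 and 100 that is cut into five equal-width states, as in the example above. The column names (Age, Smoker), the sample rows and the helper names discretise and prepare_rows are hypothetical and are used only for illustration.

```python
import csv
import io

MISSING_LABEL = "missing"  # any other relevant name would work equally well


def discretise(value, low=1, high=100, n_bins=5):
    """Map a number in [low, high] to an equal-width state label such as '21to40'."""
    width = (high - low + 1) // n_bins
    index = int((float(value) - low) // width)
    # Clamp so that out-of-range values fall into the first or last state.
    index = max(0, min(index, n_bins - 1))
    start = low + index * width
    end = high if index == n_bins - 1 else start + width - 1
    return f"{start}to{end}"


def prepare_rows(rows, continuous_columns):
    """Label empty cells as missing and discretise the named columns."""
    prepared = []
    for row in rows:
        clean = {}
        for name, cell in row.items():
            cell = (cell or "").strip()
            if cell == "":
                clean[name] = MISSING_LABEL
            elif name in continuous_columns:
                clean[name] = discretise(cell)
            else:
                clean[name] = cell
        prepared.append(clean)
    return prepared


# Hypothetical raw data: one continuous column and one categorical column.
raw_csv = "Age,Smoker\n34,yes\n,no\n100,\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))
prepared = prepare_rows(rows, continuous_columns={"Age"})

# Written to an in-memory buffer here; in practice the target would be a file
# opened with open("Input/trainingData.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Age", "Smoker"], lineterminator="\n")
writer.writeheader()
writer.writerows(prepared)
print(buffer.getvalue())
```

As a rough illustration of why the number of states matters: a discrete node with r states whose parents have q joint configurations needs (r - 1) × q free parameters in its CPT. A 100-state variable with a single 100-state parent therefore needs 9,900 parameters, against 20 once both are reduced to five states.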

Task 3: Draw out your knowledge-based graph
a) Use your knowledge to produce a knowledge causal graph given the variables in your dataset. You may find it easier if you start drawing the graph by hand.
b) Record this knowledge in a CSV file following the format of DAGtrue.csv as depicted in the Bayesys manual. For an example file, refer to file DAGtrue_ASIA.csv in project directory Sample input files/Structure learning.
c) Rename your knowledge graph file DAGtrue.csv and place it in folder Input.
d) Make another copy of the above file, rename it DAGlearned.csv and place it in folder Output.
e) Run the Bayesys NetBeans project and make sure your dataset is in folder Input and named trainingData.csv (as per step c) of the dataset preparation task above). Under tab Main, select Evaluate graph and the subprocess Generate DAGlearned.PDF. Then hit Run.
   i. The system will generate the file DAGlearned.pdf in folder Output. This is your knowledge graph drawn by the system. If you are working on MAC/Linux OS, the DAGlearned.pdf file is likely to be corrupted. If it is, you can use an online Graphviz editor such as the one available here: https://edotor.net/. The Graphviz editor turns a textual representation of a graph into a visual drawing. Use the code shown below, as an example, and edit the code accordingly to be consistent with your DAGtrue.csv; e.g., the relationships can be taken directly from the CSV file. The graph should update instantly as you edit the code.

      digraph {
        Earthquake -> Alarm
        Burglar -> Alarm
        Alarm -> Call
      }

   ii. This step also generates some information in the terminal window of NetBeans. Save the last three lines as you will need them in answering some of the questions in Section 3; i.e., the line outputs involving Log-Likelihood (LL) score, BIC score and the # of free parameters.
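
For larger graphs, the Graphviz text could be generated from the edge list instead of being typed by hand. The sketch below is one possible way to do this in Python, under a stated assumption: it treats the CSV as having a header row, with the parent variable in the second column and the child variable in the fourth (for example ID, Variable 1, Dependency, Variable 2). That layout is an assumption and should be checked against the Bayesys manual and DAGtrue_ASIA.csv. The constants PARENT_COLUMN and CHILD_COLUMN and the function name edges_to_dot are hypothetical.

```python
import csv
import io

# Assumed 0-based column positions in DAGtrue.csv; adjust them to match the
# format described in the Bayesys manual.
PARENT_COLUMN = 1
CHILD_COLUMN = 3


def edges_to_dot(csv_text):
    """Build Graphviz 'digraph' text from the edge rows of a DAG CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    lines = ["digraph {"]
    for row in reader:
        if len(row) <= CHILD_COLUMN:
            continue  # ignore blank or incomplete rows
        parent = row[PARENT_COLUMN].strip()
        child = row[CHILD_COLUMN].strip()
        # Quote names so that spaces or punctuation do not break the DOT syntax.
        lines.append(f'  "{parent}" -> "{child}"')
    lines.append("}")
    return "\n".join(lines)


# Example edge list mirroring the graph above, in the assumed layout.
example = (
    "ID,Variable 1,Dependency,Variable 2\n"
    "1,Earthquake,->,Alarm\n"
    "2,Burglar,->,Alarm\n"
    "3,Alarm,->,Call\n"
)
print(edges_to_dot(example))
```

The printed text can be pasted into the Graphviz editor. To use a real file, read its contents first, for example with open("Input/DAGtrue.csv").read(), and pass the result to edges_to_dot.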