Bayesian Data Analytics (doc)

Added on - 14 Jul 2021

Bayesian coursework specification for 2021
Data Analytics ECS648U/ ECS784U/ ECS784P
Revised on 25/02/2021 by Dr Anthony Constantinou and Dr Neville Kenneth Kitson.

1. Important Dates
Release date: Thursday 25th February 2021 at 10:00 AM.
Submission deadline: Wednesday 28th April 2021 at 10:00 AM.
Late submission deadline (cumulative penalty applies): within 7 days after the deadline.

General information:
i. When submitting coursework online, you receive an automated e-mail as proof of submission. A Turnitin receipt does not constitute proof of submission. Some students upload their coursework but do not hit the submit button. Make sure you fully complete the submission process.
ii. A penalty will be applied automatically by the system for late submissions.
   a. Your lecturer cannot remove the penalty!
   b. Penalties can only be challenged via submission of an Extenuating Circumstances (EC) form, which can be found on your Student Support page. All the information you need is on that page, including how to submit an EC claim, the deadline dates and the full guidelines.
   c. If you submit an EC form, your case will be reviewed by a panel, which will make a decision on the penalty and inform the Module Organiser.
iii. If you miss both the submission deadline and the late submission deadline, you will automatically receive a score of 0. Extensions can only be granted through approval of an EC claim.
iv. Submissions via e-mail are not accepted.
v. The School recommends setting the deadline at 10:00 AM. Do not wait until the very last moment to submit the coursework.
vi. Your submission should be a single PDF file.
vii. For more details on submission regulations, please refer to your relevant handbook.
2. Coursework overview
The coursework is based on the Bayesian material and must be completed individually (group submissions will not be accepted).
To complete the coursework, follow the tasks below and answer ALL questions enumerated in Section 3. It is recommended that you read the full document before you start completing the tasks enumerated below.
What follows has been tested on Windows and MAC operating systems. There is a compatibility issue with MAC OS (likely to extend to Linux), which is covered in the Bayesys manual (details below) but does not affect the coursework submission requirements.

Task 1: Set up and reading
a) Visit http://bayesian-ai.eecs.qmul.ac.uk/bayesys/
b) Download the Bayesys user manual.
c) Set up the project by following the steps in Section 1 of the manual.
d) Read Section 2 of the manual.
e) Read Section 3.
f) Read Section 4.
g) Skip Section 5.
h) Read Section 6 and repeat the example.
   i. MAC and Linux users will not be able to view the PDF graphs shown in Fig 6.1; i.e., the compatibility issue involves the PDF file generator.
   ii. Skip subsections 6.3, 6.3.1, and 6.4.
i) Skip Section 7.
j) Skip Section 8.
k) Read Section 9.
l) Skip the appendices.
You should address a data-related problem in your professional field or in a field you are interested in. If you are motivated by the subject matter, the project will be more fun for you and you will likely produce a better report. Section 5 provides a list of data sources you could consider.
You are allowed to reuse the dataset you prepared during the Python coursework, as long as a) your Python coursework submission was NOT a group submission, and b) you consider the dataset to be suitable for Bayesian network structure learning (refer to Q1 in Section 3).
Lastly, you are not allowed to reuse datasets from the Bayesys repository for this coursework.

Task 3: Prepare your dataset for structure learning
a) The Bayesys structure learning system assumes the input data are discrete; e.g., low/medium/high or Yellow/Blue/Green, rather than a continuous range of numbers. If you have a continuous variable in your dataset with integers ranging, for example, from 1 to 100, the algorithm will assume that this variable has 100 different states (and many more if the values are not integers). This will make the dimensionality of the model unmanageable, leading to poor accuracy and high runtime; if it is not clear why, refer to the Conditional Probability Tables (CPTs) in the lecture slides and the relevant book material.
You should discretise continuous variables to reduce the number of states to reasonable levels. For example, you could discretise the variable discussed above, with values ranging from 1 to 100, into the five states {“1to20”, “21to40”, “41to60”, “61to80”, “81to100”}. If a continuous variable takes only a small number of distinct values (e.g., fewer than 10), it may not need discretisation.
It is up to you to determine whether a variable requires discretisation, as well as the level of discretisation. You are free to follow any approach you wish to discretise the variable, including discretising the variables manually as discussed in the above example.
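The dimensionality argument and the 1 to 100 example above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, equal-width binning is just one of many possible discretisation approaches, the bin labels mimic the “1to20” style used in the example, and the threshold of 10 distinct values is taken from the “fewer than 10” rule of thumb in the text, not from Bayesys itself.

```python
import math


def cpt_size(n_states, parent_states):
    """Number of entries in a node's CPT: one entry per combination of the
    node's own state and its parents' states (math.prod([]) == 1 for a root)."""
    return n_states * math.prod(parent_states)


def equal_width_bins(values, lo, hi, k):
    """Hypothetical helper: map integers in [lo, hi] to k equal-width labelled
    bins, e.g. lo=1, hi=100, k=5 gives "1to20", "21to40", ..., "81to100".
    Assumes every value lies in [lo, hi] and (hi - lo + 1) is divisible by k."""
    width = (hi - lo + 1) // k
    labels = [f"{lo + i * width}to{lo + (i + 1) * width - 1}" for i in range(k)]
    # Integer division picks the bin; min() guards the top edge.
    return [labels[min((v - lo) // width, k - 1)] for v in values]


def needs_discretisation(column, max_states=10):
    """Hypothetical rule of thumb: flag a column with max_states or more
    distinct values as a candidate for discretisation."""
    return len(set(column)) >= max_states


if __name__ == "__main__":
    # A 100-state variable with two 100-state parents vs. the 5-state version.
    print(cpt_size(100, [100, 100]))  # 1000000 entries
    print(cpt_size(5, [5, 5]))        # 125 entries
    print(equal_width_bins([1, 20, 21, 55, 100], 1, 100, 5))
```

With three 100-state variables the child's CPT already needs a million entries, whereas the five-state version needs 125, which is why so many states leave too little data per CPT entry. In practice, bin boundaries chosen from domain knowledge or quantiles may suit your data better than equal widths.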
The structure learning accuracy is not expected to be strongly influenced as long as the dimensionality of the data is reasonable with respect to its
Task 3: Draw out your knowledge-based graph
a) Use your knowledge to produce a knowledge causal graph given the variables in your dataset. You may find it easier to start by drawing the graph by hand.
b) Record this knowledge in a CSV file following the format of DAGtrue.csv as depicted in the Bayesys manual. For an example file, refer to file DAGtrue_ASIA.csv in project directory Sample input files/Structure learning.
c) Rename your knowledge graph file DAGtrue.csv and place it in folder Input.
d) Make another copy of the above file, rename it DAGlearned.csv and place it in folder Output.
e) Run the Bayesys NetBeans project and make sure your dataset is in folder Input and named trainingData.csv (as per Task 2c). Under tab Main, select Evaluate graph and the subprocess Generate DAGlearned.PDF. Then hit Run.
   i. The system will generate the file DAGlearned.pdf in folder Output. This is your knowledge graph drawn by the system.
   If you are working on MAC/Linux OS, the DAGlearned.pdf file is likely to be corrupted. If it is, you can use an online Graphviz editor such as the one available here: https://edotor.net/. The Graphviz editor turns a textual representation of a graph into a visual drawing. Use the code shown below as an example, and edit it to be consistent with your DAGtrue.csv; e.g., the relationships can be taken directly from the CSV file. The graph should update instantly as you edit the code.

      digraph {
          Earthquake -> Alarm
          Burglar -> Alarm
          Alarm -> Call
      }
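Since the relationships for the Graphviz code can be taken directly from the CSV file, the conversion can also be scripted. The sketch below is hypothetical and rests on an assumption you should verify against DAGtrue_ASIA.csv: that DAGtrue.csv has a header row followed by rows with the columns ID, Variable 1, Dependency, Variable 2, where a Dependency of "->" denotes a directed edge. Rows with any other dependency symbol are skipped. Node names are quoted so that names containing spaces remain valid DOT.

```python
import csv
import io


def dagtrue_to_dot(csv_text):
    """Hypothetical converter from DAGtrue.csv-style text to Graphviz DOT.
    Assumes a header row and columns: ID, Variable 1, Dependency, Variable 2."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (if present)
    lines = ["digraph {"]
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed rows
        _, parent, dependency, child = (cell.strip() for cell in row[:4])
        if dependency == "->":  # only directed edges are emitted
            lines.append(f'    "{parent}" -> "{child}"')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "ID,Variable 1,Dependency,Variable 2\n"
        "1,Earthquake,->,Alarm\n"
        "2,Burglar,->,Alarm\n"
        "3,Alarm,->,Call\n"
    )
    print(dagtrue_to_dot(sample))
```

To convert your own graph, pass the function the contents of Input/DAGtrue.csv (read with open(..., newline="")) and paste the printed text into the Graphviz editor.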