Bayesian Data Analytics (pdf)


Added on  2021-07-09

Bayesian coursework specification for 2021
Data Analytics ECS648U / ECS784U / ECS784P
Revised on 25/02/2021 by Dr Anthony Constantinou and Dr Neville Kenneth Kitson.

1. Important Dates

Release date: Thursday 25th February 2021 at 10:00 AM.
Submission deadline: Wednesday, 28th April 2021 at 10:00 AM.
Late submission deadline (cumulative penalty applies): Within 7 days after deadline.

General information:

i. When submitting coursework online you receive an automated e-mail as proof of submission. Turnitin receipt does not constitute proof of submission. Some students will sometimes upload their coursework and not hit the submit button. Make sure you fully complete the submission process.
ii. A penalty will be applied automatically by the system for late submissions.
   a. Your lecturer cannot remove the penalty!
   b. Penalties can only be challenged via submission of an Extenuating Circumstances (EC) form which can be found on your Student Support page. All the information you need to know is on that page, including how to submit an EC claim along with the deadline dates and full guidelines.
   c. If you submit an EC form, your case will be reviewed by a panel and the panel will make a decision on the penalty and inform the Module Organiser.
iii. If you miss both the submission deadline and the late submission deadline, you will automatically receive a score of 0. Extensions can only be granted through approval of an EC claim.
iv. Submissions via e-mail are not accepted.
v. It is recommended by the School that we set the deadline at 10:00 AM. Do not wait until the very last moment to submit the coursework.
vi. Your submission should be a single PDF file.
vii. For more details on submission regulations, please refer to your relevant handbook.
2. Coursework overview

The coursework is based on the Bayesian material and must be completed individually (group submissions will not be accepted). To complete the coursework, follow the tasks below and answer ALL questions enumerated in Section 3. It is recommended that you read the full document before you start completing the tasks enumerated below.

What follows has been tested on Windows and MAC operating systems. There is a compatibility issue with MAC OS (and likely to extend to Linux) which is covered in the Bayesys manual (details below), but which does not influence the coursework submission requirements.

Task 1: Set up and reading

a) Visit http://bayesian-ai.eecs.qmul.ac.uk/bayesys/
b) Download the Bayesys user manual.
c) Set up the project by following the steps in Section 1 of the manual.
d) Read Section 2 of the manual.
e) Read Section 3.
f) Read Section 4.
g) Skip Section 5.
h) Read Section 6 and repeat the example.
   i. MAC and Linux users will not be able to view the PDF graphs shown in Fig 6.1; i.e., the compatibility issue involves the PDF file generator.
   ii. Skip subsections 6.3, 6.3.1, and 6.4.
i) Skip Section 7.
j) Skip Section 8.
k) Read Section 9.
l) Skip the appendices.
You should address a data-related problem in your professional field or a field you are interested in. If you are motivated by the subject matter, the project will be more fun for you and you will likely produce a better report. Section 5 provides a list of data sources you could consider. You are allowed to reuse the dataset you prepared during the Python coursework, as long as a) your Python coursework submission was NOT a group submission, and b) you consider the dataset to be suitable for Bayesian network structure learning (refer to Q1 in Section 3). Lastly, you are not allowed to reuse datasets from the Bayesys repository for this coursework.

Task 3: Prepare your dataset for structure learning

a) The Bayesys structure learning system assumes the input data are discrete; e.g., low/medium/high or Yellow/Blue/Green, rather than a continuous range of numbers. If you have a continuous variable in your dataset with integers ranging, for example, from 1 to 100, the algorithm will assume that this variable has 100 different states (and many more if the values are not integers). This will make the dimensionality of the model unmanageable, leading to poor accuracy and high runtime; if it is not clear why, refer to the Conditional Probability Tables (CPTs) in the lecture slides and relevant book material. You should discretise continuous variables to reduce the number of states to reasonable levels. For example, you could discretise the variable discussed above, with values ranging from 1 to 100, into the five states {“1to20”, “21to40”, “41to60”, “61to80”, “81to100”}. If a continuous variable takes only a small number of different values (e.g., fewer than 10), it may not need discretisation. It is up to you to determine whether a variable requires discretisation, as well as the level of discretisation. You are free to follow any approach you wish to discretise the variable, including discretising the variables manually as discussed in the above example.
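The manual discretisation described above, mapping integers from 1 to 100 into five equal-width states, could be sketched in Python roughly as follows. This is only an illustration: the function name discretise and its parameters are hypothetical and are not part of Bayesys, and the sketch assumes integer values and equal-width bins.

```python
def discretise(value, low=1, high=100, n_bins=5):
    """Map an integer in [low, high] to an equal-width state label such as '21to40'.

    Hypothetical helper for illustration only; assumes integer inputs.
    """
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    # Number of integer values per state (20 for the range 1..100 with 5 bins).
    width = (high - low + 1) // n_bins
    # Clamp so that any remainder values fall into the last state.
    index = min((value - low) // width, n_bins - 1)
    start = low + index * width
    end = high if index == n_bins - 1 else start + width - 1
    return f"{start}to{end}"


# Values on either side of the bin boundaries from the example in the text.
states = [discretise(v) for v in (1, 20, 21, 60, 61, 100)]
print(states)
```

Applying such a function to a column before saving the dataset would leave that variable with at most five distinct states. Other schemes (for example equal-frequency bins) are equally acceptable, since the choice of approach is left to you.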
The structure learning accuracy is not expected to be strongly influenced as long as the dimensionality of the data is reasonable with respect to its
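To illustrate why the number of states matters for dimensionality, the sketch below counts the free parameters of a single CPT, using the standard result that a node with r states needs r - 1 free probabilities for each combination of its parents' states. The function name cpt_free_parameters and the choice of two parents are assumptions made purely for illustration.

```python
from math import prod


def cpt_free_parameters(child_states, parent_states):
    """Free parameters in one CPT: (child states - 1) per parent configuration.

    parent_states is a list with the number of states of each parent.
    """
    return (child_states - 1) * prod(parent_states)


# A 100-state variable with two 100-state parents (undiscretised data).
raw = cpt_free_parameters(100, [100, 100])
# The same structure after discretising every variable into five states.
binned = cpt_free_parameters(5, [5, 5])
print(raw, binned)
```

With far more parameters than data rows, most parent configurations are never observed, which is one way to see the poor accuracy and high runtime mentioned in Task 3a.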
Task 3: Draw out your knowledge-based graph

a) Use your knowledge to produce a knowledge causal graph given the variables in your dataset. You may find it easier if you start drawing the graph by hand.
b) Record this knowledge in a CSV file following the format of DAGtrue.csv as depicted in the Bayesys manual. For an example file, refer to file DAGtrue_ASIA.csv in project directory Sample input files/Structure learning.
c) Rename your knowledge graph file DAGtrue.csv and place it in folder Input.
d) Make another copy of the above file, rename it DAGlearned.csv and place it in folder Output.
e) Run the Bayesys NetBeans project and make sure your dataset is in folder Input and named trainingData.csv (as per Task 2c). Under tab Main, select Evaluate graph and the subprocess Generate DAGlearned.PDF. Then hit Run.
   i. The system will generate the file DAGlearned.pdf in folder Output. This is your knowledge graph drawn by the system. If you are working on MAC/Linux OS, the DAGlearned.pdf file is likely to be corrupted. If it is, you can use an online Graphviz editor such as the one available here: https://edotor.net/ . The Graphviz editor turns a textual representation of a graph into a visual drawing. Use the code shown below, as an example, and edit the code accordingly to be consistent with your DAGtrue.csv; e.g., the relationships can be taken directly from the CSV file. The graph should update instantly as you edit the code.

      digraph {
        Earthquake -> Alarm
        Burglar -> Alarm
        Alarm -> Call
      }
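If your graph has many edges, the Graphviz code can be generated from the CSV rather than typed by hand. The following is a minimal sketch that assumes DAGtrue.csv has the header ID,Variable 1,Dependency,Variable 2 with one directed edge per row; check this layout against the Bayesys manual and DAGtrue_ASIA.csv before relying on it. The function name dag_csv_to_dot is hypothetical, and the example rows simply restate the three edges from the text.

```python
import csv
import io


def dag_csv_to_dot(csv_text):
    """Convert edge rows (assumed columns 'Variable 1' and 'Variable 2') to DOT."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Quote node names so that names containing spaces remain valid DOT.
    edges = [f'  "{row["Variable 1"]}" -> "{row["Variable 2"]}"' for row in reader]
    return "digraph {\n" + "\n".join(edges) + "\n}"


# Assumed file layout, shown here as an in-memory string.
example = """ID,Variable 1,Dependency,Variable 2
1,Earthquake,->,Alarm
2,Burglar,->,Alarm
3,Alarm,->,Call
"""

dot = dag_csv_to_dot(example)
print(dot)
```

The printed text can be pasted into the online editor. To use a real file, read its contents into a string first and pass that string to the function.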