Bayesian Data Analytics (pdf)

Added on - 21 Jan 2022

Bayesian coursework specification for 2021

Data Analytics ECS648U / ECS784U / ECS784P

Revised on 25/02/2021 by Dr Anthony Constantinou and Dr Neville Kenneth Kitson.

1. Important Dates

Release date: Thursday 25th February 2021 at 10:00 AM.

Submission deadline: Wednesday, 28th April 2021 at 10:00 AM.

Late submission deadline (cumulative penalty applies): Within 7 days after deadline.

General information:

i. When submitting coursework online you receive an automated e-mail as proof of
submission. A Turnitin receipt does not constitute proof of submission. Some students
upload their coursework but do not hit the submit button. Make sure you
fully complete the submission process.

ii. A penalty will be applied automatically by the system for late submissions.

a. Your lecturer cannot remove the penalty!

Circumstances (EC) form, which can be found on your Student Support page.
All the information you need to know is on that page, including how to submit
an EC claim along with the deadline dates and full guidelines.

c. If you submit an EC form, your case will be reviewed by a panel and the panel
will make a decision on the penalty and inform the Module Organiser.

iii. If you miss both the submission deadline and the late submission deadline, you will
automatically receive a score of 0. Extensions can only be granted through approval
of an EC claim.

iv. Submissions via e-mail are not accepted.

v. It is recommended by the School that we set the deadline at 10:00 AM. Do not wait
until the very last moment to submit the coursework.

vi. Your submission should be a single PDF file.

vii. For more details on submission regulations, please refer to your relevant handbook.

2. Coursework overview

The coursework is based on the Bayesian material and must be completed individually
(group submissions will not be accepted).

To complete the coursework, follow the tasks below and answer ALL questions
enumerated in Section 3. It is recommended that you read the full document before you
start completing the tasks enumerated below.

What follows has been tested on Windows and MAC operating systems. There is a
compatibility issue with MAC OS (and likely to extend to Linux) which is covered in
the Bayesys manual (details below), but which does not influence the coursework
submission requirements.

Task 1: Set up and reading

b) Download the Bayesys user manual.

c) Set up the project by following the steps in Section 1 of the manual.

d) Read Section 2 of the manual.

e) Read Section 3.

f) Read Section 4.

g) Skip Section 5.

h) Read Section 6 and repeat the example.

i. MAC and Linux users will not be able to view the PDF graphs shown in Fig
6.1; i.e., the compatibility issue involves the PDF file generator.

ii. Skip subsections 6.3, 6.3.1, and 6.4.

i) Skip Section 7.

j) Skip Section 8.

k) Read Section 9.

l) Skip the appendices.

Task 2: Determine research area and collate data

You are free to choose or collate your own dataset. You should also determine the dataset
size, both in terms of the number of variables and the sample size, relevant to the problem
you are analysing. Some areas might require more data than others, and it is up to you to
make this decision.

You should address a data-related problem in your professional field or a field you are
interested in. If you are motivated by the subject matter, the project will be more fun for you
and you will likely produce a better report. Section 5 provides a list of data sources you could

You are allowed to reuse the dataset you prepared during the Python coursework, as long as
a) your Python coursework submission was NOT a group submission, and b) you consider
the dataset to be suitable for Bayesian network structure learning (refer to Q1 in Section 3).

Lastly, you are not allowed to reuse datasets from the Bayesys repository for this

Task 3: Prepare your dataset for structure learning

a) The Bayesys structure learning system assumes the input data are discrete; e.g.,
low/medium/high or Yellow/Blue/Green, rather than a continuous range of numbers.
If you have a continuous variable in your dataset with integers ranging, for example,
from 1 to 100, the algorithm will assume that this variable has 100 different states
(and many more if the values are not integers). This will make the dimensionality of
the model unmanageable, leading to poor accuracy and high runtime; if it is not
clear why, refer to the Conditional Probability Tables (CPTs) in the lecture slides and
relevant book material.

You should discretise continuous variables to reduce the number of states to
reasonable levels. For example, you could discretise the variable discussed above,
with values ranging from 1 to 100, into the five states {1 to 20, 21 to 40, 41 to 60,
61 to 80, 81 to 100}. If a continuous variable incorporates a small number of
different values (e.g., fewer than 10), it may not need discretisation.

It is up to you to determine whether a variable requires discretisation, as well
as the level of discretisation. You are free to follow any approach you wish to
discretise the variable, including discretising the variables manually as discussed in
the above example. The structure learning accuracy is not expected to be strongly
influenced as long as the dimensionality of the data is reasonable with respect to its
sample size.

b) Your dataset must not have missing values (i.e., empty cells). Replace ALL empty
cells with the value ‘missing’ (or use a different relevant name). This forces the
algorithm to consider all missing values as an additional state. If missing data follows
a pattern, this may or may not help the algorithm to produce a more accurate graph.

c) Rename your dataset to trainingData.csv and place it in folder Input.
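
The preparation steps in this task can be sketched in Python as follows. This is only a minimal illustration, not part of Bayesys: the column names (age, smoker, colour), the sample rows, and the helper names discretise, prepare_rows and write_rows are all hypothetical, and the equal-width bins simply mirror the 1 to 100 example above. Any other discretisation approach is equally acceptable.

```python
import csv
import io


def discretise(value, low=1, bin_width=20, high=100):
    """Return an equal-width range label such as '21to40' for a number in [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}..{high}")
    index = int((value - low) // bin_width)
    start = low + index * bin_width
    end = min(start + bin_width - 1, high)
    return f"{start}to{end}"


def prepare_rows(rows, continuous_columns):
    """Replace empty cells with 'missing' and discretise the named continuous columns."""
    prepared = []
    for row in rows:
        clean = {}
        for name, cell in row.items():
            cell = (cell or "").strip()
            if cell == "":
                clean[name] = "missing"  # empty cell becomes an extra state
            elif name in continuous_columns:
                clean[name] = discretise(float(cell))
            else:
                clean[name] = cell
        prepared.append(clean)
    return prepared


def write_rows(rows, handle):
    """Write the prepared rows as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical raw data: 'age' is continuous, two cells are empty.
RAW = "age,smoker,colour\n34,yes,Blue\n,no,Green\n100,,Yellow\n"

rows = list(csv.DictReader(io.StringIO(RAW)))
prepared = prepare_rows(rows, continuous_columns={"age"})

out = io.StringIO()
write_rows(prepared, out)
print(out.getvalue())
```

To produce the actual input file, the same write_rows call could be pointed at a file opened with open("Input/trainingData.csv", "w", newline="") instead of the in-memory buffer used here.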

Task 3: Draw out your knowledge-based graph

a) Use your knowledge to produce a knowledge-based causal graph given the variables in your
dataset. You may find it easier if you start drawing the graph by hand.

b) Record this knowledge in a CSV file following the format of DAGtrue.csv as
depicted in the Bayesys manual. For an example file, refer to file DAGtrue_ASIA.csv
in project directory Sample input files/Structure learning.

c) Rename your knowledge graph file DAGtrue.csv and place it in folder Input.

d) Make another copy of the above file, rename it DAGlearned.csv and place it in folder

e) Run the Bayesys NetBeans project and make sure your dataset is in folder Input and
named trainingData.csv (as per Task 3c). Under tab Main, select Evaluate graph and
the subprocess Generate DAGlearned.PDF. Then hit Run.

i. The system will generate the file DAGlearned.pdf in folder Output. This is
your knowledge graph drawn by the system.

If you are working on MAC/Linux OS, the DAGlearned.pdf file is
likely to be corrupted. If it is, you can use an online Graphviz editor, which turns a textual
representation of a graph into a visual drawing. Use the code shown below
as an example, and edit it to be consistent with your
DAGtrue.csv; e.g., the relationships can be taken directly from the CSV file.
The graph should update instantly as you edit the code.
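
One possible way to produce the Graphviz text from the CSV file, rather than typing it by hand, is sketched below in Python. It assumes the edge file has a header row with columns named "Variable 1" and "Variable 2" holding the parent and child of each edge; check this against the DAGtrue.csv format in the Bayesys manual and adjust the column names if they differ. The sample edges and the helper names edges_from_csv and to_dot are hypothetical.

```python
import csv
import io


def edges_from_csv(handle, source_col="Variable 1", target_col="Variable 2"):
    """Read (parent, child) pairs from an edge list CSV with a header row."""
    reader = csv.DictReader(handle)
    return [(row[source_col].strip(), row[target_col].strip()) for row in reader]


def quote(name):
    """Wrap a node name in double quotes so DOT accepts spaces and symbols."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, graph_name="knowledge_graph"):
    """Build the text of a directed graph in the Graphviz DOT language."""
    lines = [f"digraph {graph_name} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edge list; the column layout is an assumption, not taken from Bayesys.
SAMPLE = """ID,Variable 1,Dependency,Variable 2
1,Smoking,->,LungCancer
2,Smoking,->,Bronchitis
3,LungCancer,->,Cough
"""

dot = to_dot(edges_from_csv(io.StringIO(SAMPLE)))
print(dot)
```

The printed text can be pasted into an online Graphviz editor. For a real file, replace io.StringIO(SAMPLE) with a handle from open("Input/DAGtrue.csv", newline="").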






ii. This step also generates some information in the terminal window of
NetBeans. Save the last three lines as you will need them in answering some
of the questions in Section 3; i.e., the line outputs involving Log-Likelihood
(LL) score, BIC score and the # of free parameters.
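
To see how these three quantities relate, the sketch below counts free parameters for a discrete Bayesian network and applies one common form of the BIC score, BIC = LL - (log2(N) / 2) * F, where N is the sample size and F the number of free parameters. The log base and sign convention are assumptions here, so confirm them against the Bayesys manual; the log-likelihood value and the two-variable network are made up for illustration. The example also shows why undiscretised variables with 100 states inflate the model.

```python
import math


def free_parameters(states, parents):
    """Sum over nodes of (states - 1) * (number of parent state combinations)."""
    total = 0
    for node, n_states in states.items():
        parent_configs = math.prod(states[p] for p in parents.get(node, []))
        total += (n_states - 1) * parent_configs
    return total


def bic_score(log_likelihood, n_free_parameters, sample_size):
    """BIC as log-likelihood minus a complexity penalty (base-2 log assumed)."""
    penalty = (math.log2(sample_size) / 2) * n_free_parameters
    return log_likelihood - penalty


# Hypothetical two-node graph A -> B.
parents = {"B": ["A"]}

raw = free_parameters({"A": 100, "B": 100}, parents)    # 100 states each
binned = free_parameters({"A": 5, "B": 5}, parents)     # 5 states each

print("free parameters, raw:", raw)
print("free parameters, discretised:", binned)

# Made-up log-likelihood and sample size, purely for illustration.
print("BIC:", bic_score(-5000.0, binned, 1024))
```

With 100 states per variable the CPT of B alone needs 99 x 100 entries, whereas five states per variable need only 4 x 5, which is the dimensionality argument made in the dataset preparation task.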