SIT717 Assignment 2: Enterprise BI Technical Report on Data Mining


Added on 2023/06/04

AI Summary

This technical report presents an analysis of data mining and machine learning techniques applied to Enterprise Business Intelligence. The assignment uses the Weka data mining software to explore a health news dataset across ten practical exercises. These exercises cover data loading, preprocessing, visualization, dimensionality reduction, and clustering using K-Means. The report details the application of Weka's clustering algorithms, performance evaluation using the Weka Experimenter and Knowledge Flow, and time series forecasting. It also covers text mining and image mining in Weka. Each practical is explained with screenshots and results, giving a practical guide to data mining and machine learning in a business context. The project also includes a manual calculation of K-Means and a comparison with the Weka implementation.

Enterprise Business Intelligence
Abstract
This project provides an opportunity to apply data mining and machine learning methods to
discover knowledge from a dataset and to examine their applications for business
intelligence. The assignment analyses the health news dataset to explore Weka's data mining
applications. The project has 10 practicals. In the first, we install the Weka software and
download the data set. The second covers data loading and pre-processing for the given
dataset. The third covers data visualization and dimensionality reduction. The fourth
applies a clustering algorithm, K-Means. The fifth applies data mining with a clustering
algorithm in Weka. The sixth covers performance evaluation with the Weka Experimenter and
Knowledge Flow. The seventh covers time series forecasting using the Weka package manager.
The eighth covers text mining. The final practical covers image analysis in Weka. These are
analysed and discussed in detail.
Table of Contents
1 Introduction
2 Data set
3 Data mining Techniques
4 Evaluation and Demonstration
4.1 Practical – 1
4.2 Practical – 2
4.3 Practical – 3
4.3.1 Visualising the Dataset
4.3.2 Visualising the Dataset using Classifiers
4.4 Practical – 4
4.4.1 Manually Working with K-Means
4.4.2 Unsupervised Learning in WEKA – Clustering
4.5 Practical – 5
4.6 Practical – 6
4.6.1 Weka Experimenter
4.6.2 Weka Knowledge Flow
4.7 Practical – 7
4.8 Practical – 8
4.8.1 Training the Classifier Model
4.8.2 Predict the Class in Test
4.9 Practical – 9
5 Conclusion
References
1 Introduction
This project provides an opportunity to apply data mining and machine learning techniques
to discover knowledge from a dataset and to explore their applications for business
intelligence. The project analyses the health news dataset to explore Weka's data mining
applications. The project is divided into 10 practicals. In the first, we install the Weka
software and download the data file. The second covers data loading and pre-processing for
the given dataset. The third covers data visualization and dimensionality reduction. The
fourth applies a clustering algorithm, K-Means. This practical is divided into two parts:
Part 1 computes K-Means manually for the given data, and Part 2 uses the Weka clustering
algorithm to compute K-Means. The fifth covers supervised data mining, namely
classification algorithms in Weka. The sixth covers performance evaluation with the Weka
Experimenter and Knowledge Flow. The seventh covers time series forecasting using the Weka
package manager. The eighth covers text mining. The final practical covers image analysis
in Weka. These are analysed and discussed in detail.
2 Data set
Each file corresponds to the Twitter account of one news agency. For example, bbchealth.txt
corresponds to BBC health news. Each line contains tweet id | date and time | tweet. The
separator is '|'. This text data has been used to evaluate the performance of topic models
on short text data. However, it can also be used for other tasks, such as clustering.
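The line layout described above can be read with a few lines of Python. This is a minimal sketch rather than part of the assignment workflow: the `Tweet` type, the function name, and the sample line are assumptions made for illustration, and the sample is invented rather than taken from the dataset. The split is limited to two separators so that any '|' inside the tweet text is preserved.

```python
from typing import NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    created_at: str
    text: str


def parse_tweet_line(line: str) -> Tweet:
    """Split one 'tweet id|date and time|tweet' line into its three fields."""
    # maxsplit=2 keeps any '|' that appears inside the tweet text itself.
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    tweet_id, created_at, text = (part.strip() for part in parts)
    return Tweet(tweet_id, created_at, text)


# Invented sample line in the documented layout (not a real record).
sample = "123456|Thu Apr 09 01:31:50 +0000 2015|Example headline | with a pipe"
tweet = parse_tweet_line(sample)
print(tweet.tweet_id)
print(tweet.created_at)
print(tweet.text)
```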
3 Data mining Techniques
Data Mining Techniques
There are various techniques used in data mining, but not all of them can be applied to
all types of data. Neural network algorithms, for example, can be used to quantify data
(numerical data), but they cannot qualify data correctly (categorical data); therefore,
categorical data is generally split into multiple dichotomous variables, each with values
of 1 ("yes") or 0 ("no"). Some of the traditional statistical methods that can be used for
data mining are the following (Sherif, 2016):
• Cluster analysis, also called segmentation.
• Discriminant analysis.
• Logistic regression.
• Time series forecasting.
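The 1/0 encoding of categorical data mentioned above can be illustrated with a short Python sketch. The function name and the sample column are hypothetical and only show the idea: each distinct category becomes its own column holding 1 ("yes") or 0 ("no").

```python
def to_dichotomous(values):
    """Map a list of categorical values to one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical categorical column: the news account each record came from.
sources = ["bbc", "cnn", "bbc", "nytimes"]
encoded = to_dichotomous(sources)
for name, column in encoded.items():
    print(name, column)
```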
Cluster analysis (or segmentation) is one of the most frequently used data mining
techniques; it involves separating sets of data into clusters that each contain a series
of consistent patterns. Discriminant analysis is one of the oldest classification
techniques. It finds hyperplanes that separate the classes, so that users can then
determine on which side of a hyperplane a given record falls and classify it accordingly.
Discriminant analysis has limitations, however.
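Deciding which side of a hyperplane a record falls on comes down to the sign of a weighted sum. The sketch below assumes the hyperplane is already known and written as w·x + b = 0; the weights, bias, and points are made-up values, and the step that fits the hyperplane from data is not shown.

```python
def hyperplane_side(weights, bias, point):
    """Return +1 or -1 for the side of w.x + b = 0 the point lies on, 0 if on it."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return (score > 0) - (score < 0)


# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
weights, bias = [1.0, -1.0], 0.0
for point in ([2.0, 1.0], [1.0, 2.0], [1.0, 1.0]):
    print(point, hyperplane_side(weights, bias, point))
```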
Logistic regression is a generalization of linear regression. It is primarily used for
predicting binary variables and, less often, multi-class variables. Logistic regression
models predict the logarithm of the odds of the occurrence of discrete variables. The
main assumption of the logistic regression model is that the logarithm of the odds is
linear in the coefficients of the predictor variables.
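The relationship between the log-odds and the predicted probability can be checked numerically. This is a minimal sketch with made-up coefficients, not a fitted model: the log-odds is a linear function of the predictors, and the probability is recovered from it with the logistic function.

```python
import math


def log_odds(intercept, coefficients, features):
    """Linear predictor: intercept plus the sum of coefficient * feature."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def probability(intercept, coefficients, features):
    """Convert the log-odds into a probability with the logistic function."""
    z = log_odds(intercept, coefficients, features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and one feature vector.
intercept, coefficients = 0.5, [1.0, -2.0]
features = [1.0, 0.5]
p = probability(intercept, coefficients, features)
# log(p / (1 - p)) recovers the linear predictor, up to rounding error.
print(log_odds(intercept, coefficients, features), math.log(p / (1.0 - p)))
```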
Data Visualization
Data visualization is also useful for data mining. By using visual tools, analysts can
reach a better understanding of the data because they can focus on some of the patterns
found by other techniques. Using variations of color, dimensions, and depth, it is
possible to discover new associations and to improve the distinction between them.
4 Evaluation and Demonstration
4.1 Practical – 1
In this task, we install the Weka software and download the data set. First, the user
needs to download and install the Weka software. Once Weka has been installed
successfully, the user opens the Weka software. It is shown below.
Go to the link below to download the data set.
https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter
Next, click the Data Folder link.
Then, click the Health_News_Tweets.zip file to download the data set. The downloaded data
set is attached below.
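Once the archive has been downloaded, its contents can be inspected and unpacked with Python's standard zipfile module. This is a sketch under the assumption that the archive has been saved locally; the function names and the paths in the usage note are hypothetical.

```python
import zipfile


def list_text_members(zip_path):
    """Return the names of the .txt files inside the archive, sorted."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".txt")
        )


def extract_archive(zip_path, target_dir):
    """Unpack every member of the archive into target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
```

For example, assuming the file was saved as Health_News_Tweets.zip in the working directory, `list_text_members("Health_News_Tweets.zip")` would list the per-account text files and `extract_archive("Health_News_Tweets.zip", "data")` would unpack them into a folder named data.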
4.2 Practical – 2
This task covers data loading and pre-processing for the given data set. To pre-process
the data in Weka, follow the steps below.
First, the user opens Weka.
To load the data set, click Explorer and then select Open file.
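The raw files are '|'-separated text, so one possible preparation step before opening them in the Explorer is to convert them to CSV, a format Weka can load. The report does not state how the files were prepared, so the sketch below is an assumption: the function name and column names are hypothetical, and malformed lines are simply skipped.

```python
import csv
import io


def tweets_to_csv(lines, out):
    """Write 'id|date|text' lines to out as CSV; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["tweet_id", "created_at", "text"])
    written = 0
    for line in lines:
        # maxsplit=2 keeps any '|' inside the tweet text.
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the documented layout
        writer.writerow([part.strip() for part in parts])
        written += 1
    return written


# Invented sample lines; the second one is deliberately malformed.
buffer = io.StringIO()
count = tweets_to_csv(["1|Mon Jan 01|Example tweet, with a comma", "broken line"], buffer)
print(count)
print(buffer.getvalue())
```

The csv module quotes fields that contain commas, so tweet text with commas stays in a single column.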
Clicking an attribute makes Weka display a visualization of the selected attribute.
4.3 Practical – 3
4.3.1 Visualising the Dataset
This task covers data visualization and dimensionality reduction. To visualize the data
in Weka, click Visualize in the Explorer window to see the data visualization. It is
shown below.