Enterprise Business Intelligence: Data Mining and Machine Learning for Business Intelligence
Verified | Added on 2023/06/05
Enterprise Business Intelligence
Abstract
This project provides an opportunity to apply data mining and machine learning methods to discover knowledge from a dataset and to explore their applications for business intelligence. It analyses the health news dataset using Weka's data mining applications and is divided into ten practicals. In the first practical, we install the Weka software and download the data repository. The second practical performs data loading and pre-processing on the provided data set. The third covers data visualisation and dimension reduction. The fourth applies a clustering algorithm, K-Means. The fifth performs supervised mining, i.e. a classification algorithm, in Weka. The sixth carries out performance evaluation with the Weka Experimenter and Knowledge Flow. The seventh predicts time series using the Weka package manager. The eighth performs text mining, and the final practical carries out image analytics in Weka. Each of these is analysed and demonstrated in detail.
Table of Contents
1 Introduction
2 Data set
3 Data mining Techniques
4 Evaluation and Demonstration
4.1 Practical – 1
4.2 Practical – 2
4.3 Practical – 3
4.3.1 Visualising the Dataset
4.3.2 Visualising the Dataset using Classifiers
4.4 Practical – 4
4.4.1 Manually Working with K-Means
4.4.2 Unsupervised Learning in WEKA – Clustering
4.5 Practical – 5
4.6 Practical – 6
4.6.1 Weka Experimenter
4.6.2 Weka Knowledge Flow
4.7 Practical – 7
4.8 Practical – 8
4.8.1 Training the Classifier Model
4.8.2 Predict the Class in Test
4.9 Practical – 9
5 Conclusion
References
1 Introduction
This project provides an opportunity to apply data mining and machine learning methods to discover knowledge from a dataset and to explore their applications for business intelligence. It analyses the health news dataset using Weka's data mining applications and is divided into ten practicals. In the first practical, we install the Weka software and download the data repository. The second practical performs data loading and pre-processing on the provided data set. The third covers data visualisation and dimension reduction. The fourth applies a clustering algorithm, K-Means; it is divided into two parts, where Part 1 calculates K-Means manually for the provided data set and Part 2 uses Weka's clustering algorithm to compute K-Means. The fifth practical performs supervised mining, i.e. a classification algorithm, in Weka. The sixth carries out performance evaluation with the Weka Experimenter and Knowledge Flow. The seventh predicts time series using the Weka package manager. The eighth performs text mining, and the final practical carries out image analytics in Weka. Each of these is analysed and demonstrated in detail.
2 Data set
Each file corresponds to one Twitter account of a news agency. For instance, bbchealth.txt corresponds to BBC health news. Each line contains tweet id | date and time | tweet, and the separator is '|'. This text data has been used to evaluate the performance of topic models on short text data, but it can also be used for other tasks, for example clustering.
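As a sketch, the record format described above can be parsed with a few lines of Python. The sample line below is illustrative of the documented format, not quoted from the actual files:

```python
# Parse one line of a health-news tweet file, formatted as
# "tweet id|date and time|tweet". Split at most twice, since the
# tweet text itself may contain '|'.
def parse_tweet_line(line):
    tweet_id, timestamp, text = line.rstrip("\n").split("|", 2)
    return tweet_id, timestamp, text

# Hypothetical example line in the documented format:
sample = "585978391360221184|Thu Apr 09 01:31:50 +0000 2015|Breast cancer risk test devised"
tweet_id, timestamp, text = parse_tweet_line(sample)
```

Splitting only twice is the key design choice: it keeps any '|' characters inside the tweet text intact.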
3 Data mining Techniques
The data mining techniques used in this project are illustrated below.
4 Evaluation and Demonstration
4.1 Practical – 1
In this task, we install the Weka software and download the data repository. First, download and install the Weka software.
Once Weka is installed successfully, open the Weka software, as illustrated below.
Next, download the data set from the UCI data repository using the provided link:
https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter
Then, click the Data Folder link, and click the Health_News_Tweets.zip file to download the data set. The downloaded data set is shown below.
4.2 Practical – 2
This practical performs data loading and pre-processing on the provided data set. To carry out the pre-processing in Weka, follow the steps below.
First, open Weka.
Next, click Explorer and load the data set. Once the data is loaded successfully, the pre-processing step is complete, as shown below.
Clicking an attribute makes Weka display a visualisation of the selected attribute, as shown below.
4.3 Practical – 3
4.3.1 Visualising the Dataset
This practical covers data visualisation and dimension reduction. To visualise the data in Weka, click Visualize in the Explorer window to view the data visualisation, as shown below (Onlinecourses.science.psu.edu, 2018).
The data visualisation below shows the Response ID and recipient name attributes from the health data set.
The data visualisation below shows the recorded date and location latitude attributes from the health data set.
The data visualisation below shows the end date and recipient first name attributes from the health data set (Statistics Solutions, 2018).
The data visualisation below shows the IP address and recipient first name attributes from the health data set.
The data visualisation below shows the IP address and recipient last name attributes from the health data set.
4.3.2 Visualising the Dataset using Classifiers
The Stacking classifier is used to visualise the data set, as shown below.
=== Classifier model (full training set) ===
Stacking
Base classifiers
ZeroR predicts class value: 61.68.172.57
Meta classifier
ZeroR predicts class value: 61.68.172.57
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 1 1.2346 %
Incorrectly Classified Instances 80 98.7654 %
Kappa statistic -0.035
Mean absolute error 0.0324
Root mean squared error 0.1276
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
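The headline figures in this summary follow directly from the instance counts: ZeroR always predicts the single most frequent class value (here an IP address), so almost every instance is misclassified. A quick sanity check of the percentages:

```python
# Reproduce the summary percentages from the instance counts above.
correct, total = 1, 81
accuracy = 100 * correct / total              # 1.2346 % (to 4 d.p.)
error_rate = 100 * (total - correct) / total  # 98.7654 % (to 4 d.p.)
```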
Next, the selected attributes are passed to the Classifier Subset Evaluator, as shown below (Evgeniou, 2018).
The results are shown below.
=== Attribute Selection on all input data ===
Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 76
Merit of best subset found: 0
Attribute Subset Evaluator (supervised, Class (numeric): 3 Status):
Classifier Subset Evaluator
Learning scheme: weka.classifiers.rules.ZeroR
Scheme options:
Hold out/test set: Training data
Subset evaluation: RMSE
The data set visualisation is shown below.
Then, the Ranker search method is chosen:
=== Attribute selection 10 fold cross-validation seed: 1 ===
average merit average rank attribute
0 +- 0 1 +- 0 17 UserLanguage
0 +- 0 2 +- 0 16 DistributionChannel
0 +- 0 3 +- 0 7 Finished
0 +- 0 4 +- 0 6 Duration (in seconds)
0 +- 0 5 +- 0 4 IPAddress
0 +- 0 6 +- 0 3 Status
0 +- 0 7 +- 0 2 EndDate
0 +- 0 8 +- 0 8 RecordedDate
0 +- 0 9 +- 0 9 ResponseId
0 +- 0 10 +- 0 10 RecipientLastName
0 +- 0 11 +- 0 14 LocationLatitude
0 +- 0 12 +- 0 15 LocationLongitude
0 +- 0 13 +- 0 13 ExternalReference
0 +- 0 14 +- 0 11 RecipientFirstName
0 +- 0 15 +- 0 12 RecipientEmail
0 +- 0 16 +- 0 1 StartDate
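Conceptually, the Ranker search method scores each attribute individually with the chosen evaluator and sorts by merit, highest first. A minimal stand-in sketch (the merit values below are hypothetical; in the run above every attribute scored 0, so the resulting order carries no information):

```python
# Simplified stand-in for Weka's Ranker search method: score every
# attribute individually with an evaluator and sort by merit, highest first.
def rank_attributes(merits):
    return sorted(merits, key=merits.get, reverse=True)

# Hypothetical merit scores keyed by attribute name:
merits = {"StartDate": 0.12, "IPAddress": 0.05, "UserLanguage": 0.30}
ranked = rank_attributes(merits)
```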
4.4 Practical – 4
This practical applies a clustering algorithm, K-Means. It is divided into two parts: in Part 1, K-Means is calculated manually for the provided data set, and in Part 2, Weka's clustering algorithm is used to compute K-Means. Both are discussed and analysed in detail (Gonçalves, 2018).
4.4.1 Manually Working with K-Means
Clustering
Clustering is the task of partitioning the population or data points into a number of groups such that data points in the same group are more similar to one another than to those in other groups. In simple terms, the aim is to segregate groups with similar traits and assign them into clusters.
Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relative groups called clusters. It is also called classification analysis or numerical taxonomy. In cluster analysis, there is no prior information about the group or cluster membership of any of the objects.
Cluster analysis has been used in marketing for various purposes. Segmentation of consumers by cluster analysis is performed on the basis of the benefits sought from purchasing the product, and it can be used to identify homogeneous groups of buyers. Cluster analysis involves formulating the problem, selecting a distance measure, selecting a clustering procedure, deciding the number of clusters, interpreting the cluster profiles and, finally, assessing the validity of the clustering.
The variables on which the cluster analysis is performed should be selected with past research in mind. They should also be selected according to theory, the hypotheses being tested, and the judgment of the researcher. An appropriate measure of distance or similarity should be chosen; the most commonly used measure is the Euclidean distance or its square.
Clustering procedures may be hierarchical, non-hierarchical, or a two-step approach. A hierarchical procedure is characterised by the development of a tree-like structure and can be agglomerative or divisive. Agglomerative methods in cluster analysis consist of
linkage methods, variance methods, and centroid methods. Linkage methods comprise single linkage, complete linkage, and average linkage.
The non-hierarchical methods in cluster analysis are frequently referred to as K-means clustering. The two-step approach can automatically determine the optimal number of clusters by comparing the values of model-choice criteria across different clustering solutions. The choice of clustering procedure and the choice of distance measure are interrelated, the relative sizes of the clusters should be meaningful, and the clusters should be interpreted in terms of the cluster centroids (Kaushik, 2018).
Measure the similarity between two objects
Similarity and dissimilarity are important because they are used by a number of data mining techniques, such as clustering, nearest-neighbour classification, and anomaly detection. The term proximity is used to refer to either similarity or dissimilarity.
The similarity between two objects is a numerical measure of the degree to which the two objects are alike. Consequently, similarities are higher for pairs of objects that are more alike. Similarities are usually non-negative and often lie between 0 (no similarity) and 1 (complete similarity).
The dissimilarity between two objects is a numerical measure of the degree to which the two objects differ. Dissimilarity is lower for more similar pairs of objects. Frequently, the term distance is used as a synonym for dissimilarity. Dissimilarities sometimes fall in the interval [0, 1], but it is also common for them to be unbounded.
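Both notions can be computed directly. As a sketch, Euclidean distance gives a dissimilarity, and dividing 1 by (1 + distance) is one common (but not the only) way to map it into a bounded similarity:

```python
import math

def euclidean_distance(p, q):
    """Dissimilarity: 0 for identical points, unbounded above."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similarity(p, q):
    """Map the distance into (0, 1]: identical objects get similarity 1."""
    return 1.0 / (1.0 + euclidean_distance(p, q))
```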
Types of Attributes
Nominal
o The values of a nominal attribute are just different names, i.e. nominal attributes provide only enough information to distinguish one object from another (=, ≠).
o Examples: postal codes, employee ID numbers.
Ordinal
o The values of an ordinal attribute provide enough information to order objects (<, >).
o Examples: hardness of minerals, street numbers.
Interval
o For interval attributes, the differences between values are meaningful, i.e. a unit of measurement exists (+, −).
o Examples: calendar dates, temperature in Celsius or Fahrenheit.
Ratio
o For ratio variables, both differences and ratios are meaningful (*, /).
o Examples: temperature in Kelvin, counts, age.
Binary Attributes
o Binary data has only 2 values/states, for example yes or no, affected or unaffected, true or false.
Symmetric: both values are equally important (e.g. gender).
Asymmetric: the two values are not equally important (e.g. result).
Discrete
o Discrete data have finite values; they can be numerical or categorical. These values form a finite or countably infinite set.
Continuous
o Continuous data have an infinite number of states. Continuous data is of float type; there can be many values between 2 and 3.
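The interval/ratio distinction above can be illustrated with temperature: differences are meaningful on both the Celsius and Kelvin scales, but ratios are only meaningful in Kelvin. A small sketch:

```python
# Celsius is an interval attribute, Kelvin a ratio attribute.
def celsius_to_kelvin(c):
    return c + 273.15

# Differences survive the change of scale...
diff_c = 20 - 10
diff_k = celsius_to_kelvin(20) - celsius_to_kelvin(10)

# ...but ratios do not: 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_k = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # about 1.035, far from 2.0
```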
Best clustering
K-Means is a good clustering technique compared to other clustering methods because it is probably the best-known clustering algorithm: it is taught in a great many introductory data science and machine learning classes, and it is simple to understand and to implement in code (Mnemstudio.org, 2018).
Problems
Output
Cluster      Individuals                   Mean Vector (centroid)
Cluster 1    10, 11, 12, 13                (18, 42)
Cluster 2    14, 15, 16, 17, 18, 19, 20    (20, 40)
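The manual procedure behind such a result can be sketched in a few lines: assign each individual to its nearest centroid, then recompute each centroid as the mean vector of its members. This is a minimal illustration of one K-Means iteration, not the practical's actual working; the individual coordinates below are hypothetical, since the problem data is not reproduced here, and only the starting centroids (18, 42) and (20, 40) come from the table above.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def update(points, labels, k):
    """Recompute each centroid as the mean vector of its assigned points."""
    new_centroids = []
    for i in range(k):
        members = [p for p, label in zip(points, labels) if label == i]
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

points = [(17, 43), (19, 41), (21, 39), (19, 41)]  # hypothetical individuals
centroids = [(18, 42), (20, 40)]                   # starting centroids from above
labels = assign(points, centroids)
centroids = update(points, labels, 2)
```

Repeating the assign/update pair until the labels stop changing yields the converged clustering.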
4.4.2 Unsupervised Learning in WEKA – Clustering
To perform unsupervised learning in Weka, click the Cluster tab and choose the filtered clusterer. It displays the following output.
Final cluster centroids:
                                                   Cluster#
Attribute                Full Data              0                      1
                         (81.0)                 (32.0)                 (49.0)
==========================================================================================
StartDate                08-11-2017 06:19       08-11-2017 06:19       19-11-2017 22:38
EndDate                  08-11-2017 06:20       08-11-2017 06:20       06-11-2017 00:04
Status                   0                      0                      0
IPAddress                61.68.172.57           58.7.190.136           120.149.96.61
Progress                 96.0247                98.25                  94.5714
Duration (in seconds)    10242.0617             24155.1563             1155.9592
Finished                 0.9383                 0.9375                 0.9388
RecordedDate             08-11-2017 06:20       08-11-2017 06:20       06-11-2017 00:04
ResponseId               R_3iVAurplNL0h364      R_3iVAurplNL0h364      R_3G88YIjYqcvtOqq
RecipientLastName        Kaur                   Kaur                   Kaur
RecipientFirstName       Soham                  Soham                  Soham
RecipientEmail           s.yashwee@hotmail.com  s.yashwee@hotmail.com  s.yashwee@hotmail.com
LocationLatitude         -32.2704               -32.2556               -32.2801
LocationLongitude        119.919                119.2455               120.3588
DistributionChannel      anonymous              email                  anonymous
UserLanguage             EN                     EN                     EN

Time taken to build model (full training data): 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances
0      32 ( 40%)
1      49 ( 60%)
Output visualization is shown below (Towards Data Science, 2018).
4.5 Practical – 5
This task performs supervised mining, that is, running a classification algorithm in Weka. To classify, open the Weka Explorer and choose the Classify tab, as shown below.
Here, we choose Naïve Bayes, as shown below (GeeksforGeeks, 2018).
=== Classifier model (full training set) ===
Dictionary size: 0
The independent frequency of a class
--------------------------------------
email 34.0
anonymous 49.0
The frequency of a word given the class
-----------------------------------------
email anonymous
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4833
Root mean squared error 0.4914
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
                TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
                0.000    0.000    ?          0.000   ?          ?    0.500     0.407     email
                1.000    1.000    0.593      1.000   0.744      ?    0.500     0.593     anonymous
Weighted Avg.   0.593    0.593    ?          0.593   ?          ?    0.500     0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
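The headline figures can be recomputed directly from this confusion matrix; a quick sketch (the matrix values are copied from the output above):

```python
# Rows are actual classes, columns are predicted classes
# (order: email, anonymous), taken from the confusion matrix above.
matrix = [[0, 33],   # actual email: all predicted anonymous
          [0, 48]]   # actual anonymous: all predicted anonymous

n = sum(sum(row) for row in matrix)                 # 81 instances
accuracy = sum(matrix[i][i] for i in range(2)) / n  # 48/81, about 59.26 %

# Precision for "anonymous" = TP / (TP + FP)
precision_anon = matrix[1][1] / (matrix[0][1] + matrix[1][1])

# Cohen's kappa = (observed agreement - chance agreement) / (1 - chance)
p_o = accuracy
p_e = sum((sum(matrix[i]) / n) *
          (sum(matrix[r][i] for r in range(2)) / n) for i in range(2))
kappa = (p_o - p_e) / (1 - p_e)
```

The kappa of exactly 0 confirms the classifier does no better than always guessing the majority class.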
Visualization is shown below.
Next, choose the ZeroR classifier to run the classification, as shown below.
=== Classifier model (full training set) ===
ZeroR predicts class value: anonymous
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4833
Root mean squared error 0.4914
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
                TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
                0.000    0.000    ?          0.000   ?          ?    0.500     0.407     email
                1.000    1.000    0.593      1.000   0.744      ?    0.500     0.593     anonymous
Weighted Avg.   0.593    0.593    ?          0.593   ?          ?    0.500     0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
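ZeroR's behaviour is easy to state: ignore every attribute and always predict the most frequent class, which is why it lands on the same 59.26 % as the degenerate Naïve Bayes model above. A minimal sketch, with the class counts taken from the output:

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR: learn nothing but the majority class and always predict it."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda instance=None: majority

labels = ["anonymous"] * 48 + ["email"] * 33   # class distribution in the data
predict = zero_r(labels)
accuracy = sum(predict() == y for y in labels) / len(labels)
```

Any useful classifier should beat this baseline; here, Naïve Bayes does not.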
Visualization for Margin Curve is shown below.
The visualization of the threshold curve is shown below.
The visualization of the cost/benefit analysis is shown below.
4.6 Practical – 6
This practical performs performance evaluation with the Weka Experimenter and Knowledge Flow. The processes are shown below.
4.6.1 Weka Experimenter
To use the Weka Experimenter, follow the steps below. First, open Weka and choose Experimenter, as shown below.
After that, click New and load the data set.
After that, run the experiment, as shown below.
Then analyse the data set by clicking Perform test, as shown below.
The test result is shown below.
Tester: weka.experiment.PairedCorrectedTTester -R 0 -S 0.05 -result-matrix
"weka.experiment.ResultMatrixPlainText -mean-prec 2 -stddev-prec 2 -col-name-width 0 -
row-name-width 25 -mean-width 0 -stddev-width 0 -sig-width 0 -count-width 5 -print-col-
names -print-row-names -enum-col-names"
Analysing: region-centroid-col
Datasets: 1
Resultsets: 1
Confidence: 0.05 (two tailed)
Sorted by: -
Date: 24/09/18, 5:36 PM
Dataset (1)
-----------------------------------------
(1500 125.20 |
-----------------------------------------
(v/ /*) |
Key:
(1)
After that, click Cols to choose the columns to analyse, as shown below.
The results are shown below.
4.6.2 Weka Knowledge Flow
To use the Weka Knowledge Flow, follow the steps below.
First, open Weka and choose Knowledge Flow, as shown below.
Click DataSources and choose the ARFF loader.
Then click Evaluation and select the class assigner, class value picker and cross-validation fold maker, as shown below.
After that, click Classifiers and choose Naïve Bayes and Random Forest, as shown below.
Also, click the classifier evaluation group and add a classifier performance evaluator for each of the two classifiers, as shown below.
Then connect the data set output to all of the downstream components, as shown below.
After that, configure the ARFF data set, as shown below.
Configure the class assigner, as shown below.
Configure the class value picker, as shown below.
Configure the cross-validation fold maker, as shown below.
Finally, click the Start button to run the flow; the result is shown below.
Output is shown below.
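The cross-validation fold maker in this flow simply partitions the instances into k folds so that each fold serves once as the test set. A sketch of that splitting, assuming the 81 instances of this dataset and 10 folds:

```python
import random

def cv_folds(n_instances, k, seed=1):
    """Shuffle the instance indices and deal them into k folds;
    yield (train, test) index lists with each fold as test exactly once."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(cv_folds(81, 10))
```

Each classifier is then trained on the train indices and evaluated on the held-out fold, and the performance evaluator averages the k results.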
4.7 Practical – 7
This practical predicts a time series using the Weka package manager. Follow the steps below.
Open Weka.
Click Tools and choose Package Manager, as shown below.
Here, select the time series forecasting package; the process is shown below.
After that, click Install to install the time series forecasting package.
The installation process is shown below.
After that, open the Weka Explorer and click the Forecast tab. Then choose the attributes for predicting the time series, as shown below.
Transformed training data:
Status
ArtificialTimeIndex
Lag_Status-1
Lag_Status-2
Lag_Status-3
Lag_Status-4
Lag_Status-5
Lag_Status-6
Lag_Status-7
Lag_Status-8
Lag_Status-9
Lag_Status-10
Lag_Status-11
Lag_Status-12
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_Status-1
ArtificialTimeIndex*Lag_Status-2
ArtificialTimeIndex*Lag_Status-3
ArtificialTimeIndex*Lag_Status-4
ArtificialTimeIndex*Lag_Status-5
ArtificialTimeIndex*Lag_Status-6
ArtificialTimeIndex*Lag_Status-7
ArtificialTimeIndex*Lag_Status-8
ArtificialTimeIndex*Lag_Status-9
ArtificialTimeIndex*Lag_Status-10
ArtificialTimeIndex*Lag_Status-11
ArtificialTimeIndex*Lag_Status-12
Status:
Linear Regression Model
Status =
0 * ArtificialTimeIndex +
0 * ArtificialTimeIndex^2 +
0 * ArtificialTimeIndex^3 +
0
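The transformed attribute list above is just the target series shifted back 1 to 12 steps, plus polynomial terms of an artificial time index; and because Status is constant (all zeros), every regression coefficient collapses to zero. A sketch of the lagging step, assuming a constant series matching this dataset:

```python
def make_lagged(series, max_lag=12):
    """Build one training row per time step t >= max_lag, containing the
    target, its previous max_lag values, and time-index polynomial terms."""
    rows = []
    for t in range(max_lag, len(series)):
        row = {
            "ArtificialTimeIndex": t,
            "ArtificialTimeIndex^2": t ** 2,
            "ArtificialTimeIndex^3": t ** 3,
            "Status": series[t],
        }
        for lag in range(1, max_lag + 1):
            row[f"Lag_Status-{lag}"] = series[t - lag]
        rows.append(row)
    return rows

rows = make_lagged([0.0] * 30)   # Status is constant in this dataset
```

With the lagged rows built, any regression learner can be fitted to predict Status one step ahead.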
=== Predictions for test data: Status (1-step ahead) ===
inst# actual predicted error
58 0 0 0
59 0 0 0
60 0 0 0
61 0 0 0
62 0 0 0
63 0 0 0
64 0 0 0
65 0 0 0
66 0 0 0
67 0 0 0
68 0 0 0
69 0 0 0
70 0 0 0
71 0 0 0
72 0 0 0
73 0 0 0
74 0 0 0
75 0 0 0
76 0 0 0
77 0 0 0
78 0 0 0
79 0 0 0
80 0 0 0
81 0 0 0
=== Future predictions from end of training data ===
inst# Status
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
41 0
42 0
43 0
44 0
45 0
46 0
47 0
48 0
49 0
50 0
51 0
52 0
53 0
54 0
55 0
56 0
57 0
58* 0
=== Future predictions from end of test data ===
inst# Status
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25* 0
=== Evaluation on test data ===
Target 1-step-ahead
========================================
Status
N 24
Mean absolute error 0
Root mean squared error 0
Mean squared error 0
Total number of instances: 24
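These error measures are worth recomputing by hand: with a constant actual series and a constant prediction, both are exactly zero, so the zeros above indicate a trivial target series rather than a strong model. A sketch:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large residuals more heavily."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual = [0.0] * 24      # the 24 test-set Status values, all zero
predicted = [0.0] * 24   # the model's constant prediction
```

On a non-trivial series the two measures diverge, with RMSE growing faster when a few predictions are badly wrong.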
The future prediction is shown below.
After that, choose Random Forest as the classifier, as shown below.
The results are shown below.
=== Predictions for test data: Status (1-step ahead) ===
inst# actual predicted error
58 0 0 0
59 0 0 0
60 0 0 0
61 0 0 0
62 0 0 0
63 0 0 0
64 0 0 0
65 0 0 0
66 0 0 0
67 0 0 0
68 0 0 0
69 0 0 0
70 0 0 0
71 0 0 0
72 0 0 0
73 0 0 0
74 0 0 0
75 0 0 0
76 0 0 0
77 0 0 0
78 0 0 0
79 0 0 0
80 0 0 0
81 0 0 0
=== Future predictions from end of training data ===
inst# Status
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
41 0
42 0
43 0
44 0
45 0
46 0
47 0
48 0
49 0
50 0
51 0
52 0
53 0
54 0
55 0
56 0
57 0
58* 0
=== Future predictions from end of test data ===
inst# Status
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25* 0
=== Evaluation on test data ===
Target 1-step-ahead
========================================
Status
N 24
Mean absolute error 0
Root mean squared error 0
Mean squared error 0
Total number of instances: 24
The future prediction is shown below (Researchoptimus.com, 2018).
4.8 Practical – 8
This task performs text mining, using the steps below.
4.8.1 Training the Classifier Model
First, apply the unsupervised attribute filter StringToWordVector, as shown below.
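In essence, StringToWordVector tokenizes each string attribute and replaces it with one numeric attribute per word in the vocabulary. A rough bag-of-words sketch (the example documents are invented; the real filter also offers TF-IDF weighting, stemming, and stop-word removal):

```python
import re
from collections import Counter

def string_to_word_vector(docs):
    """Turn each document into a vector of word counts over the
    shared vocabulary of all documents."""
    tokenized = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

vocab, vectors = string_to_word_vector(
    ["flu cases rise", "new flu vaccine", "vaccine trial results"])
```

The resulting numeric vectors are what the downstream classifiers actually train on.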
The results are shown below.
After that, open the Classify tab and choose the Stacking classifier, as shown below.
Stacking
Base classifiers
ZeroR predicts class value: 96.0246913580247
Meta classifier
ZeroR predicts class value: 96.0246913580247
Time taken to build model: 0 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient -0.2912
Mean absolute error 7.5502
Root mean squared error 17.7457
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
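Stacking feeds the base classifiers' predictions to a meta classifier as input features; because both levels here are ZeroR, every stage outputs a constant (the mean Progress value), which is why the cross-validated fit is poor. A regression-flavoured sketch with made-up targets:

```python
def zero_r_regressor(targets):
    """Numeric ZeroR: always predict the training mean."""
    mean = sum(targets) / len(targets)
    return lambda instance=None: mean

def stack_predict(base_models, meta_model, instance):
    """Stacking: the base models' outputs become the meta model's input."""
    meta_features = [m(instance) for m in base_models]
    return meta_model(meta_features)

y = [96.0] * 80 + [98.0]                # made-up Progress-like targets
base = [zero_r_regressor(y)]
meta = zero_r_regressor([base[0](x) for x in range(len(y))])
pred = stack_predict(base, meta, None)
```

With informative base learners (e.g. trees plus a linear model) the meta model can combine their strengths; with ZeroR at both levels the ensemble cannot beat the mean.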
4.8.2 Predict the Class in Test
Again, open the Classify tab and choose the FilteredClassifier. The output of this process is shown below.
The output is shown below.
Classifier Model
J48 pruned tree
------------------
: anonymous (81.0/33.0)
Number of Leaves : 1
Size of the tree : 1
The tree has collapsed to a single leaf predicting "anonymous": all 81 instances reach it and 33 are misclassified, which matches the 59.26 % majority-class baseline.
Time taken to build model: 0.06 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4829
Root mean squared error 0.4914
Relative absolute error 99.9145 %
Root relative squared error 99.999 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
                TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
                0.000    0.000    ?          0.000   ?          ?    0.500     0.407     email
                1.000    1.000    0.593      1.000   0.744      ?    0.500     0.593     anonymous
Weighted Avg.   0.593    0.593    ?          0.593   ?          ?    0.500     0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
Visualization is shown below.
4.9 Practical – 9
This practical performs image analytics in Weka, using the steps below.
Open Weka.
Click Tools and choose Package Manager, as shown below.
Choose the image filters package and click Install to install it, as shown below.
After that, open the Weka Explorer and, under the unsupervised attribute filters, select the image filters, as shown below.
After that, choose the randomize image filter, as shown below.
After that, click Classify and choose the Stacking classifier, as shown below.
5 Conclusion
This project successfully analysed the health news dataset to explore Weka's data mining applications. The work was divided into ten practicals. In the first practical, we installed the Weka software and downloaded the data repository. In the second, data loading and pre-processing of the provided data set were completed. The third covered data visualization and dimension reduction. In the fourth, a clustering algorithm (K-Means) was applied. In the fifth, supervised mining, that is classification, was carried out in Weka. In the sixth, performance evaluation was completed with the Weka Experimenter and Knowledge Flow. In the seventh, time series prediction was completed using the Weka package manager. In the eighth, text mining was completed.
Finally, image analytics was successfully completed in Weka. All of these practicals are analysed and demonstrated in detail.
References
Evgeniou, T. (2018). Cluster Analysis and Segmentation. [online]
Inseaddataanalytics.github.io. Available at:
http://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions45/
ClusterAnalysisReading.html [Accessed 24 Sep. 2018].
GeeksforGeeks. (2018). Understanding Data Attribute Types | Qualitative and Quantitative -
GeeksforGeeks. [online] Available at: https://www.geeksforgeeks.org/understanding-data-
attribute-types-qualitative-and-quantitative/ [Accessed 24 Sep. 2018].
Gonçalves, H. (2018). K-means clustering - algorithm and examples. [online] Onmyphd.com.
Available at: http://www.onmyphd.com/?p=k-means.clustering [Accessed 24 Sep. 2018].
Kaushik, S. (2018). An Introduction to Clustering & different methods of clustering. [online]
Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2016/11/an-
introduction-to-clustering-and-different-methods-of-clustering/ [Accessed 24 Sep. 2018].
Mnemstudio.org. (2018). Step-By-Step K-Means Example. [online] Available at:
http://mnemstudio.org/clustering-k-means-example-1.htm [Accessed 24 Sep. 2018].
Onlinecourses.science.psu.edu. (2018). 1(b).2.1: Measures of Similarity and Dissimilarity |
STAT 897D. [online] Available at: https://onlinecourses.science.psu.edu/stat857/node/3/
[Accessed 24 Sep. 2018].
Researchoptimus.com. (2018). What is Cluster Analysis?. [online] Available at:
https://www.researchoptimus.com/article/cluster-analysis.php [Accessed 24 Sep. 2018].
Statistics Solutions. (2018). Cluster Analysis - Statistics Solutions. [online] Available at:
http://www.statisticssolutions.com/directory-of-statistical-analyses-cluster-analysis/
[Accessed 24 Sep. 2018].
Towards Data Science. (2018). The 5 Clustering Algorithms Data Scientists Need to Know.
[online] Available at: https://towardsdatascience.com/the-5-clustering-algorithms-data-
scientists-need-to-know-a36d136ef68 [Accessed 24 Sep. 2018].