Enterprise Business Intelligence: Data Mining and Machine Learning Techniques
Enterprise Business Intelligence
Abstract
This project is designed to provide an opportunity to use data mining and machine learning methods to discover knowledge in a dataset and to examine their applications for business intelligence. The assignment analyses the health news dataset to explore Weka's data mining applications. The project consists of ten practicals. In the first, we introduce the Weka software and download the dataset. The second performs data pre-processing on the provided dataset. The third covers data visualization and dimensionality reduction. The fourth applies a clustering algorithm, K-Means. The fifth performs supervised data mining, i.e. a classification algorithm, in Weka. The sixth carries out performance evaluation with the Weka Experimenter and Knowledge Flow. The seventh forecasts time series using the Weka package manager. The eighth performs text mining. The final practical performs image analysis in Weka. These are analysed and discussed in detail.
Table of Contents
1 Introduction......................................................................................................................3
2 Data set..............................................................................................................................3
3 Data mining Techniques..................................................................................................3
4 Evaluation and Demonstration.......................................................................................4
4.1 Practical – 1................................................................................................................4
4.2 Practical – 2................................................................................................................6
4.3 Practical – 3................................................................................................................9
4.3.1 Visualising the Dataset.......................................................................................9
4.3.2 Visualising the Dataset using Classifiers........................................................12
4.4 Practical – 4..............................................................................................................18
4.4.1 Manually Working with K-Means..................................................................18
4.4.2 Unsupervised Learning in WEKA – Clustering............................................20
4.5 Practical – 5..............................................................................................................22
4.6 Practical – 6..............................................................................................................29
4.6.1 Weka Experimenter.........................................................................................29
4.6.2 Weka Knowledge Flow....................................................................................33
4.7 Practical – 7..............................................................................................................39
4.8 Practical – 8..............................................................................................................46
4.8.1 Training the Classifier Model.........................................................................46
4.8.2 Predict the Class in Test..................................................................................48
4.9 Practical – 9..............................................................................................................50
5 Conclusion.......................................................................................................................55
References...............................................................................................................................56
1 Introduction
This assignment is designed to provide an opportunity to use data mining and machine learning techniques to discover knowledge in a dataset and to explore their applications for business intelligence. The project analyses the health news dataset to explore Weka's data mining applications, and is divided into ten practicals. In the first, we introduce the Weka software and download the data file. The second performs data pre-processing on the provided dataset. The third covers data visualization and dimensionality reduction. The fourth applies the K-Means clustering algorithm and is divided into two parts: Part 1 computes K-Means manually for the provided data, and Part 2 uses Weka's clustering algorithm to compute K-Means. The fifth performs supervised data mining, i.e. a classification algorithm, in Weka. The sixth carries out performance evaluation with the Weka Experimenter and Knowledge Flow. The seventh forecasts time series using the Weka package manager. The eighth performs text mining. The final practical performs image analysis in Weka. These are analysed and discussed in detail.
2 Data set
Each file corresponds to one Twitter account of a news agency; for example, bbchealth.txt corresponds to BBC health news. Each line contains tweet id | date and time | tweet, with '|' as the separator. This text data has been used to evaluate the performance of topic models on short texts, but it can also be used for other tasks, such as clustering.
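As a sketch, the pipe-separated format described above can be parsed with a few lines of Python. The sample line below is hypothetical, written only to match the documented format, not taken from the dataset:

```python
# Parse a line of the health-news tweet files: tweet id | date and time | tweet.
# The tweet text itself may contain '|' characters, so split at most twice.
def parse_tweet_line(line):
    tweet_id, timestamp, text = line.rstrip("\n").split("|", 2)
    return {"id": tweet_id, "timestamp": timestamp, "text": text}

# Hypothetical example line in the documented format.
sample = "585978391360221184|Thu Apr 09 01:31:50 +0000 2015|Breast cancer risk test devised"
record = parse_tweet_line(sample)
print(record["id"])    # 585978391360221184
print(record["text"])  # Breast cancer risk test devised
```

Splitting at most twice keeps any '|' characters inside the tweet text intact.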
3 Data mining Techniques
There are many techniques used in data mining, but not all of them can be applied to every kind of data. Neural network algorithms, for instance, can work with quantitative (numerical) data but cannot handle categorical data directly; categorical data is therefore usually split into several dichotomous variables, each taking the value 1 ("yes") or 0 ("no"). Some of the traditional
statistical methods that can be used for data mining are the following (Sherif, 2016):
• Cluster analysis, also called segmentation.
• Discriminant analysis.
• Logistic regression.
• Time series forecasting.
Cluster analysis (or segmentation) is one of the most frequently used data mining techniques; it involves separating sets of data into clusters that contain a series of consistent patterns. Discriminant analysis is one of the oldest classification techniques. It finds hyperplanes that separate the classes, so that users can then determine on which side of a hyperplane to place new data. Discriminant analysis has limitations, however.
Logistic regression is a generalization of linear regression. It is mainly used for predicting binary variables and, less often, multi-class variables. Logistic regression models predict the logarithm of the odds of the occurrence of discrete variables. The basic assumption of the logistic regression model is that the logarithm of the odds is linear in the coefficients of the predictor variables.
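To make the log-odds assumption concrete, here is a minimal sketch; the coefficient values are invented purely for illustration:

```python
import math

# Logistic regression models the log-odds as a linear function of the
# predictors: log(p / (1 - p)) = b0 + b1*x1 + ... Inverting this gives the
# sigmoid, which maps the linear score back to a probability in (0, 1).
def predict_probability(coefficients, intercept, x):
    log_odds = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# With a zero linear score the odds are 1:1, i.e. probability 0.5.
p = predict_probability([0.8, -0.4], 0.0, [0.0, 0.0])
print(round(p, 2))  # 0.5
```

Increasing a predictor with a positive coefficient raises the log-odds, and hence the predicted probability, monotonically.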
Data Visualization
Data visualization is also helpful for data mining. By using visual tools, analysts can reach a better understanding of the data, since they can focus on some of the patterns found by other techniques. Using variations of colour, dimension, and depth, it is possible to discover new associations and improve the distinction between them.
4 Evaluation and Demonstration
4.1 Practical – 1
In this practical, we introduce the Weka software and download the dataset. First, the user needs to download and install Weka. Once Weka is installed successfully, the user opens the software, as shown below.
Go to the link below to download the dataset.
https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter
After that, click the Data Folder.
Then, click the Health_News_Tweets.zip file to download the dataset. The downloaded dataset is attached below.
4.2 Practical – 2
This practical performs data pre-processing on the given dataset. To carry out pre-processing in Weka, follow the steps below. First, the user opens Weka. Then load the dataset by clicking Explorer and selecting Open file.
Clicking an attribute makes Weka display a visualization of the selected attribute.
4.3 Practical – 3
4.3.1 Visualising the Dataset
This practical covers data visualization and dimensionality reduction. To visualize the data in Weka, click Visualize in the Explorer window, as shown below.
The image below displays the data for ResponseId and recipient name.
The image below displays the data for RecordedDate and LocationLatitude.
4.3.2 Visualising the Dataset using Classifiers
The Stacking classifier is used to visualize the dataset.
=== Classifier model (full training set) ===
Stacking
Base classifiers
ZeroR predicts class value: 61.68.172.57
Meta classifier
ZeroR predicts class value: 61.68.172.57
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 1 1.2346 %
Incorrectly Classified Instances 80 98.7654 %
Kappa statistic -0.035
Mean absolute error 0.0324
Root mean squared error 0.1276
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
Next, evaluate the selected attributes with ClassifierSubsetEval.
The results are shown below.
=== Attribute Selection on all input data ===
Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 76
Merit of best subset found: 0
Attribute Subset Evaluator (supervised, Class (numeric): 3 Status):
Classifier Subset Evaluator
Learning scheme: weka.classifiers.rules.ZeroR
Scheme options:
Hold out/test set: Training data
Subset evaluation: RMSE
The dataset visualization is illustrated below.
Then, the Ranker search method is chosen (Sharda, Delen & Turban, 2017).
=== Attribute selection 10 fold cross-validation seed: 1 ===
average merit average rank attribute
0 +- 0 1 +- 0 17 UserLanguage
0 +- 0 2 +- 0 16 DistributionChannel
0 +- 0 3 +- 0 7 Finished
0 +- 0 4 +- 0 6 Duration (in seconds)
0 +- 0 5 +- 0 4 IPAddress
0 +- 0 6 +- 0 3 Status
0 +- 0 7 +- 0 2 EndDate
0 +- 0 8 +- 0 8 RecordedDate
0 +- 0 9 +- 0 9 ResponseId
0 +- 0 10 +- 0 10 RecipientLastName
0 +- 0 11 +- 0 14 LocationLatitude
0 +- 0 12 +- 0 15 LocationLongitude
0 +- 0 13 +- 0 13 ExternalReference
0 +- 0 14 +- 0 11 RecipientFirstName
0 +- 0 15 +- 0 12 RecipientEmail
0 +- 0 16 +- 0 1 StartDate
4.4 Practical – 4
This practical performs clustering analysis with K-Means. It is divided into two parts: Part 1 computes K-Means manually for the given dataset, and Part 2 uses Weka's clustering algorithm to compute K-Means. These are discussed and analysed in detail.
4.4.1 Manually Working with K-Means
Cluster
Clustering is the task of dividing a population or set of data points into groups such that data points in the same group are more similar to other data points in that group than to those in other groups. In simple terms, the aim is to separate groups with similar traits and assign them into clusters.
Cluster Analysis
Cluster analysis has been used in marketing for various purposes. Segmentation of customers through cluster analysis is performed on the basis of the benefits sought from purchasing the product, and it can be used to identify homogeneous groups of buyers. Cluster analysis involves formulating a problem, selecting a distance measure, selecting a clustering procedure, deciding the number of clusters, interpreting the cluster profiles and, finally, assessing the validity of the clustering.
Measuring the similarity between two objects
Similarity and dissimilarity are important because they are used by many data mining techniques, such as nearest-neighbour classification and anomaly detection. The term proximity is used to refer to either similarity or dissimilarity. The similarity between two objects is a numerical measure of the degree to which the two objects are alike; similarities are therefore higher for pairs of objects that are more alike. Similarities are usually non-negative and often lie between 0 (no similarity) and 1 (complete similarity). The dissimilarity between two objects is a numerical measure of the degree to which the two objects are different; dissimilarity is lower for more similar pairs of objects. Frequently, the term distance is used as a synonym for dissimilarity. Dissimilarities sometimes fall in the interval [0, 1], but it is also common for them to range from 0 upwards without bound.
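A small sketch of one common choice of proximity measure: Euclidean distance as the dissimilarity, with the transformation s = 1 / (1 + d) mapping it into a (0, 1] similarity. The example points are invented for illustration:

```python
import math

# Dissimilarity as Euclidean distance (ranges from 0 upwards) and a common
# transformation s = 1 / (1 + d) that maps it into a (0, 1] similarity.
def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similarity(p, q):
    return 1.0 / (1.0 + euclidean_distance(p, q))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(similarity((1, 2), (1, 2)))          # 1.0 (identical objects)
```

Identical objects get similarity 1, and similarity falls towards 0 as the distance grows, matching the ranges described above.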
Types of Attributes
Nominal
The values of a nominal attribute are just different names, i.e. nominal values provide only enough information to distinguish one object from another (=, ≠). Examples: postal codes, employee ID numbers.
Ordinal
The values of an ordinal attribute provide enough information to order objects (<, >). Examples: hardness of minerals, street numbers.
Interval
For interval attributes, the differences between values are meaningful, i.e. a unit of measurement exists (+, -). Examples: calendar dates, temperature in Celsius or Fahrenheit.
Ratio
For ratio attributes, both differences and ratios are meaningful (*, /). Examples: temperature in Kelvin, counts, age.
Binary Attributes
Binary data has only two values/states. Examples: yes or no, affected or unaffected, true or false.
Discrete
Discrete data have a finite or countably infinite set of values; they can be numerical or categorical.
Continuous
Continuous data have an infinite number of states and are typically of float type; for example, there are infinitely many possible values between 2 and 3.
Best clustering
K-Means is a good clustering technique compared with other methods because it is probably the most widely known clustering algorithm. It is taught in a large proportion of introductory data science and machine learning classes, and it is simple to understand and to implement in code.
Problems
Output
Cluster      Individuals                     Mean Vector (centroid)
Cluster 1    10, 11, 12, 13                  (18, 42)
Cluster 2    14, 15, 16, 17, 18, 19, 20      (20, 40)
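The manual procedure above can be sketched in code: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat. The 2-D points and starting centroids below are hypothetical, chosen only to show the mechanics:

```python
import math

# A minimal K-Means sketch: assign each point to its nearest centroid, then
# recompute each centroid as the mean of its assigned points, and repeat.
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            for cluster in clusters if cluster
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # two centroids, near (1.33, 1.33) and (8.33, 8.33)
```

This toy run converges after one iteration; on real data the assignments and centroids usually shift over several iterations before stabilizing.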
4.4.2 Unsupervised Learning in WEKA – Clustering
To do unsupervised learning in Weka, click Cluster and choose the filtered clusterer.
Final cluster centroids:
Cluster#
Attribute Full Data 0 1
(81.0) (32.0) (49.0)
==================================================================================================
StartDate 08-11-2017 06:19 08-11-2017 06:19 19-11-2017 22:38
EndDate 08-11-2017 06:20 08-11-2017 06:20 06-11-2017 00:04
Status 0 0 0
IPAddress 61.68.172.57 58.7.190.136 120.149.96.61
Progress 96.0247 98.25 94.5714
Duration (in seconds) 10242.0617 24155.1563 1155.9592
Finished 0.9383 0.9375 0.9388
RecordedDate 08-11-2017 06:20 08-11-2017 06:20 06-11-2017 00:04
ResponseId R_3iVAurplNL0h364 R_3iVAurplNL0h364
R_3G88YIjYqcvtOqq
RecipientLastName Kaur Kaur Kaur
RecipientFirstName Soham Soham Soham
RecipientEmail s.yashwee@hotmail.com s.yashwee@hotmail.com
s.yashwee@hotmail.com
LocationLatitude -32.2704 -32.2556 -32.2801
LocationLongitude 119.919 119.2455 120.3588
DistributionChannel anonymous email anonymous
UserLanguage EN EN EN
Time taken to build model (full training data) : 0.01 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 32 ( 40%)
1 49 ( 60%)
The output visualization is illustrated below.
4.5 Practical – 5
This task performs supervised mining, i.e. a classification algorithm, in Weka. To run a classification, open the Weka Explorer and select the Classify tab.
Here, we choose the naïve Bayes classifier.
=== Classifier model (full training set) ===
Dictionary size: 0
The independent frequency of a class
--------------------------------------
email 34.0
anonymous 49.0
The frequency of a word given the class
-----------------------------------------
email anonymous
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4833
Root mean squared error 0.4914
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area
Class
0.000 0.000 ? 0.000 ? ? 0.500 0.407 email
1.000 1.000 0.593 1.000 0.744 ? 0.500 0.593 anonymous
Weighted Avg. 0.593 0.593 ? 0.593 ? ? 0.500 0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
The visualization is illustrated below.
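Note the line `Dictionary size: 0` in the model output: the filter produced no usable word features, so naïve Bayes falls back on the class priors and predicts the majority class ("anonymous") for every instance. The 59.2593 % accuracy and the confusion matrix above can therefore be reproduced from the class counts alone; a minimal sketch with scikit-learn's metrics:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Class counts taken from the Weka run above: 33 "email", 48 "anonymous".
y_true = ["email"] * 33 + ["anonymous"] * 48
y_pred = ["anonymous"] * 81          # every instance gets the majority class

print(accuracy_score(y_true, y_pred))   # 48/81 = 0.592593..., as in Weka
print(confusion_matrix(y_true, y_pred, labels=["email", "anonymous"]))
```

This matches the Weka confusion matrix exactly: 0/33 email instances and 48/48 anonymous instances classified as anonymous.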
Next, choose the ZeroR classifier to perform the classification.
=== Classifier model (full training set) ===
ZeroR predicts class value: anonymous
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4833
Root mean squared error 0.4914
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area
Class
0.000 0.000 ? 0.000 ? ? 0.500 0.407 email
1.000 1.000 0.593 1.000 0.744 ? 0.500 0.593 anonymous
Weighted Avg. 0.593 0.593 ? 0.593 ? ? 0.500 0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
The visualization of the margin curve is shown below.
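ZeroR ignores all attributes and always predicts the most frequent training class, which is why its evaluation summary is identical to the naïve Bayes run above. A minimal sketch of the idea:

```python
from collections import Counter

class ZeroR:
    """Baseline classifier: always predict the most frequent training class."""
    def fit(self, y):
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, n):
        return [self.prediction_] * n

# Class counts from the Weka run: 33 "email", 48 "anonymous".
zr = ZeroR().fit(["email"] * 33 + ["anonymous"] * 48)
print(zr.prediction_)   # the majority class, "anonymous"
print(zr.predict(3))
```

ZeroR is useful as a floor: any real classifier should beat this baseline, and here naïve Bayes did not.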
Visualization of the threshold curve is shown below.
Visualization of the cost and benefit analysis is shown below.
4.6 Practical – 6
This practical performs performance evaluation with the Weka Experimenter and Knowledge Flow. The process is illustrated below.
4.6.1 Weka Experimenter
To use the Weka Experimenter, follow the steps below. First, open Weka and choose Experimenter.
Then, click New and load the data set.
Then, run the experiment, as illustrated below.
Then, analyse the data set by clicking Perform test.
The test result is illustrated below.
Tester: weka.experiment.PairedCorrectedTTester -R 0 -S 0.05 -result-matrix
"weka.experiment.ResultMatrixPlainText -mean-prec 2 -stddev-prec 2 -col-name-width 0 -
row-name-width 25 -mean-width 0 -stddev-width 0 -sig-width 0 -count-width 5 -print-col-
names -print-row-names -enum-col-names"
Analysing: region-centroid-col
Datasets: 1
Resultsets: 1
Confidence: 0.05 (two tailed)
Sorted by: -
Date: 24/09/18, 5:36 PM
Dataset (1)
-----------------------------------------
(1500 125.20 |
-----------------------------------------
(v/ /*) |
Key:
(1)
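The tester shown above, weka.experiment.PairedCorrectedTTester, implements the corrected resampled paired t-test (Nadeau and Bengio's variance correction) at the 0.05 significance level. A sketch of the statistic, using hypothetical per-fold accuracies and an assumed test/train ratio:

```python
import math
from statistics import mean, variance

def corrected_paired_t(scores_a, scores_b, test_frac=0.1):
    """Corrected resampled paired t statistic (the idea behind Weka's
    PairedCorrectedTTester); test_frac is n_test / n_train per run."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(d)
    var_d = variance(d)   # sample variance of per-run differences
    # The (1/k + test_frac) factor inflates the variance to account for
    # overlapping training sets across resampled runs.
    return mean(d) / math.sqrt(var_d * (1.0 / k + test_frac))

a = [0.91, 0.89, 0.93, 0.90, 0.92]   # hypothetical per-run accuracies
b = [0.85, 0.88, 0.84, 0.87, 0.86]
print(round(corrected_paired_t(a, b), 3))
```

If the statistic exceeds the critical value at the chosen confidence level, the Experimenter marks the result with v (significantly better) or * (significantly worse), as in the `(v/ /*)` row above.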
After, click Cols to select the columns to analyse.
4.6.2 Weka Knowledge Flow
To build a Weka Knowledge Flow, use the steps below.
First, open Weka and choose Knowledge Flow.
Click DataSources and choose the ARFFLoader.
Then, click Evaluation and select the ClassAssigner, ClassValuePicker and CrossValidationFoldMaker.
Next, click Classifiers and choose naïve Bayes and random forest.
Also click Evaluation to add a ClassifierPerformanceEvaluator for each of the two classifiers.
Then, connect the dataSet output of the loader through to all downstream components.
Next, configure the ARFF data set.
Configure the class assigner.
Configure the class value picker.
Configure the cross-validation fold maker.
Finally, click the Start button to load the data and run the flow.
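The completed flow (ARFFLoader → ClassAssigner → ClassValuePicker → CrossValidationFoldMaker → classifiers → performance evaluators) corresponds roughly to the following scikit-learn sketch. The built-in iris data merely stands in for the assignment's ARFF file, which is not reproduced here:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)       # stands in for the ARFFLoader
results = {}
for name, clf in [("NaiveBayes", GaussianNB()),
                  ("RandomForest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=10)   # CrossValidationFoldMaker
    results[name] = scores.mean()                # PerformanceEvaluator
    print(name, round(results[name], 3))
```

Running both classifiers through the same folds, as the Knowledge Flow does, makes their performance figures directly comparable.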
4.7 Practical – 7
This practical predicts a time series using the Weka package manager. To set up time-series prediction, follow the steps below.
Open Weka.
Click Tools and choose Package Manager.
Here, choose timeseriesForecasting.
Then, click Install to install the time-series forecasting package.
The installation process is shown below.
Then, open the Weka Explorer and click the Forecast tab. Choose the attributes for predicting the time series.
Transformed training data:
Status
ArtificialTimeIndex
Lag_Status-1
Lag_Status-2
Lag_Status-3
Lag_Status-4
Lag_Status-5
Lag_Status-6
Lag_Status-7
Lag_Status-8
Lag_Status-9
Lag_Status-10
Lag_Status-11
Lag_Status-12
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_Status-1
ArtificialTimeIndex*Lag_Status-2
ArtificialTimeIndex*Lag_Status-3
ArtificialTimeIndex*Lag_Status-4
43
Status
ArtificialTimeIndex
Lag_Status-1
Lag_Status-2
Lag_Status-3
Lag_Status-4
Lag_Status-5
Lag_Status-6
Lag_Status-7
Lag_Status-8
Lag_Status-9
Lag_Status-10
Lag_Status-11
Lag_Status-12
ArtificialTimeIndex^2
ArtificialTimeIndex^3
ArtificialTimeIndex*Lag_Status-1
ArtificialTimeIndex*Lag_Status-2
ArtificialTimeIndex*Lag_Status-3
ArtificialTimeIndex*Lag_Status-4
43
ArtificialTimeIndex*Lag_Status-5
ArtificialTimeIndex*Lag_Status-6
ArtificialTimeIndex*Lag_Status-7
ArtificialTimeIndex*Lag_Status-8
ArtificialTimeIndex*Lag_Status-9
ArtificialTimeIndex*Lag_Status-10
ArtificialTimeIndex*Lag_Status-11
ArtificialTimeIndex*Lag_Status-12
Status:
Linear Regression Model
Status =
0 * ArtificialTimeIndex +
0 * ArtificialTimeIndex^2 +
0 * ArtificialTimeIndex^3 +
0
=== Predictions for test data: Status (1-step ahead) ===
=== Evaluation on test data ===
Target 1-step-ahead
========================================
Status
N 24
Mean absolute error 0
Root mean squared error 0
Mean squared error 0
Total number of instances: 24
The future prediction is illustrated below.
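The lagged attributes listed above (Lag_Status-1 … Lag_Status-12, plus powers and products of the artificial time index) are how the forecaster turns a series into an ordinary supervised-learning table. The lagging step can be sketched with pandas on a hypothetical series; note that the assignment's Status field is constant (all zeros in the output above), which is the likely reason the fitted linear model has all-zero coefficients and zero error:

```python
import pandas as pd

# Hypothetical series standing in for a genuinely varying target.
s = pd.Series([3, 5, 4, 6, 7, 6, 8, 9], name="Status")
frame = pd.DataFrame({"Status": s})
for lag in range(1, 4):                      # Weka used Lag_Status-1..12
    frame[f"Lag_Status-{lag}"] = s.shift(lag)
frame["ArtificialTimeIndex"] = range(1, len(s) + 1)
print(frame.dropna())                        # rows where all lags exist
```

Any regressor can then be trained on the lag columns to predict the current Status, which is exactly what the linear regression model above does.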
Next, choose random forest as the classifier.
=== Evaluation on test data ===
Target 1-step-ahead
========================================
Status
N 24
Mean absolute error 0
Root mean squared error 0
Mean squared error 0
Total number of instances: 24
The future prediction is illustrated below.
4.8 Practical – 8
This task performs text mining using the steps below.
4.8.1 Training the Classifier Model
First, apply the unsupervised attribute filter StringToWordVector.
The results are illustrated below.
Next, click Classify and choose the Stacking classifier.
Stacking
Base classifiers
ZeroR predicts class value: 96.0246913580247
Meta classifier
ZeroR predicts class value: 96.0246913580247
Time taken to build model: 0 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient -0.2912
Mean absolute error 7.5502
Root mean squared error 17.7457
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 81
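Both the base and meta learners in this Stacking run are ZeroR, so the stack collapses to a single mean predictor: for a numeric class, ZeroR simply predicts the training mean, and the predicted value 96.0246913580247 matches the mean of the Progress attribute reported in the clustering output earlier. A sketch with illustrative numbers chosen to average near that value:

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Hypothetical Progress-like values; ZeroR on a numeric class ignores the
# features entirely and predicts the training mean.
y = np.array([100.0, 92.0, 100.0, 92.1])
X = np.zeros((len(y), 1))                  # features are irrelevant here
zero_r = DummyRegressor(strategy="mean").fit(X, y)
print(zero_r.predict([[0.0]]))             # the training mean, 96.025
```

Because the model is constant, its errors mirror the spread of the target, which is why the relative errors above are 100 %.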
4.8.2 Predict the Class in Test
Again, click Classify and choose the FilteredClassifier.
The output is illustrated below.
Classifier Model
J48 pruned tree
------------------
: anonymous (81.0/33.0)
Number of Leaves : 1
Size of the tree : 1
Time taken to build model: 0.06 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 48 59.2593 %
Incorrectly Classified Instances 33 40.7407 %
Kappa statistic 0
Mean absolute error 0.4829
Root mean squared error 0.4914
Relative absolute error 99.9145 %
Root relative squared error 99.999 %
Total Number of Instances 81
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area
Class
0.000 0.000 ? 0.000 ? ? 0.500 0.407 email
1.000 1.000 0.593 1.000 0.744 ? 0.500 0.593 anonymous
Weighted Avg. 0.593 0.593 ? 0.593 ? ? 0.500 0.517
=== Confusion Matrix ===
a b <-- classified as
0 33 | a = email
0 48 | b = anonymous
The visualization is illustrated below.
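The J48 output above is a degenerate tree: a single leaf labelled anonymous covering all 81 instances (33 of them misclassified), so it behaves exactly like ZeroR. The same collapse can be reproduced with a decision tree given no informative features:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# A constant feature gives the tree nothing to split on, so it degenerates
# to one leaf predicting the majority class, mirroring the J48 result.
X = np.zeros((81, 1))
y = np.array(["email"] * 33 + ["anonymous"] * 48)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.get_n_leaves())      # a single leaf
print(tree.predict([[0.0]]))    # the majority class
```

This suggests the filtered word features carried no signal for distinguishing the email and anonymous distribution channels in this data.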
4.9 Practical – 9
This practical performs image analytics in Weka using the steps below.
Open Weka.
Click Tools and choose Package Manager.
Choose imageFilters and click Install to install the image filters package.
Then, open the Weka Explorer, click the unsupervised attribute filters, and select the image filters.
Next, choose the randomized image filters.
Finally, click Classify and choose the Stacking classifier.
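Image filter packages of this kind convert each image into a fixed-length numeric feature vector before classification. A toy stand-in for that step, a normalised per-channel intensity histogram computed on a hypothetical image array:

```python
import numpy as np

def histogram_features(img, bins=4):
    """Turn an RGB image array into a fixed-length feature vector:
    one normalised intensity histogram per colour channel."""
    feats = []
    for channel in range(img.shape[2]):
        hist, _ = np.histogram(img[..., channel], bins=bins, range=(0, 256))
        feats.extend(hist / img[..., channel].size)   # normalise to sum 1
    return np.array(feats)

img = np.zeros((8, 8, 3), dtype=np.uint8)   # hypothetical all-black image
print(histogram_features(img))              # all mass in the first bin
```

Once every image is reduced to such a vector, any standard Weka classifier (such as the Stacking classifier chosen above) can be trained on the result.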
5 Conclusion
This project successfully investigated the health news dataset to explore Weka's data mining applications across ten practicals. In the first, we introduced the Weka software and downloaded the data file. In the second, we performed data exploration and pre-processing on the given dataset. In the third, we carried out data visualization and dimensionality reduction. In the fourth, we applied the K-Means clustering algorithm; this practical was divided into two parts: Part 1 computed K-Means manually for the given data, and Part 2 used Weka's clustering algorithm to compute K-Means. In the fifth, we performed supervised mining with classification algorithms in Weka. In the sixth, we carried out performance evaluation with the Weka Experimenter and Knowledge Flow. In the seventh, we predicted a time series with the Weka package manager. In the eighth, we performed text mining. Finally, we completed image analysis in Weka. All of these tasks were analysed and discussed in detail.