Data Mining: WEKA Toolkit Analysis and Classification Report

Added on  2022/09/05

This report presents a comprehensive analysis of a text dataset using the WEKA data mining toolkit. The assignment involves identifying keywords, converting the dataset into ARFF format, and applying pre-processing techniques. The student explores the impact of these techniques on the data. Furthermore, the report evaluates the performance of decision-tree (J48), Naïve Bayes, and Support Vector Machine classifiers, generating tables and graphs to compare their performance against varying training set sizes. The report details the conversion techniques, pre-processing steps, and the rationale behind the choices made, providing insights into the practical application of data mining methodologies and the utility of the WEKA toolkit. The report also includes a conclusion summarizing the findings and the effectiveness of the methods used, along with a reference list.
Running head: REPORT ON DATA MINING
By
Academic Year: 2019-20
Module: Data Mining
Introduction
The idea behind data mining (DM) techniques is to extract hidden patterns and find relationships
between parameters in a large amount of data. DM techniques have been applied successfully in
many areas, for example engineering, education, marketing, medicine, finance, and sport. This
shows the ability of DM techniques to provide alternative solutions for decision makers who are
tackling problems that arise in specific domains. The analysis of data in the educational field
using DM techniques is called Educational Data Mining (EDM). EDM is concerned with extracting
patterns to discover hidden information in educational databases and using it for decision
making in the educational system. To find hidden patterns in educational databases using DM
techniques, an appropriate tool is required (Ilic et al., 2016). The number of available tools
for the DM process continues to grow, and researchers have many options when selecting a
suitable tool for their research. The tool is chosen based on specific criteria, for example the
platform the tool is built on, the parameters used, and the DM technique used in the research.
DM tools can be divided into two categories: open-source (non-commercial) software and
commercial software (Arganda-Carreras et al., 2017).
Question 1
5 Key Words
Laws
Homicide
Control
England
Gun
Other 5 Key Words
Urban
Data
Media
Destruction
Security
Word frequency analysis consists of listing the words and phrases that appear most often within
a text. This can be useful for many purposes, from identifying recurring terms in a set of
product reviews to finding out which issues come up most often in customer support
interactions (Siddiqui and Abidi, 2018).
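As a rough illustration of the word frequency idea, the sketch below counts words in a short text and returns the most frequent ones. The sample sentence and the stop-word list are invented for this example and are not taken from the assignment dataset; the function name `top_keywords` is likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}

def top_keywords(text, n=5):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample sentence, not part of the original dataset.
sample = "Gun laws and gun control: the laws on gun ownership in England."
print(top_keywords(sample, 2))
```

Keywords picked by eye can then be compared against the top of this list.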
Question 2
Merge both datasets with a text editor, load the merged dataset in WEKA and shuffle it. Then,
using WEKA, extract and save one subsample as the new training set and another subsample as the
new validation set. The two datasets should now have the same attributes in the same order.
Conduct the experiments on the new datasets. There are two possible approaches: the first is to
consider each output separately against the full set of inputs, and the second is to use a
classifier that can handle multiple outputs, which any classifier can do with proper modelling.
Neural networks and KNN are two examples of classifiers that have this ability and are easy to
use (Kulkarni and Kulkarni, 2016).
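The merge, shuffle and subsample steps can be sketched outside WEKA as follows. This is only an illustration under stated assumptions: the instances, the 75/25 split and the fixed seed are invented here, and the report itself performs these steps in the WEKA interface.

```python
import random

def merge_shuffle_split(dataset_a, dataset_b, train_fraction=0.75, seed=42):
    """Merge two lists of instances, shuffle them, and split into train/validation."""
    merged = list(dataset_a) + list(dataset_b)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(merged)
    cut = int(len(merged) * train_fraction)
    return merged[:cut], merged[cut:]

# Invented instances: (document id, class label).
a = [("a1", "politics"), ("a2", "politics"), ("a3", "other"), ("a4", "other")]
b = [("b1", "politics"), ("b2", "other"), ("b3", "other"), ("b4", "politics")]
train, valid = merge_shuffle_split(a, b)
print(len(train), len(valid))
```

Because both subsamples come from the same merged list, they share the same attributes in the same order.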
Question 3
Pre-processing
Most of the time the data will not be perfect, and pre-processing is needed before machine
learning algorithms can be applied to it. Pre-processing is easy in WEKA. You can click the
"Open file" button and load a file in one of several formats: ARFF, CSV, C4.5, binary, LIBSVM
or XRFF. You can also load data from an SQL database through a URL and then apply filters to
it. Click the "Open file" button in the Preprocess panel and load the .arff file from the local
file system. If you could not convert your .csv file to .arff, do not worry, since WEKA will do
the conversion for you (Thailambal, Subramani and Saradha, 2018).
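For readers unfamiliar with the ARFF layout, the sketch below renders a tiny dataset as ARFF text: a `@relation` line, one `@attribute` line per column, and a `@data` section. The relation name, attribute names and the single row are invented for illustration, and `to_arff` is a hypothetical helper, not part of WEKA.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, type) pairs, where type is "numeric",
    "string", or a list of nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for value, (_, kind) in zip(row, attributes):
            if kind == "string":
                # Quote free text and escape embedded single quotes.
                value = "'" + str(value).replace("'", "\\'") + "'"
            cells.append(str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"

# Invented example data.
attributes = [("text", "string"), ("length", "numeric"), ("class", ["politics", "other"])]
rows = [("gun control laws", 3, "politics")]
arff = to_arff("documents", attributes, rows)
print(arff)
```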
If you followed all the steps so far, the data set loads successfully and the attribute names
are displayed (outlined in red in the images above). The pre-processing stage is called Filter
in WEKA; click the "Choose" button under Filter and apply any filter you need. For example, if
you want to use association rule mining as the training model, you need to discretize the
numeric and continuous attributes. To do that, follow the path:
Choose -> Filters -> Supervised -> Attribute -> Discretize
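To make the idea of discretization concrete, here is a minimal sketch of equal-width binning. Note that this is a simpler, unsupervised stand-in chosen for illustration; the supervised Discretize filter selected above takes the class labels into account and works differently. The `ages` values are invented.

```python
def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # all values identical: a single bin
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Invented numeric attribute.
ages = [18, 22, 35, 47, 60]
print(equal_width_bins(ages, 3))
```

After this step each numeric value is replaced by a nominal bin label, which is the form association rule learners expect.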
Impact the pre-processing has on the data
The idea of classification is to distribute data among the various classes defined on a data
set. Classification algorithms learn this distribution from a given training set and then try
to classify correctly the test data for which the class is not specified. The values that
specify these classes in the dataset are given a label name and are used to determine the
class of the data presented during testing (Siddiqui and Abidi, 2018).
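The train-then-test cycle can be shown with the simplest possible classifier, a majority-class baseline in the spirit of WEKA's ZeroR. This is a sketch for illustration only: the labels are invented, and the baseline is not one of the classifiers evaluated in this report.

```python
from collections import Counter

def train_majority(train_labels):
    """Learn the most frequent class label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Invented class labels for a training set and a test set.
train_labels = ["politics", "politics", "other"]
test_labels = ["politics", "other", "politics", "politics"]

model = train_majority(train_labels)
predictions = [model] * len(test_labels)
print(model, accuracy(predictions, test_labels))
```

Any real classifier should score above this baseline on the same test set.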
Question 4
Decision-tree (J48)
Database 1
Table
Graph
Database 2
Table
Graph
Naïve Bayes
Database 1
Table
Graph
Database 2
Table
Graph
Support Vector Machine
Database 1
Table
Graph
Database 2
Table
Graph
Question 5
Keywords
Political
Politics
Political action
Government
The keywords are not the same as the ones I picked manually in the text editor.
Conclusion
The WEKA project has come a long way in the 16 years that have passed since its inception in
1992. The success it has enjoyed is a testament to the energy of its community and its many
contributors. Releasing WEKA as open-source software and implementing it in Java has played no
small part in its success. These two factors ensure that it remains maintainable and modifiable
regardless of the commitment or health of any particular institution or company.
A decision tree is built by greedily selecting the best split point for making predictions and
repeating the process until the tree reaches a fixed depth.
Support Vector Machines were developed for binary classification problems, although extensions
to the technique support multi-class classification and regression problems. The algorithm is
often referred to as SVM for short. SVM was developed for numerical input variables, although
it will automatically convert nominal values to numerical values. Input data is also normalized
before being used. SVM works by finding a line that best separates the data into two groups.
This is done with an optimization procedure that considers only those instances in the training
dataset that are closest to the line that best separates the classes. These instances are
called support vectors, hence the name of the technique. In almost all problems of interest a
line cannot be drawn to separate the classes neatly, so a margin is added around the line to
relax the constraint, allowing some instances to be misclassified but giving a better result
overall. Finally, few datasets can be separated with just a straight line. Sometimes a curved
line or even polygonal regions need to be marked out. SVM achieves this by projecting the data
into a higher-dimensional space in order to draw the lines and make predictions. Different
kernels can be used to control the projection and the amount of flexibility in separating the
classes.
Naïve Bayes (NB) is a simple probabilistic classifier that calculates a set of probabilities by
counting the frequency and combinations of values in a given dataset. NB uses Bayes' theorem to
determine the probability of a new event given events that have already occurred.
The decision tree algorithm, a popular machine learning classification technique, is based on
J. R. Quinlan's C4.5. The decision tree can be built as a binary tree. The method recursively
separates observations into branches to construct a tree that improves prediction accuracy. All
data analysed will be of the categorical type.
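The greedy split selection described above for decision trees can be illustrated with information gain on categorical attributes. This is a simplified sketch: C4.5 itself uses the gain ratio, a normalized variant, and the attribute values and labels below are invented for the example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Reduction in entropy obtained by splitting the labels on a categorical attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Invented data: two candidate attributes for the same four documents.
labels = ["politics", "politics", "other", "other"]
mentions_government = ["yes", "yes", "no", "no"]   # separates the classes perfectly
is_long_document = ["yes", "no", "yes", "no"]      # carries no class information

print(information_gain(mentions_government, labels))
print(information_gain(is_long_document, labels))
```

A greedy tree builder would split on the attribute with the higher gain, then repeat the same computation within each branch.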
References
Arganda-Carreras, I., Kaynig, V., Rueden, C., Eliceiri, K.W., Schindelin, J., Cardona, A. and
Sebastian Seung, H., 2017. Trainable Weka Segmentation: a machine learning tool for
microscopy pixel classification. Bioinformatics, 33(15), pp.2424-2426.
Thailambal, G., Subramani, R. and Saradha, S., 2018. Drugs usage prediction in WEKA tool using
C4.5 classification algorithm. International Journal of Pure and Applied Mathematics, 119(15),
pp.3633-3642.
Siddiqui, M.S. and Abidi, A.I., 2018. Comparative study of different classification
techniques using the weka tool. Global Sci-Tech, 10(4), pp.200-208.
Kulkarni, E.G. and Kulkarni, R.B., 2016. Weka powerful tool in data mining. International
Journal of Computer Applications, 975, p.8887.
Ilic, M., Spalevic, P., Veinovic, M. and Alatresh, W.S., 2016. Students' success prediction
using the Weka tool. Infotech-Jahorina, 15, pp.684-688.