Weka Assignment: Data Exploration and Visualization Techniques

Added on  2022/09/17

Practical Assignment
AI Summary
This assignment solution demonstrates data exploration and analysis using the Weka workbench. It begins by loading and analyzing the diabetes dataset, determining the number of instances, attributes, and classes, and uses histograms to visualize age group distributions. The solution then involves creating a new .arff file from the iris dataset, removing the 'petal_width' attribute, and exploring the modified dataset. Histograms are generated for each attribute of the iris dataset. Finally, the solution performs visual analysis using scatter plots to compare the distributions of sepal length versus sepal width and sepal length versus petal length, highlighting the class separation capabilities of the scatter plots.
Document Page
University
Semester
Statistics
Student ID
Student Name
Submission Date
Table of Contents
Task 1: Weka Data Exploration
Task 2: Working with a new data file in Weka
Task 3: Visual Analysis
References
Task 1: Weka Data Exploration
In this task, we use the Weka workbench to load the diabetes dataset and answer the
following questions:
How many instances and attributes does the diabetes dataset have?
How many classes are present in the diabetes dataset, and how many instances are there for each class?
Using a histogram, which age group has the highest number of samples?
First, we load the diabetes dataset into the Weka workbench, as shown below
(Veart, 2013).
The diabetes dataset has 768 instances and 9 attributes, as shown below.
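These counts can also be checked outside the Weka interface by reading the ARFF file directly. The following is a minimal sketch, assuming a simplified reader written only for illustration: `parse_arff` is a hypothetical helper, not part of Weka, and the three-row sample is illustrative rather than the full diabetes file.

```python
def parse_arff(text):
    """Parse a simple dense ARFF string into (attribute names, data rows).

    Hypothetical helper for illustration: it handles only unquoted
    attribute names and comma-separated data rows.
    """
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


# Illustrative three-attribute sample, not the full diabetes dataset.
SAMPLE = """@relation sample
@attribute plas numeric
@attribute age numeric
@attribute class {tested_negative,tested_positive}
@data
148,50,tested_positive
85,31,tested_negative
183,32,tested_positive
"""

attributes, rows = parse_arff(SAMPLE)
print("Instances:", len(rows))
print("Attributes:", len(attributes))
```

Note that the attribute count includes the class attribute, which is how the figure of 9 attributes arises for a dataset with 8 predictors.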
The class attribute of the diabetes dataset has two values, tested_negative with 500 instances
and tested_positive with 268 instances, as shown below.
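Per-class counts like those Weka lists for the selected class attribute can be reproduced by tallying the label column. This is a minimal sketch, assuming a short hypothetical list of labels in place of the real 768 rows:

```python
from collections import Counter

# Hypothetical labels for illustration only; the real file has 768 rows.
labels = [
    "tested_positive",
    "tested_negative",
    "tested_positive",
    "tested_negative",
    "tested_negative",
]

# Counter maps each distinct class label to its number of instances.
class_counts = Counter(labels)

print("Number of classes:", len(class_counts))
for label, count in sorted(class_counts.items()):
    print(f"{label}: {count}")
```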
The histogram below shows the distribution of the age attribute.
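To see how a histogram identifies the age group with the most samples, the binning can be sketched by hand. The example below assumes equal-width bins and a small hypothetical list of ages; `histogram` is an illustrative helper, and the bin count Weka chooses for the real data may differ.

```python
def histogram(values, num_bins):
    """Count values in equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    # Fall back to a width of 1.0 if all values are identical.
    width = (high - low) / num_bins or 1.0
    counts = [0] * num_bins
    for value in values:
        # The maximum value is placed in the last bin.
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical ages for illustration, not taken from the dataset.
ages = [21, 22, 22, 25, 29, 31, 33, 41, 50, 61]
edges, counts = histogram(ages, 4)

# Report the bin that holds the most samples.
best = counts.index(max(counts))
print(f"Largest age group: {edges[best]:.0f} to {edges[best + 1]:.0f} "
      f"({counts[best]} samples)")
```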
Task 2: Working with a new data file in Weka
In this task, we load the iris dataset into the Weka workbench, as shown below
(Cs.waikato.ac.nz, 2019).
Next, we remove the petal width attribute and save the result as iris.3D.arff, as shown
below.
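The effect of removing an attribute can be sketched as a plain text transformation of the ARFF file. This is a minimal sketch, not Weka's own filter: `remove_attribute` is a hypothetical helper that assumes a dense ARFF file with unquoted attribute names, and the three data rows are only a sample of the iris data.

```python
def remove_attribute(arff_text, name):
    """Return ARFF text with the named attribute and its data column dropped.

    Illustrative helper: assumes a dense ARFF file with unquoted names
    and no commas inside values.
    """
    out, names, in_data, drop = [], [], False, None
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if in_data and stripped and not stripped.startswith("%"):
            values = stripped.split(",")
            del values[drop]  # drop the matching data column
            out.append(",".join(values))
        elif lower.startswith("@attribute"):
            attr = stripped.split()[1]
            names.append(attr)
            if attr == name:
                drop = len(names) - 1
                continue  # omit this attribute declaration
            out.append(line)
        else:
            if lower.startswith("@data"):
                if drop is None:
                    raise ValueError(f"attribute not found: {name}")
                in_data = True
            out.append(line)
    return "\n".join(out) + "\n"


# Three-row sample using the attribute names found in Weka's iris.arff.
IRIS = """@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

iris_3d = remove_attribute(IRIS, "petalwidth")
print(iris_3d)
```

Writing `iris_3d` to a file named iris.3D.arff would complete the step described above.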
Next, we load the iris.3D dataset, as shown below.
Then, we display the histogram for each attribute in this iris dataset, as
illustrated below.
Task 3: Visual Analysis
In this task, we load the iris.3D dataset into the Weka workbench, as
illustrated below.
Next, we generate the scatter plots by clicking Visualize in the menu. This shows
the distribution of the data for each pair of attributes as a two-dimensional plot,
as illustrated below.
Then, we visually compare the scatter plots for sepal length versus sepal width and
sepal length versus petal length, which are shown below.
The scatter plot below shows sepal length against sepal width.
The scatter plot below shows sepal length against petal length.
Comparing the two scatter plots, the sepal length versus petal length plot clearly shows
better class separability, because the points of each class, colored blue for Iris
setosa, red for Iris versicolor and green for Iris virginica, are clearly separated in
that plot.
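The visual impression can be roughly checked with numbers. Both plots share sepal length on one axis, so the sketch below compares only the other axis: for each attribute it counts how many pairs of classes have overlapping value ranges. This is a crude proxy chosen for illustration, not something Weka computes, and it uses only a nine-row sample of the iris data; `overlapping_class_pairs` is a hypothetical helper.

```python
from itertools import combinations

# Nine-row sample: (sepal length, sepal width, petal length, class).
SAMPLE = [
    (5.1, 3.5, 1.4, "Iris-setosa"),
    (4.9, 3.0, 1.4, "Iris-setosa"),
    (4.7, 3.2, 1.3, "Iris-setosa"),
    (7.0, 3.2, 4.7, "Iris-versicolor"),
    (6.4, 3.2, 4.5, "Iris-versicolor"),
    (6.9, 3.1, 4.9, "Iris-versicolor"),
    (6.3, 3.3, 6.0, "Iris-virginica"),
    (5.8, 2.7, 5.1, "Iris-virginica"),
    (7.1, 3.0, 5.9, "Iris-virginica"),
]
COLUMNS = {"sepal width": 1, "petal length": 2}


def overlapping_class_pairs(rows, column):
    """Count class pairs whose [min, max] ranges on one column overlap."""
    ranges = {}
    for row in rows:
        value, label = row[column], row[-1]
        low, high = ranges.get(label, (value, value))
        ranges[label] = (min(low, value), max(high, value))
    return sum(
        1
        for (_, (lo_a, hi_a)), (_, (lo_b, hi_b)) in combinations(ranges.items(), 2)
        if lo_a <= hi_b and lo_b <= hi_a
    )


# Fewer overlapping pairs suggests better class separation on that axis.
overlaps = {name: overlapping_class_pairs(SAMPLE, col)
            for name, col in COLUMNS.items()}
for name, count in overlaps.items():
    print(f"{name}: {count} overlapping class pairs")
```

On this sample, fewer class ranges overlap on petal length than on sepal width, which is consistent with the conclusion drawn from the plots. A result on nine rows should not be read as a statement about the full dataset.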
References
Cs.waikato.ac.nz. (2019). Weka 3 - Data Mining with Open Source Machine Learning Software
in Java. [online] Available at: https://www.cs.waikato.ac.nz/ml/weka/ [Accessed 21 Aug.
2019].
Veart, D. (2013). First, Catch Your Weka. New York: Auckland University Press.