ITC 516 Data Mining: Performance Analysis of Algorithms Report

Added on 2024/05/31

AI Summary

This report provides a comprehensive analysis of data mining techniques, specifically focusing on the application and comparison of Decision Tree, Naive Bayes, and K-Nearest Neighbor (KNN) algorithms using the Weka software. The analysis is performed on a soybean dataset (ARFF format) to evaluate the performance of each algorithm based on metrics such as accuracy, precision, and confusion matrix analysis. The report details the steps involved in data loading, algorithm execution, and result interpretation within the Weka environment. Furthermore, it discusses the strengths and weaknesses of each algorithm, considering factors like computational cost, scalability, and sensitivity to data characteristics, ultimately concluding on the suitability of each algorithm for different data mining tasks. The document is contributed by a student and is available on Desklib, a platform offering study tools and resources for students.

ITC 516
DATA MINING AND VISUALISATION FOR BUSINESS INTELLIGENCE
ASSIGNMENT 3
Student Name: Gurwinder Singh
Student ID:

Table of Contents
INTRODUCTION
Task 1: DATA MINING TASK
DECISION TREE
NAÏVE BAYES
K-NEAREST NEIGHBOUR
Task 2
k-NN vs Naïve Bayes vs Decision Tree
Algorithm Performance
Conclusion
Reference
List of Tables
Table 1: Analysis of k-NN, Naive Bayes and Decision Tree
List of Figures
Figure 1: Weka Index Page
Figure 2: Next Page
Figure 3: soybean.arff file
Figure 4: Soybean.arff file loaded in Weka for analysis
Figure 5: Accuracy of the Class using Decision Tree
Figure 6: Confusion Matrix by decision Tree Analysis
Figure 7: Accuracy of the Class using Naive Bayes
Figure 8: Confusion Matrix by Naive Bayes Analysis
Figure 9: Accuracy of the Class using KNN Classifier
Figure 10: Confusion Matrix by KNN Classifier Analysis

INTRODUCTION
Data mining is the process of analysing large and complex data with analytical techniques in order to discover patterns, both known and previously unknown, and to gain insight into that data. Various tools support the data analysis and processing phases and help build a better analytics approach for understanding the data. A knowledge base is used alongside a dataset to support more accurate prediction tasks and a more effective analysis of the data (Jadhav & Channe, 2016).
The aim of this report is to analyse the business requirements for pattern identification. The report focuses on different data mining techniques and compares the output patterns they produce. The provided dataset has been critically analysed, and several patterns have been identified using the Weka software, which provides better insight into the dataset.
The dataset used in this report is stored in ARFF format. ARFF (Attribute-Relation File Format) is an ASCII text file format that describes a list of instances sharing a common set of attributes; it was developed for use with the Weka machine learning software. This report applies three classification algorithms to the dataset and uses their results to determine which one performs best.
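To illustrate the file format, the following is a minimal Python sketch of an ARFF reader. It assumes every attribute is nominal and that data rows are plain comma-separated values. The function name parse_arff and the toy relation are hypothetical; they are not taken from soybean.arff or from Weka, whose own loader handles many more cases (numeric, string and date attributes, quoted values, sparse rows).

```python
def parse_arff(text):
    """Return (relation, attributes, rows) from ARFF text with nominal attributes.

    attributes is a list of (name, [allowed values]) pairs; rows is a list of
    lists in which the ARFF missing-value marker '?' is mapped to None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append([None if v == "?" else v for v in values])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            # e.g. "@attribute stem {normal,abnormal}" -> name, "{normal,abnormal}"
            name, spec = line.split(None, 1)[1].split(None, 1)
            values = [v.strip() for v in spec.strip().strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True  # every following non-comment line is an instance
    return relation, attributes, rows


# Hypothetical toy data, NOT the real soybean.arff contents.
SAMPLE = """% hypothetical toy data
@relation toy-soybean
@attribute leaf-spots {absent,present}
@attribute stem {normal,abnormal}
@attribute class {healthy,diseased}
@data
absent,normal,healthy
present,abnormal,diseased
present,?,diseased
"""

relation, attrs, rows = parse_arff(SAMPLE)
print(relation)   # toy-soybean
print(rows[2])    # ['present', None, 'diseased']
```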

Task 1: DATA MINING TASK
The following steps were carried out for this analysis task:
1. Run Weka (Weka 3.8 was used for this analysis).
Figure 1: Weka Index Page
2. Click the Open file... button, then locate the soybean.arff file and open it.
Figure 2: Next Page

Figure 3: soybean.arff file
3. The data is loaded into Weka.
Figure 4: Soybean.arff file loaded in Weka for analysis

DECISION TREE
A decision tree is a machine learning classifier that analyses data by building a tree of decisions, either visually or explicitly, to support decision making. The tree is drawn upside down: the root node is at the top, internal nodes test attribute values, the edges (branches) correspond to the outcomes of those tests, and the leaf nodes hold the class labels. For the soybean dataset, the decision tree builds this tree-like structure as a decision model (Barros, Basgalupp, de Carvalho & Quiles, 2012).
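The report does not state which Weka tree learner was used (Weka's J48, for example, implements C4.5, which uses the gain ratio and pruning). As a simplified illustration only, the sketch below grows an ID3-style tree that, at each node, splits on the nominal attribute with the highest information gain. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute index attr."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """ID3-style tree (illustrative, not C4.5/J48): leaves are class labels,
    internal nodes are (attribute index, {value: subtree}, majority label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches, majority)


def classify(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the node's majority."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


# Hypothetical toy data: (leaf spots, stem) -> class
rows = [("present", "abnormal"), ("present", "normal"),
        ("absent", "normal"), ("absent", "abnormal")]
labels = ["diseased", "diseased", "healthy", "healthy"]
tree = build_tree(rows, labels, [0, 1])
print(classify(tree, ("absent", "normal")))   # healthy
print(classify(tree, ("unseen", "normal")))   # diseased (majority fallback)
```

Here the first attribute separates the classes perfectly (information gain 1 bit), so it becomes the root and the tree stops after one split.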
Figure 5: Accuracy of the Class using Decision Tree
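Figure 5 shows the detailed accuracy by class on the whole dataset for the decision tree. The precision reported for the whole dataset is 0.917, meaning that, on average, about 91.7% of the instances the decision tree assigns to a class actually belong to that class. The decision tree therefore classifies this dataset with high precision.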
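Precision for a class is the number of instances correctly assigned to it divided by the total number of instances assigned to it, which is its column total in the confusion matrix. The sketch below computes this for each class and a weighted average. It assumes rows are actual classes and columns are predicted classes, and that the average weights each class by its number of actual instances; the 3-class matrix is made-up data, not a result from this report.

```python
def per_class_precision(matrix):
    """Precision per class: diagonal cell / column total (all predictions of that class)."""
    n = len(matrix)
    precisions = []
    for j in range(n):
        predicted_j = sum(matrix[i][j] for i in range(n))  # column total
        # Returning 0.0 for a class that is never predicted is a choice made here.
        precisions.append(matrix[j][j] / predicted_j if predicted_j else 0.0)
    return precisions


def weighted_precision(matrix):
    """Average of per-class precision weighted by each class's actual instance count."""
    support = [sum(row) for row in matrix]  # row totals = actual instances per class
    total = sum(support)
    return sum(p * s for p, s in zip(per_class_precision(matrix), support)) / total


# Hypothetical confusion matrix (rows = actual, columns = predicted).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 1, 9]]
print([round(p, 3) for p in per_class_precision(cm)])  # [0.889, 0.75, 1.0]
print(round(weighted_precision(cm), 3))                # 0.88
```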
Figure 6: Confusion Matrix by decision Tree Analysis

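A confusion matrix summarises how a classifier's predictions compare with the actual classes: in Weka's output each row corresponds to an actual class and each column to a predicted class, so correct classifications lie on the diagonal and every off-diagonal entry is a misclassification. Figure 6 shows the confusion matrix of the decision tree on the Soybean dataset, which indicates how its predictions are distributed across the classes.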
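As a minimal sketch of how such a matrix is built, the hypothetical helper below counts (actual, predicted) pairs into a table with one row per actual class and one column per predicted class, then reads the overall accuracy off the diagonal. The labels a, b and c are placeholders rather than soybean classes.

```python
def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs: rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels, not results from this report.
classes = ["a", "b", "c"]
actual = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]
cm2 = confusion_matrix(actual, predicted, classes)
print(cm2)                        # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(accuracy(cm2), 3))    # 0.667
```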

NAÏVE BAYES
The Naïve Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. It assumes that the attributes are conditionally independent of each other given the class, which makes classification simple and efficient. Naïve Bayes is also highly scalable: the number of parameters it needs grows only linearly with the number of predictors (attributes) in the learning problem (Xiang, Yu & Kang, 2015).
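The sketch below shows how a Naïve Bayes classifier can be trained on nominal attributes simply by counting, and then used to pick the class with the highest posterior probability. It is an illustrative assumption rather than Weka's NaiveBayes implementation: it uses add-one (Laplace) smoothing, skips missing values, and works with log probabilities to avoid numerical underflow. The function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    attr_values = defaultdict(set)       # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            if v is None:                # skip missing values ('?')
                continue
            value_counts[(label, i)][v] += 1
            attr_values[i].add(v)
    return class_counts, value_counts, attr_values


def predict_naive_bayes(model, row):
    """Return the class maximising log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for i, v in enumerate(row):
            if v is None:
                continue
            counts = value_counts[(c, i)]
            k = len(attr_values[i])
            # Laplace-smoothed P(x_i = v | c); attributes treated as independent given c
            score += math.log((counts[v] + 1) / (sum(counts.values()) + k))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy data: (leaf spots, stem) -> class
rows = [("present", "abnormal"), ("present", "normal"),
        ("absent", "normal"), ("absent", "normal")]
labels = ["diseased", "diseased", "healthy", "healthy"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("present", "abnormal")))  # diseased
print(predict_naive_bayes(model, ("absent", None)))         # healthy
```

Each attribute only needs its own table of counts per class, which is why the model size grows linearly with the number of attributes.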
Figure 7: Accuracy of the Class using Naive Bayes
Figure 7 shows the detailed accuracy by class on the whole dataset for the Naïve Bayes classifier. The precision reported for the whole dataset is 0.938, the highest of the three algorithms, which indicates that Naïve Bayes classifies this dataset well using the different attributes it contains.

Figure 8: Confusion Matrix by Naive Bayes Analysis
The confusion matrix in Figure 8 shows how the Naïve Bayes predictions are distributed across the classes of the soybean data. Each row corresponds to an actual class and each column to a predicted class, so the matrix reveals which classes the classifier confuses with one another. The diagonal of the matrix holds the number of correctly classified instances for each class.
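Because the off-diagonal cells record the misclassifications, a quick way to see which classes are most often confused is to list the largest of those cells. The helper below is a hypothetical sketch run on a made-up 3-class matrix with rows as actual classes and columns as predicted classes.

```python
def most_confused(matrix, classes, top=3):
    """Return the largest off-diagonal cells as (count, actual, predicted) tuples."""
    pairs = [(matrix[i][j], classes[i], classes[j])
             for i in range(len(classes))
             for j in range(len(classes))
             if i != j and matrix[i][j] > 0]
    pairs.sort(key=lambda t: t[0], reverse=True)  # biggest misclassification counts first
    return pairs[:top]


# Hypothetical matrix (rows = actual, columns = predicted).
classes = ["a", "b", "c"]
m = [[5, 3, 0],
     [1, 6, 0],
     [0, 2, 7]]
print(most_confused(m, classes, top=2))  # [(3, 'a', 'b'), (2, 'c', 'b')]
```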

K-NEAREST NEIGHBOUR
K-Nearest Neighbour, also known as the k-NN algorithm, is a pattern recognition algorithm that detects patterns in data and can be used for both classification and regression. In classification, its output is a class membership: a new instance is assigned the class that is most common among its k closest training examples in the feature space (Yu, Zhang, Huang & Xiong, 2009).
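The following minimal sketch classifies an instance by majority vote among its k nearest training instances. The report does not say which k or distance measure its Weka k-NN run (presumably Weka's IBk) used; here, as an assumption suited to nominal attributes, distance is the number of attributes on which two instances differ, and a missing value counts as a mismatch. The names and data are hypothetical.

```python
from collections import Counter


def overlap_distance(x, y):
    """Count attributes whose values differ; None (missing) is treated as a mismatch."""
    return sum(1 for a, b in zip(x, y) if a is None or b is None or a != b)


def knn_predict(train_rows, train_labels, row, k=3):
    """Majority vote among the k training instances closest to row."""
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: overlap_distance(pair[0], row))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (leaf spots, stem, roots rotten) -> class
train_rows = [("present", "abnormal", "yes"), ("present", "abnormal", "no"),
              ("present", "normal", "yes"), ("absent", "normal", "no"),
              ("absent", "normal", "yes")]
train_labels = ["diseased", "diseased", "diseased", "healthy", "healthy"]
print(knn_predict(train_rows, train_labels, ("absent", "normal", "no")))      # healthy
print(knn_predict(train_rows, train_labels, ("present", "abnormal", "yes")))  # diseased
```

There is no training step: every prediction sorts the whole training set by distance, which is why k-NN becomes slow to classify when the training data is large.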
Figure 9: Accuracy of the Class using KNN Classifier
Figure 9 shows the detailed accuracy by class on the whole dataset for the K-Nearest Neighbour classifier. The precision reported for the whole dataset is 0.915, which is still high but slightly lower than that of Naïve Bayes (0.938). k-NN is also relatively time-consuming at classification time, because every new instance must be compared with all of the stored training instances.

Figure 10: Confusion Matrix by KNN Classifier Analysis
The confusion matrix evaluates the classifier's predictions rather than making them: its diagonal elements count the instances of each class that were classified correctly. Figure 10 shows the confusion matrix of the k-NN classifier over all classes in the dataset, which helps in judging how suitable the model is for predictive modelling on this data.
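Reading across a row instead of down a column gives recall: the diagonal cell divided by the number of actual instances of that class. A minimal sketch on a made-up matrix, again assuming rows are actual classes and columns are predicted classes:

```python
def per_class_recall(matrix):
    """Recall per class: diagonal cell / row total (all actual instances of that class)."""
    recalls = []
    for i, row in enumerate(matrix):
        actual_i = sum(row)
        # Returning 0.0 for a class with no actual instances is a choice made here.
        recalls.append(row[i] / actual_i if actual_i else 0.0)
    return recalls


# Hypothetical confusion matrix (rows = actual, columns = predicted).
cm3 = [[8, 2, 0],
       [1, 9, 0],
       [0, 1, 9]]
print(per_class_recall(cm3))  # [0.8, 0.9, 0.9]
```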
