SIT717 Enterprise Business Intelligence: Weka Data Analysis Report

Added on 2022/11/14

AI Summary

This report details a project focused on analyzing a dataset using machine learning and data mining techniques within the Weka application. The project follows a structured approach, starting with dataset download and import, followed by data pre-processing and feature reduction. The report then explores various data mining methods including K-means clustering, classification algorithms, and the Weka Experimenter and Knowledge Flow functionalities for evaluation and analysis. The practical aspects involve visualizing datasets, implementing classifiers, and comparing different data mining techniques. The report also covers the understanding of the dataset, data mining techniques, and experimental evaluations, including the application of classifiers and clustering algorithms. The student implements different practicals to understand the implementation of various data mining techniques to analyze the provided dataset. The report concludes with an analysis of the results and a discussion of the findings. The report is a comprehensive technical analysis of the application of data mining and machine learning in a business intelligence context.

Enterprise Business Intelligence

Abstract
The main aim of this project is to implement machine learning and data mining methods to
analyze a dataset in a business intelligence context using Weka. The central idea of the
assignment is to learn data mining analysis methods by working with the Weka data mining
application. The work in Weka is organized as ten practicals, each of which is carried out
and explained. The first step downloads the dataset from the website; the second imports
the dataset into Weka and performs data pre-processing on it. The third step visualizes the
dataset and analyzes dimensionality reduction, and the fourth step evaluates K-means
clustering. The fifth step implements a clustering algorithm for the data analysis. The
sixth step uses the Weka Experimenter and Knowledge Flow for evaluation. The seventh step
uses the Weka package manager to work with time series. The eighth step applies
content-based filtering. The practicals are described in detail below.
Table of Contents

Introduction
Understanding dataset
Data mining techniques
Experimental evaluation
Practical-1
Practical-2
Practical-3
Visualizing dataset
Visualising Dataset using Classifiers
Practical-4
Manual working with k-Means
Unsupervised Learning in WEKA – clustering
Practical-5
Practical-6
Weka Experimenter
Weka Knowledge Flow
Practical-7
Practical-8
Classifier training model
Test prediction classes
Practical-9
Conclusion
Reference

Introduction
In this task, machine learning and data mining techniques are applied to a considerable
amount of data, and the dataset is analyzed using the Weka application. The analysis follows
ten stages, each covering a different data mining technique. The first stage downloads the
dataset. The second imports the data into Weka and performs data pre-processing. The third
identifies dimensionality reduction for the data. The fourth implements k-means clustering.
The fifth runs a supervised classification algorithm in Weka. The sixth uses the Weka
Experimenter and Knowledge Flow for evaluation and analysis. The seventh uses the package
manager for time series forecasting. The eighth investigates content filtering techniques.
Understanding dataset
Each file is related to one Twitter account of a news agency. For example, bbchealth.txt is
related to BBC health news (Patel, 2017). Each line contains tweet id | date and time |
tweet. The separator is '|'. This text data has been used to evaluate the performance of
topic models on short text data. However, it can be used for other tasks, for instance,
clustering (Olson, 2016).
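The line layout described above can be illustrated with a minimal Python sketch. The sample line and the names Tweet and parse_tweet_line are hypothetical and are not taken from the dataset or the report; the sketch only assumes the "tweet id | date and time | tweet" layout with '|' as the separator.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    timestamp: str
    text: str


def parse_tweet_line(line: str) -> Tweet:
    """Split one 'tweet id|date and time|tweet' line into its three fields."""
    # Split on the first two separators only, so a '|' inside the tweet
    # text stays part of the text field.
    tweet_id, timestamp, text = line.rstrip("\n").split("|", 2)
    return Tweet(tweet_id.strip(), timestamp.strip(), text.strip())


# Hypothetical sample line, made up for illustration.
sample = "1|Thu Apr 09 01:31:50 +0000 2015|Example health headline http://example.com"
tweet = parse_tweet_line(sample)
print(tweet.tweet_id, tweet.timestamp, tweet.text, sep="\n")
```

Limiting the split to two separators is a design choice: tweet text is free-form, so it may itself contain the separator character.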
Data mining techniques
Various methods are used in data mining, although not every method can be applied to every
type of data. For example, neural network algorithms can be used with quantitative
(numerical) data, although they may not handle qualitative (categorical) data correctly;
for this reason, categorical data are generally split into several dichotomous variables
(Arabnia et al., 2013), each with values of 1 ("yes") or 0 ("no"). Some conventional
statistical techniques can also be used for data mining:
• Cluster analysis (segmentation)
• Discriminant analysis
• Logistic regression
• Time series forecasting
Cluster analysis (or segmentation) is among the most frequently used data mining techniques
(Deshpande and Kumar, 2017); it involves separating sets of data into groups that share a
series of consistent patterns (Eliot, 2018). Discriminant analysis is one of the oldest
classification techniques. It finds hyperplanes that separate the classes, so that users
can then apply them to determine on which side of the hyperplane a record falls.
Discriminant analysis has limitations, however (Eliot, 2017). Logistic regression is a
generalization of linear regression. It is used for predicting binary variables and, less
frequently, multi-class variables. Logistic regression models predict the logarithm of the
odds of the occurrence of a discrete outcome. The key assumption of the logistic regression
model is that the logarithm of the odds is linear in the coefficients of the predictor
variables.
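The log-odds relationship described above can be illustrated with a short Python sketch. The coefficient and feature values are made-up numbers for illustration only; they are not fitted to the tweet dataset, and the function names are hypothetical.

```python
import math


def log_odds(intercept, coefficients, features):
    """Linear predictor: the log-odds are a weighted sum of the features."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def probability(z):
    """Logistic (sigmoid) function: maps log-odds z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two predictor variables.
z = log_odds(intercept=-1.0, coefficients=[0.8, -0.5], features=[2.0, 1.0])
p = probability(z)

# Taking log(p / (1 - p)) recovers z, which is the linearity assumption
# stated in the text.
print(z, p, math.log(p / (1.0 - p)))
```

The model is linear only on the log-odds scale; the predicted probability itself is a nonlinear function of the predictors.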
Data visualization
Data visualization is also useful for data mining. By using visual tools, analysts can
achieve a better understanding of the data, since they can focus on some of the patterns
found by other methods (Rochester, 2014). Using variations of color, dimension, and depth,
it is possible to see new associations and improve the distinction between them.

Experimental evaluation
Practical-1
In this section, the dataset is downloaded and opened in the Weka software (Trivedi, 2014).
If the user can open the data in Weka, the dataset has been loaded successfully into the
Weka GUI.
Go to the dataset download link, http://mlr.cs.umass.edu/ml/
Search for the tweets data (Gollapudi, 2013), go to the new web page, and click on "UCI
Machine Learning Repository: Health News in Twitter Data Set". After that, click on the data
folder and download health-news-Tweets.zip.
The downloaded Health_News_Tweets.zip file is shown below.

Practical-2
Data analysis with data mining and machine learning techniques begins with pre-processing
the data in Weka (Gollapudi and Laxmikanth, n.d.). Open the Weka Explorer and import the
dataset "bbchealth.txt".
Click on Open file and load the dataset.

The list of attributes is shown below (Han, Kamber and Pei, 2012).
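
If the '|'-separated text file does not load cleanly in the Explorer, one option is to convert it to Weka's ARFF format first. The Python sketch below is one possible conversion under stated assumptions, not the procedure used in this report: the relation name and the attribute names tweet_id, created_at and text are hypothetical, and all three fields are declared as plain string attributes.

```python
def quote_arff(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def to_arff(lines, relation="health_tweets"):
    """Build ARFF text from 'tweet id|date and time|tweet' lines."""
    out = [
        f"@relation {relation}",
        "",
        "@attribute tweet_id string",    # hypothetical attribute names
        "@attribute created_at string",
        "@attribute text string",
        "",
        "@data",
    ]
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        out.append(",".join(quote_arff(p.strip()) for p in parts))
    return "\n".join(out) + "\n"


# Hypothetical sample line, made up for illustration.
print(to_arff(["1|Thu Apr 09 01:31:50 +0000 2015|Example health headline"]))
```

The returned text can be written to a file with an .arff extension and opened through Open file in the Explorer. String attributes usually need a further filter step before most Weka learners can use them.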

Practical-3
Visualizing dataset
This section covers dimensionality reduction and visualization of the data in Weka. In the
Explorer, open the data visualization; the resulting view is shown below.

The graph below plots Response ID against receipt name (Hastie, Friedman and Tibshirani,
2017).