Data Mining Process


Added on  2021/06/14

Aim
The main goal of the data mining process is to extract information from a dataset and transform
it into an understandable structure. In general, data mining involves database management,
data pre-processing, model and inference considerations, and interestingness metrics.
Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process.
Introduction:
Data mining
Data mining is the process of discovering patterns in large datasets in order to extract usable
information from raw data. It is an efficient way to analyze and categorize hidden patterns in
data from the perspective of different applications. Data mining also involves supporting steps
such as data cleaning, data integration, data transformation, pattern evaluation and data
presentation. Once these steps are complete, the extracted information is used in applications
such as fraud detection and data analysis.
The Scope of Data Mining
Data mining can process a huge dataset in a short time.
It presents the data from different perspectives, in a logical order.
It uses tree-shaped structures to expose the hierarchy of the data.
It is used to derive generic ways of classifying various sets of data items.
Data mining functionalities fall into two primary categories: descriptive and predictive.
Descriptive
Descriptive mining covers methods such as clustering, which identifies groups of items that
share similar characteristics.
Predictive
Predictive mining covers methods such as classification, which predicts class attributes from
models and rules built on the data.

1. Data mining tools
Designing and developing applications of data mining algorithms requires powerful tools.
Different data mining tools are available for different software and hardware platforms, and
they can pull raw data from a variety of digital and physical sources. Commonly used tools
include:
R
RapidMiner (formerly YALE)
WEKA
Python-based Orange and NLTK
KNIME
Sisense
DataMart
Oracle Data Mining
Apache Mahout
SSDT (SQL Server Data Tools)
Rattle
IBM Cognos
Teradata
Dundas BI
2. Data mining Techniques
Many techniques are used to mine data from different platforms and applications. Choosing the
right technique is an important factor in any project designed to explore data, and the choice
depends on the type of design and development involved.
Most commonly used techniques in data mining:
Statistics
Classification
Association
Outlier detection
Clustering
Regression
Prediction
Cluster analysis
Anomaly detection
Intrusion detection
Decision trees
Neural networks
3. Benefits of Data mining
The main benefit of data mining is that it discovers relevant records and summarizes them in a
simpler format that others can use. Data mining plays a vital role in collecting, processing,
storing and analyzing data in order to extract information from raw data on various platforms.
It is used to build accurate models from databases, it helps to identify patterns in the data,
and it improves the efficiency of the decision-making process.
4. Cutting edge data mining techniques
Several major data mining techniques have been developed and are used in data mining projects,
including association, decision trees, classification, sequential patterns, prediction and
clustering. "Cutting edge" refers to the most advanced techniques and technological devices
currently available. It is also known as leading-edge or state-of-the-art technology.
Cutting-edge technology sits at the point where there is still a gap in knowledge.
Bleeding edge technology
Bleeding-edge technology is so new that it carries a high risk of being unreliable. Electronic
mail (email) is given as an example. The risk comes from the following factors.
Lack of consensus
The field changes rapidly by its very nature, and competing ways of doing a new thing exist
side by side.
Lack of testing
The technology is unreliable or simply untested.
Bleeding-edge technology can nevertheless be successful, and it is used to establish a
comparative advantage. Much bleeding-edge computer software is open source.
State-of-the-art technology is closely related and is sometimes also called cutting edge. It
denotes the highest level of general development achieved in a scientific field at a particular
time.
5. Real time examples for cutting edge technology
NFC technology
Near field communication (NFC) is the technology used in the Google billboard. It lets
customers engage with digital billboards and is used to encourage them to interact.
Geo fencing
Geo-fencing is a growing area for marketers. It gives them a way to engage with consumers by
delivering real-time content in a specific location.
Facebook hangers
This is a social networking application. It is meant to prevent hacking while messages are
being transmitted.
6. Applications of Data mining
Data mining is mostly used in the following domains:
1. Risk management and corporate analysis.
2. Fraud detection.
3. Market analysis and management.

Risk management and corporate analysis
Data mining is used in the following areas of the corporate sector:
Competition
Asset evaluation and finance planning
Resource planning
Fraud detection
In telephone fraud, data mining helps to analyze the duration of a call, its destination and
the time of day in order to identify fraudulent calls.
Market Analysis and Management
Data mining is used in the following fields of market:
Target marketing
Customer profiling
Providing summary information
Cross market analysis
Determining customer purchasing pattern
Identifying customer requirements
Weka Analysis:
Weka plays an important role in data mining. Data mining is the technique used to extract
information from large datasets, and it is used in many real-world applications such as fraud
detection, production control, market analysis and customer retention. Weka was developed at
the University of Waikato in New Zealand. It implements multiple data mining algorithms that
can be applied directly to a dataset, so it can perform many data mining tasks with one
collection of machine learning algorithms.
Weka supports the following tasks:
Classification
Association rules
Preprocessing
Clustering
Regression
Weka also includes visualization tools. It is open source software released under the GNU
General Public License (GPL).
Decision Tree Algorithm:
The decision tree is one of the most widely used techniques in data mining and provides a
feasible result for a given dataset. The tree has a root node at the top. The nodes below the
root are child nodes, connected to it by branches. Every internal node tests an attribute,
every branch represents an outcome of that test, and every leaf node holds a class label.
Algorithm:
Input:
Data partition D, a set of training tuples and their associated class labels.
attribute_list, the set of candidate attributes.
attribute_selection_method, a procedure that determines the splitting criterion used to
partition the tuples in D.
Output: a decision tree.
Method:
Create a node N;
If the tuples in D are all of the same class C then return N as a leaf node labeled with class C;
If attribute_list is empty then return N as a leaf node labeled with the majority class in D;
Apply attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
Label node N with splitting_criterion;
If the splitting attribute is discrete-valued then remove it from attribute_list;
For each outcome j of splitting_criterion
    Let Dj be the set of tuples in D that satisfy outcome j;
    If Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    Else attach the node returned by generate_decision_tree(Dj, attribute_list) to node N;
End for;
Return N;
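The method above can be sketched in Python. This is a minimal illustration, not Weka's J48: it assumes discrete attributes, uses information gain as the attribute selection method (the steps above leave the criterion open), and removes an attribute from the list once it has been used for a split. The function names and the small weather-style dataset are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Attribute selection method: highest information gain (an assumed criterion)."""
    base = entropy(labels)
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def generate_decision_tree(rows, labels, attributes):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:            # all tuples in D belong to one class
        return labels[0]
    if not attributes:                   # attribute list is empty: majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    node = {attr: {}}
    # Only values observed in D are expanded, so no partition Dj is ever empty here.
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        node[attr][value] = generate_decision_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node

def classify(tree, row, default=None):
    """Follow the branches that match the row until a leaf is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(row[attr], default)
    return tree

# Hypothetical toy data, not the soybean dataset.
rows = [
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "stay", "stay"]
tree = generate_decision_tree(rows, labels, ["outlook", "windy"])
prediction = classify(tree, {"outlook": "sunny", "windy": "yes"})
```

On this toy data the root splits on outlook, and only the sunny branch needs a second test on windy.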
K nearest Neighbor
k-nearest neighbors (k-NN) is a non-parametric algorithm used for classification and
regression. Its output depends on which of the two it is used for:
In k-NN classification, the object is assigned to the class chosen by a majority vote of its
neighbors, and the output is a class membership.
In k-NN regression, the output is the property value of the object, computed as the average
of the values of its k nearest neighbors.
Algorithm:
A case is classified by a majority vote of its k nearest neighbors and assigned to the class
with the most votes. If k = 1, the case is simply assigned to the class of its single nearest
neighbor.
Cases with similar values therefore end up in the same class as their nearest neighbors.
The distance between the case and every training instance is computed so that the neighbors
can be ranked from nearest to farthest before voting.
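The voting and averaging described above can be sketched as follows. This is a minimal illustration that assumes numeric features and Euclidean distance (the text does not name a distance function). The function names and data points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to the query point."""
    return sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Majority vote among the k nearest neighbours."""
    votes = Counter(target for _, target in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Average target value of the k nearest neighbours."""
    neighbours = k_nearest(train, query, k)
    return sum(target for _, target in neighbours) / len(neighbours)

# Hypothetical training data: (features, class label)
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
# Hypothetical training data: (features, numeric target)
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_class = knn_classify(labelled, (1.1, 1.0), k=3)
nearest_class = knn_classify(labelled, (5.1, 5.0), k=1)   # k = 1: class of the nearest case
predicted_value = knn_regress(numeric, (2.0,), k=3)
```

Sorting the whole training set is the simplest way to rank neighbours. It costs one distance computation per training instance for every query, which is acceptable for small datasets.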
Naive Bayes classifier:
Naive Bayes is a classifier widely used in machine learning that produces its result by
applying Bayes' theorem. It is a probabilistic classifier and is highly scalable. For each
class it computes the probability that the given data belongs to that class, and it predicts
the most probable class. The classifier is called "naive" because it assumes that the
attributes are conditionally independent given the class.
Bayes' theorem gives the posterior probability of a class c given the observed attributes x:
P(c | x) = P(x | c) P(c) / P(x)
Algorithm:
Convert the dataset into a frequency table.
Create a likelihood table by computing the probabilities from the frequencies.
Finally, use the naive Bayes equation to compute the posterior probability of each class for
every case. The class with the highest probability is the outcome of the prediction.
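The three steps above can be sketched in Python. This is a minimal illustration that assumes discrete attributes. It also adds add-one (Laplace) smoothing so that an unseen attribute value does not force a probability of zero. The smoothing is not part of the steps above, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Step 1: build frequency tables of classes and of attribute values per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute, class) -> Counter of values
    domains = defaultdict(set)            # attribute -> set of observed values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def posteriors(model, row):
    """Steps 2 and 3: turn frequencies into likelihoods and apply Bayes' theorem."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                                  # prior P(c)
        for attr, value in row.items():
            # Likelihood P(value | c) with add-one smoothing (an added assumption).
            seen = value_counts[(attr, label)][value]
            score *= (seen + 1) / (count + len(domains[attr]))
        scores[label] = score
    norm = sum(scores.values())                                # plays the role of P(x)
    return {label: s / norm for label, s in scores.items()}

def predict(model, row):
    """Return the class with the highest posterior probability."""
    probs = posteriors(model, row)
    return max(probs, key=probs.get)

# Hypothetical toy data, not the soybean dataset.
rows = [
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "stay", "stay"]
model = train_naive_bayes(rows, labels)
probs = posteriors(model, {"outlook": "overcast", "windy": "no"})
predicted = predict(model, {"outlook": "overcast", "windy": "no"})
```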

Step 2:
Performance of the Three Algorithms:
Decision tree
K-nearest neighbour algorithm
Naive Bayes algorithm
Decision Tree Algorithm Performance:
The decision tree is an important algorithm in data mining and a convenient baseline for
comparison with other algorithms. It is a supervised learning algorithm that is easy to
understand and to use. Its main steps are:
Training on the data.
Building the predictive model.
Representing the model as a tree structure.
The decision tree is a classification algorithm. Its aim is to classify the data with the
smallest possible tree structure. This project uses the soybean.arff dataset. The steps for
analyzing the dataset in the WEKA tool are given below:
Choose WEKA Tool:
The WEKA tool is widely used to analyze data with different types of classification algorithm.
It lets the user select the type of algorithm to apply and then predicts the class of the
data.
Select Explorer in WEKA Tool:
After selecting the Explorer, load the dataset from the resources supplied with the WEKA tool.
The dataset contains a number of instances and class labels.
How to Choose Dataset from WEKA tool:
The WEKA tool ships with a number of sample datasets. The figure above uses the soybean.arff
dataset. Import the dataset in order to analyze it with a classification algorithm.
Visualizing the soybean dataset in the WEKA tool:

The figure above gives an overview of the soybean.arff dataset, showing the number of
instances, attributes and class labels. The WEKA tool offers a large number of classification
algorithms for analyzing such data.
Decision tree algorithm performance (Soybean.arff) data sets:
The figure above shows the decision tree algorithm being applied. The decision tree serves as
the reference point for the other algorithms. J48 is Weka's decision tree implementation.
Cross-validation Summary of soybean.arff dataset using Decision Tree:
The figure above summarizes how the instances were classified and describes the attributes in
detail. The J48 algorithm splits the data in a straightforward way and is easy to understand.
In the WEKA tool, the cross-validation test option repeatedly partitions the dataset into
training and test portions in order to evaluate the classifier.
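The idea behind cross-validation can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses 10 folds and plain (non-stratified) shuffling, and Weka's own procedure may differ, for example by stratifying the folds. The function names and the majority-class "classifier" used for the demonstration are hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n_instances, n_folds=10, seed=1):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

def cross_validate(rows, labels, train_fn, predict_fn, n_folds=10):
    """Train on n_folds - 1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(rows), n_folds):
        model = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)

# Hypothetical baseline: always predict the majority class of the training folds.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

rows = [{"id": i} for i in range(20)]
labels = ["a"] * 15 + ["b"] * 5
accuracy = cross_validate(rows, labels, train_majority, predict_majority)
```

Every instance is used for testing exactly once, so the reported accuracy is computed over the whole dataset without any instance being tested by a model that saw it during training.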
Confusion Matrix of Soybean.arff dataset using Decision Tree:
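A confusion matrix of this kind counts, for every actual class, how many instances were assigned to each predicted class. The following is a minimal sketch with rows as actual classes and columns as predicted classes. The function names and the label lists are hypothetical and are not results from the soybean dataset.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (classes, matrix) where matrix[i][j] counts actual i predicted as j."""
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return classes, [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for illustration only.
actual = ["a", "a", "a", "b", "b", "c"]
predicted = ["a", "a", "b", "b", "b", "a"]
classes, matrix = confusion_matrix(actual, predicted)
overall = accuracy(matrix)
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.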
Visualization Curve of Soybean dataset using Decision tree:
Decision tree Structure of Soybean.arff dataset:

Naive Bayes Algorithm Performance of Soybean dataset:
Naive Bayes Algorithm:
Naive Bayes is a classification algorithm based on Bayes' theorem. It is a probabilistic
classifier used to build models and predict class labels.
Confusion Matrix of Soybean dataset using Naive Bayes Algorithm:
Visualization Curve of Soybean dataset using Naive Bayes Algorithm:
K Nearest Neighbor Algorithm:
Choosing the KNN Algorithm:
Confusion Matrix of Soybean datasets using KNN Algorithm:

The KNN algorithm uses past data with known, correct output values to predict the output for
unknown data.
Visualization Curve of Soybean dataset using KNN Algorithm:
Conclusion:
For the applied dataset, among naive Bayes, k-nearest neighbor and the decision tree, the
decision tree gave the expected and best result. In the result analysis, the best case was
identified using the tree structure and the visualization curve. The decision tree is also
easy to understand and simple to use.