(Solved) Data Mining Process - PDF


Added on 2021/06/14


Aim
The main goal of the data mining process is to extract information from a dataset and transform
it into an understandable structure. Data mining techniques generally involve database
management, data pre-processing, model and inference considerations, and interestingness
metrics. Data mining is the analysis step of the “Knowledge Discovery in Databases” (KDD) process.
Introduction:
Data mining
Data mining is the process of discovering patterns in large datasets in order to extract usable
information from raw data. It is an efficient way to analyze and categorize hidden patterns in
data from the perspective of different applications. Data mining involves several supporting
steps: data cleaning, data integration, data transformation, pattern evaluation, and data
presentation. Once these steps are complete, the extracted information can be used for fraud
detection, data analysis, and similar tasks.
The Scope of Data Mining
 It can analyze a huge dataset within a short time.
 It represents the data from different perspectives and in a logical order.
 It uses tree-shaped structures to make the hierarchy of the data understandable.
 It is used to derive a generic way of classifying various sets of data items.
Data mining has a number of functionalities, which fall into two primary categories: descriptive
and predictive.
Descriptive
Descriptive tasks, such as clustering, identify groups of items that share similar
characteristics.
Predictive
Predictive tasks, such as classification, predict class attributes using models and rules built
from the data.


1. Data mining tools
Designing and developing applications of data mining algorithms requires powerful tools.
Different data mining tools are available for different software and hardware platforms. Raw
data can be gathered with these tools from many sources in both the digital and the physical
world. Commonly used tools include:
 R language
 RapidMiner (formerly YALE)
 WEKA
 Python-based Orange and NLTK
 KNIME
 Sisense
 DataMart
 Oracle Data Mining
 Apache Mahout
 SSDT (SQL Server Data Tools)
 Rattle
 IBM Cognos
 Teradata
 Dundas BI
2. Data mining Techniques
Many techniques are used to mine data across different platforms and applications. Data mining
techniques are an important factor in projects designed to explore data, and the technique has
to be chosen based on the type of design and development involved.
Most commonly used techniques in Data mining:
 Statistics
 Classification
 Association
 Outlier detection
 Clustering
 Regression
 Prediction
 Cluster analysis
 Anomaly detection
 Intrusion detection
 Decision trees
 Neural networks
3. Benefits of Data mining
The main benefit of the data mining process is that it discovers relevant records of information
and summarizes them in a simpler format for others to use. Data mining plays a vital role in
collecting, processing, storing and analyzing data in order to extract information from raw data
on various platforms. It is used to create accurate models for databases, helps to identify
patterns in data, and improves the efficiency of the decision making process.
4. Cutting edge data mining techniques
Cutting edge techniques are among the most popular in data mining. Several major data mining
techniques have been developed and are used in data mining projects, including association,
decision trees, classification, sequential patterns, prediction and clustering. The term
“cutting edge” refers to the most current technological devices and techniques. It is also
known as leading edge technology or state of the art technology.
The cutting edge is the point at which there is still a gap in knowledge.
Bleeding edge technology
Bleeding edge technology is so new that it carries a high risk of being unreliable. Electronic
mail (email) is given as an example. Such technology carries a degree of risk for the following
reasons.
Lack of consensus
The technology changes rapidly. This is in its very nature, because new ways of doing things
keep appearing.
Lack of testing
The technology is unreliable or simply untested.
When it succeeds, bleeding edge technology can be used to establish a comparative advantage.
Bleeding edge computer software is often open source software.
State of the art technology is another term related to cutting edge, and the two are sometimes
used interchangeably. It denotes the highest level of general development that a scientific
field has achieved at a particular time.
5. Real time examples for cutting edge technology
NFC technology
Near field communication (NFC) is the technology used in the Google billboard. It is built into
digital billboards in order to engage customers.
Geo fencing
Geo fencing gives marketers a bargaining area in which to interact with consumers. It delivers
real time content in a specific location.
Facebook hangers
This example is tied to a social networking application. It is described as avoiding hacking
while messages are being transferred.
6. Applications of Data mining
Data mining is mostly used in the following domains:
1. Risk management and corporate analysis.
2. Fraud detection.
3. Market Analysis and Management.


Risk management and corporate analysis
Data mining is used in the following areas of the corporate sector:
 Competition
 Asset Evaluation and Finance planning
 Resource planning
Fraud detection
Data mining helps to detect fraudulent telephone calls by analyzing the duration of the call,
the destination of the call and the time of day.
Market Analysis and Management
Data mining is used in the following areas of marketing:
 Target marketing
 Customer profiling
 Providing summary information
 Cross market analysis
 Determining customer purchasing pattern
 Identifying customer requirements
Weka Analysis:
Weka plays an important role in data mining. Data mining is the technique used to extract
information from large datasets, and it is applied in many real time applications such as fraud
detection, production control, market analysis and customer retention. Weka was developed at
the University of Waikato in New Zealand. It provides a collection of machine learning
algorithms for performing multiple data mining tasks, and these algorithms can be applied
directly to a dataset.
Weka supports the following data mining tasks:
 Classification
 Association rules
 Preprocessing
 Clustering
 Regression
Visualization tools are also included in Weka. It is open source software, released under the
GNU General Public License (GPL).
Decision Tree Algorithm:
The decision tree is one of the best-known techniques in data mining and gives workable results
for a given dataset. The tree has a root node at the top; the nodes below it are child nodes,
connected by branches. Every internal node tests an attribute, every branch represents an
outcome of that test, and every leaf node holds a class label.
Algorithm:
 Data partition D, which is a set of training tuples and their associated class labels.
 attribute_list, the set of candidate attributes.
 attribute_selection_method, a procedure that determines the splitting criterion used to
partition D into subsets.
 The final output is a decision tree.
Method:
Create a node N;
 If the tuples in D are all of the same class C, then return N as a leaf node labeled with class C;
 If attribute_list is empty, then return N as a leaf node labeled with the majority class in D;
 Apply attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
 Label node N with splitting_criterion;
 For each outcome j of splitting_criterion
 Let Dj be the set of tuples in D that satisfy outcome j;
 If Dj is empty then
attach a leaf labeled with the majority class in D to node N;
 Else attach the node returned by generate_decision_tree(Dj, attribute_list) to node N;
 End for;
 Return N;
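As a rough illustration, the method above can be sketched in Python. This is a minimal sketch, not WEKA's J48. It assumes nominal attributes, uses information gain as the attribute selection method (an assumption, since the text does not fix one), and removes the chosen attribute before recursing. The names select_attribute, classify and TOY_ROWS, and the toy rows themselves, are hypothetical. Because branches are only created for values that actually occur in the partition, the empty Dj case is covered by a stored majority-class default that is used for unseen values at prediction time.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the class labels; each row is (features_dict, label)."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def partition(rows, attribute):
    """Group rows by the value they take on `attribute`."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups


def select_attribute(rows, attributes):
    """Assumed splitting criterion: pick the attribute with the highest information gain."""
    def gain(attribute):
        groups = partition(rows, attribute)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(rows) - remainder
    return max(attributes, key=gain)


def majority_class(rows):
    return Counter(label for _, label in rows).most_common(1)[0][0]


def generate_decision_tree(rows, attributes):
    labels = {label for _, label in rows}
    if len(labels) == 1:          # all tuples are in the same class: leaf
        return labels.pop()
    if not attributes:            # attribute list is empty: majority-class leaf
        return majority_class(rows)
    best = select_attribute(rows, attributes)
    remaining = [a for a in attributes if a != best]
    children = {
        value: generate_decision_tree(subset, remaining)
        for value, subset in partition(rows, best).items()
    }
    # "default" plays the role of the majority-class leaf for outcomes with no tuples.
    return {"attribute": best, "children": children, "default": majority_class(rows)}


def classify(tree, features):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(features[tree["attribute"]], tree["default"])
    return tree


# Hypothetical toy data, not taken from the Soybean dataset.
TOY_ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
]
tree = generate_decision_tree(TOY_ROWS, ["outlook", "windy"])
```

Calling classify(tree, {"outlook": "rainy", "windy": "yes"}) walks from the root test on outlook to the test on windy and ends at a leaf.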
K nearest Neighbor

K nearest neighbor (k-NN) is a non-parametric method used for classification and regression.
The output depends on whether it is used for classification or regression:
 In k-NN classification, an object is assigned to the class most common among its k nearest
neighbors, and the output is a class membership.
 In k-NN regression, the output is the property value of the object, computed as the average
of the values of its k nearest neighbors.
Algorithm:
 A given case is classified by a majority vote of its k nearest neighbors. If k = 1, the
case is simply assigned to the class of its single nearest neighbor.
 Every other case is handled in the same way, so cases that lie close together tend to
end up in the same class.
 The distance between cases is calculated to decide which neighbors are nearest and
therefore take part in the vote.
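The voting and averaging steps can be illustrated with a short Python sketch. It assumes numeric feature vectors and Euclidean distance, which the text does not specify, and the function names and sample points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def k_nearest(train, query, k):
    """Return the k (point, target) pairs whose points lie closest to `query`."""
    return sorted(train, key=lambda pair: dist(pair[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average target value of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical sample data.
LABELLED = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"),
]
NUMERIC = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
```

With k = 1 the vote reduces to copying the class of the single closest point, which matches the special case described in the first step.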
Naive Bayes classifier:
Naive Bayes is a classifier widely used in machine learning that produces its result by
applying Bayes' theorem. It is a probabilistic classifier and is highly scalable. It estimates
the probability of each class for the given data and predicts the most probable class. It is
called naive because it assumes the attributes are independent of one another given the class.
Bayes' theorem gives the probability of a class c given the observed attributes x:
P(c | x) = P(x | c) P(c) / P(x)
Algorithm:
 Convert the dataset into a frequency table.
 Create a likelihood table by calculating the probabilities.
 Use the naive Bayes equation to calculate the probability of each class for every case. The
class with the highest probability is the predicted outcome.
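The three steps can be sketched in Python as follows. This is an illustrative sketch for nominal attributes only. The add-one (Laplace) smoothing is an assumption of the sketch, added so that a value never seen with a class does not force the probability to zero. The function names and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows):
    """Build the frequency tables from rows of (features_dict, label)."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute, label) -> counts per value
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for features, label in rows:
        for attribute, value in features.items():
            value_counts[(attribute, label)][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen


def predict(model, features):
    """Return the most probable class and the unnormalised score of every class."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(c)
        for attribute, value in features.items():
            # Likelihood P(value | c) with add-one smoothing (assumption of this sketch).
            count = value_counts[(attribute, label)][value] + 1
            score *= count / (n + len(values_seen[attribute]))
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data, not taken from the Soybean dataset.
ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
model = train_naive_bayes(ROWS)
label, scores = predict(model, {"outlook": "sunny", "windy": "no"})
```

The scores are proportional to P(c | x); the denominator P(x) is the same for every class, so it can be left out when only the highest-scoring class is needed.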


Step 2:
Performance of the Three Algorithms:
 Decision Tree.
 K-nearest neighbour Algorithm.
 Naive Bayes Algorithm.
Decision Tree Algorithm Performance:
The decision tree is an important algorithm in data mining and a convenient baseline for
comparison with other algorithms. It is a supervised learning algorithm that is easy to
understand and easy to use. Its main steps are as follows:
 Train on the data.
 Build the predictive model.
 Represent the model as a tree structure.
The decision tree is a classification algorithm. Its main aim is to classify the data with
the smallest possible tree structure. This project uses the soybean.arff dataset. The steps for
analyzing the dataset in the WEKA tool are given below:
Choose WEKA Tool:

The WEKA tool is widely used to analyze data with different types of classification algorithm.
It lets the user choose which algorithm to apply and then predicts the class of the data.
Select Explorer in WEKA Tool:
After selecting the Explorer, choose the Preprocess tab and select the dataset from the
datasets supplied with WEKA. The dataset contains a number of instances and class labels.
How to Choose Dataset from WEKA tool:

WEKA ships with a number of sample datasets. The figure above uses the soybean.arff dataset.
Import the dataset in order to analyze it with a classification algorithm.
Visualizing the Soybean dataset in the WEKA Tool:


The figure above shows the overall class distribution of the soybean.arff dataset, including
the number of instances and the number of class labels. WEKA summarizes the number of instances
and the number of attributes, and offers a large number of classification algorithms for
analysis.
Decision tree algorithm performance (soybean.arff dataset):
The figure above shows the decision tree algorithm being selected. The decision tree serves as
the reference point for the other algorithms. J48 is WEKA's decision tree algorithm.

Cross-validation Summary of soybean.arff dataset using Decision Tree:
The figure above summarizes how the instances were classified on the basis of their attributes.
The J48 algorithm splits the data easily and is easy to understand. In WEKA, the
cross-validation test option repeatedly divides the dataset into training and test portions
in order to evaluate the model.
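The idea behind the cross-validation test option can be sketched in Python. This is a plain k-fold split written for illustration, not WEKA's own implementation, and it does not stratify the folds by class. The function names, the seed and the train_fn and predict_fn callbacks are hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Shuffle the instance indices and deal them into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, train_fn, predict_fn, k=10, seed=1):
    """Hold out each fold once, train on the rest, and return the overall accuracy.

    rows is a list of (features, label); train_fn(train_rows) returns a model and
    predict_fn(model, features) returns a predicted label.
    """
    correct = 0
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held]
        model = train_fn(train)
        correct += sum(predict_fn(model, rows[i][0]) == rows[i][1] for i in held_out)
    return correct / len(rows)
```

Every instance is used for testing exactly once, so the accuracy is computed over the whole dataset while no instance is ever predicted by a model that was trained on it.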

Confusion Matrix of Soybean.arff dataset using Decision Tree:
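A confusion matrix tabulates the actual class of each instance against the class the model predicted, with correct predictions on the diagonal. A minimal Python sketch of how such a matrix is tallied is given below; the labels are made up for illustration and are not the Soybean results, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Rows are actual classes, columns are predicted classes, both in sorted label order."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix):
    """Share of instances on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Made-up labels for illustration only.
actual = ["a", "a", "b", "b", "b"]
predicted = ["a", "b", "b", "b", "a"]
labels, matrix = confusion_matrix(actual, predicted)
```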
Visualization Curve of Soybean dataset using Decision tree:
Decision tree Structure of Soybean.arff dataset:


Naive Bayes Algorithm Performance on the Soybean dataset:
Naive Bayes Algorithm:

Naive Bayes is one of the classification algorithms. It is based on Bayes' theorem, a theorem
of probability, and is used to build models that predict class labels.
Confusion Matrix of Soybean dataset using Naive Bayes Algorithm:
Visualization Curve of Soybean dataset using Naive Bayes Algorithm:

K Nearest Neighbor Algorithm:
Choosing the KNN Algorithm:
Confusion Matrix of Soybean datasets using KNN Algorithm:


The KNN algorithm relies on past data for which the correct output values are known, and uses
that data to predict the output for unknown data.
Visualization Curve of Soybean dataset using KNN Algorithm:
Conclusion:

For the applied dataset, among naive Bayes, K nearest neighbor and the decision tree, the
decision tree gave the expected and best result. In the result analysis, the best case was
identified using the tree structure and the visualization curve. The decision tree is also
easy to understand and simple to use.
