Advanced Databases and Applications CP5520: Clustering in Data Mining


Added on 2023/06/07


Running head: CLUSTERING IN DATA MINING
ADVANCED DATABASES AND APPLICATIONS: CLUSTERING IN DATA MINING
Prepared by
(Student’s Name)
Prepared for
(Professor’s Name)
(Course Title)
(Date of Submission)


CLUSTERING IN DATA MINING 2
Table of Contents
Abstract
List of figures
Chapter one
  Introduction
  Statement of the problem
  Research questions
Chapter Two
  Literature Review
    An overview of data mining
    Data mining procedure
    Types of clustering in data mining
Chapter three
  Discussion and Analysis
  Why clustering in data mining
Conclusion
References

Abstract
This research paper discusses clustering in data mining. It first gives an overview of data mining and then covers the types of clustering used in data mining. As the paper shows, a cluster is a group of data objects that is treated as one group because its members share similar characteristics. Clustering is applied in image processing, pattern recognition, market research, and data analysis. The paper is divided into three chapters followed by a conclusion. The first chapter is the introduction, which presents the problem statement and the three research questions. The second chapter is the literature review, which covers data mining, the data mining procedure, an overview of clustering, and the types of clustering in data mining. The third chapter presents a discussion and analysis based on the literature review. The last part is the conclusion.

List of figures
Figure 1: Data mining
Figure 2: Data mining process
Figure 3: Examples of clustering
Figure 4: Stages of clustering

Chapter one
Introduction
Clustering is the process of grouping objects so that objects belonging to the same class are placed together. This means that similar objects are usually assembled together in one cluster. Clustering is also known as cluster analysis. Besides these two terms, other related terms include typological analysis, automatic classification, numerical taxonomy and botryology (Bramer, 2017). Data mining is the process of sorting through large data sets to recognize patterns and to establish relationships in a data set through data analysis; it is the procedure of extracting relationships from a large amount of data. Others define data mining as a method used by organizations to turn raw data into valuable information. Organizations use software to look for patterns in their databases, and they adopt data mining to learn more about their customers (Azzalini & Scarpa, 2012).
Any analysis that aims to learn more about the relationships that exist in a database requires an understanding of two main concepts: clustering and data mining. This research paper focuses on clustering in data mining and, in particular, on the various types of clustering used in data mining (Gan, 2016).
Statement of the problem
Clustering is one of the most popular concepts in data mining. To extract knowledge from a database, one first needs to find the similarities between data items, which is one of the main purposes of cluster analysis. Cluster analysis first finds the similarities in a database and then groups similar data objects into clusters. This means that cluster analysis needs to be performed first in order to carry out data mining.
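The two stages named above, measuring similarity and then grouping similar objects, can be illustrated with a minimal sketch. It assumes numeric objects and uses Euclidean distance as the (dis)similarity measure. It also assumes a hypothetical distance `threshold` for deciding when two objects count as "similar". The paper does not prescribe any particular measure, so this is only one possible choice.

```python
import math

def distance_matrix(points):
    """Stage 1: measure pairwise (dis)similarity using Euclidean distance."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def group_by_threshold(points, threshold):
    """Stage 2: place objects in the same cluster when they are linked by a
    chain of pairs no farther apart than `threshold` (a hypothetical parameter)."""
    dist = distance_matrix(points)
    n = len(points)
    labels = [-1] * n  # -1 means "not yet assigned to a cluster"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the graph of "similar enough" pairs.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9), (9.0, 1.0)]
print(group_by_threshold(points, threshold=1.0))  # [0, 0, 1, 1, 2]
```

A smaller threshold produces more, tighter clusters. A very large threshold merges everything into one cluster.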
Research questions
This short research paper addresses three major research questions:
1. What is clustering, and what are the types of clustering?
2. What is data mining?
3. Why is clustering used in data mining?
These three research questions will help to explain clustering in data mining. In addition, they will help to identify some of the reasons why clustering is one of the most common and popular techniques in data mining.

Chapter Two
Literature Review
An overview of data mining
To start with, data mining is widely identified with knowledge discovery. It is the application of both traditional and automated data analysis techniques to discover previously hidden relationships between data items. It also involves analyzing data stored in a data warehouse. Data mining can also be defined as the non-trivial extraction of implicit and potentially useful information from a database (Giusti, Ritter, & Vichi, 2014). Figure 1 shows data mining as an iterative knowledge discovery process.
Figure 1: Data mining (Hemlata Shau, n.d.)

As Figure 1 shows, data mining comprises six major interactive processes: data selection, data cleaning, data integration, data transformation, pattern evaluation, and knowledge representation. Data cleaning, also called data scrubbing or data cleansing, is the stage in which irrelevant or noisy data is removed from the collection. Data integration is the stage in which multiple data sources are combined into a common source (Abbass, Sarker, & Newton, 2010). Pattern evaluation is the process of identifying the interesting patterns that represent knowledge, based on some measure of interestingness. The final phase is knowledge representation, in which the useful information is presented to the user; this is where visualization techniques are used to help users interpret and understand the data mining results (Klösgen, 2002).
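Several of the preprocessing stages above can be shown in a small sketch. It assumes two hypothetical data sources keyed by a customer `id`; the field names and records are invented for illustration. The sketch applies four stages in turn:

- Cleaning drops records with missing values.
- Integration combines the two sources on the shared key.
- Selection keeps only the attributes of interest.
- Transformation applies min-max scaling to the range [0, 1].

```python
def clean(records):
    """Data cleaning: drop records that contain missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(source_a, source_b, key):
    """Data integration: combine two sources into one on a shared key."""
    b_by_key = {r[key]: r for r in source_b}
    return [{**a, **b_by_key[a[key]]} for a in source_a if a[key] in b_by_key]

def select(records, fields):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def min_max(records, field):
    """Data transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - low) / span} for r in records]

# Hypothetical sources: customer profiles and purchase totals.
profiles = [{"id": 1, "age": 25}, {"id": 2, "age": None}, {"id": 3, "age": 45}]
purchases = [{"id": 1, "spend": 200.0}, {"id": 2, "spend": 50.0}, {"id": 3, "spend": 400.0}]

data = integrate(clean(profiles), clean(purchases), key="id")
data = select(data, ["age", "spend"])
data = min_max(data, "age")
print(data)  # [{'age': 0.0, 'spend': 200.0}, {'age': 1.0, 'spend': 400.0}]
```

Customer 2 disappears at the cleaning stage because its age is missing. Real pipelines might impute such values instead of dropping them.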
Data mining includes four main classes of tasks: classification, clustering, association rule learning, and regression. Clustering, which is discussed later in this chapter, is a main class of data mining task, and other classes can use the results of clustering to obtain finer details of useful information. Classification, for example, is the task of generalizing a known structure so that it can be applied to new data; an email program such as Yahoo Mail can classify emails as inbox or sent, or as spam or legitimate. Regression, on the other hand, attempts to find a function that models the data with the least error (Olson, 2015).
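The regression task can be made concrete with a minimal sketch. It fits a straight line y = a + b*x by ordinary least squares, which chooses the line that minimizes the sum of squared errors. This is just one simple form of regression, chosen for illustration, and the data points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (minimizes the squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b

def sum_squared_error(xs, ys, a, b):
    """The quantity that least squares minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

With noisy data the fitted line will no longer pass through every point. It is still the line with the smallest `sum_squared_error`.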
Data mining procedure
The data mining procedure, or process, is composed of data preparation, data mining, information expression, and analysis and decision-making. Figure 2 below shows a general process of data mining.

Figure 2: Data mining process (Tan H., 2012)
As Figure 2 shows, data preparation consists of two major procedures: data collection and data collation. Data collection is the initial step of the data mining process. One of the main duties of data collation is to eliminate noise from the data; this step is also used to eliminate inconsistent data. The data mining step is the core stage of the overall process, and it is at this stage that the four tasks of data mining are carried out. According to the figure, this stage consists of three major steps: mining method collection, data mining algorithm, and data mining (Maloof, 2006). Information expression is the second-to-last step, in which knowledge expression technology is used to express the mined knowledge for users. Analysis and decision-making, the last step, is used to analyze the results of the whole process.
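The collation duties described above, removing noise and removing inconsistent data, can be sketched as follows. The rules are assumptions chosen for illustration:

- A value counts as noise when it falls outside a hypothetical valid range.
- Records are inconsistent when the same ID appears with conflicting values.

```python
from collections import defaultdict

def remove_noise(records, field, low, high):
    """Drop records whose value lies outside the valid range [low, high]."""
    return [r for r in records if low <= r[field] <= high]

def remove_inconsistent(records, key, field):
    """Drop every record whose key appears with conflicting values of `field`;
    exact duplicates are collapsed into a single record."""
    values = defaultdict(set)
    for r in records:
        values[r[key]].add(r[field])
    kept, seen = [], set()
    for r in records:
        k = r[key]
        if len(values[k]) == 1 and k not in seen:
            kept.append(r)
            seen.add(k)
    return kept

raw = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},   # exact duplicate: kept once
    {"id": 2, "age": 29},
    {"id": 2, "age": 92},   # conflicts with the record above: both dropped
    {"id": 3, "age": -5},   # outside the valid range: treated as noise
]
collated = remove_inconsistent(remove_noise(raw, "age", 0, 120), "id", "age")
print(collated)  # [{'id': 1, 'age': 34}]
```

Dropping both conflicting records is the most conservative choice. A real system might instead keep the most recent or most trusted value.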
Clustering
Clustering, one of the four classes of data mining tasks, is the procedure of discovering structures and groups in a database. It is used to place data elements into related groups. There are various clustering techniques, discussed later in the chapter, such as expectation-maximization (EM) clustering and k-means. One of the main objectives of clustering is to group objects so that objects in the same group are similar to each other and different from objects in other groups. Grouping in clustering is done according to customer preference or logical relationship (Aggarwal, 2016). An example of clustering is shown in Figure 3.
Figure 3: Examples of clustering (Archana, 2015)
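The k-means technique named above can be sketched in a few lines. This is a minimal version of Lloyd's algorithm with three simplifying assumptions:

- The data are 2-D numeric points.
- Similarity is measured by Euclidean distance.
- The initial centroids are simply the first k points. Practical implementations use smarter or randomized initialization.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means: alternate an assignment step and a centroid-update step."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Each point ends up in exactly one cluster, so k-means is an example of the exclusive (partition) style of clustering.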
Major types of clustering include exclusive, overlapping, and hierarchical clustering. In exclusive clustering, objects are grouped in an exclusive way, so that if a datum belongs to a definite cluster, it cannot be included in another cluster. Overlapping clustering uses fuzzy sets to classify data, so that each object may belong to two or more clusters with different degrees of membership (Han, Kamber, & Pei, 2013).
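The difference between exclusive and overlapping clustering can be shown with two fixed, hypothetical cluster centres. The sketch first assigns a point to exactly one centre. It then computes fuzzy memberships for the same point using the fuzzy c-means membership formula. The fuzzifier `m = 2` is an assumed parameter. The memberships for each point sum to 1.

```python
import math

def exclusive_assign(point, centres):
    """Exclusive clustering: the point belongs to exactly one cluster."""
    return min(range(len(centres)), key=lambda c: math.dist(point, centres[c]))

def fuzzy_memberships(point, centres, m=2.0):
    """Overlapping (fuzzy) clustering: degrees of membership in every cluster,
    computed with the fuzzy c-means membership formula (fuzzifier m > 1)."""
    d = [math.dist(point, c) for c in centres]
    zeros = d.count(0.0)
    if zeros:  # the point sits exactly on one (or more) centres
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** power for k in range(len(d)))
            for j in range(len(d))]

centres = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centres
p = (1.0, 0.0)
print(exclusive_assign(p, centres))   # 0
print(fuzzy_memberships(p, centres))  # approximately [0.9, 0.1]
```

The exclusive view puts the point entirely in cluster 0. The fuzzy view gives it a strong membership in cluster 0 and a weaker membership in cluster 1.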
One of the major objectives of clustering is to determine the intrinsic grouping in a set of unlabeled data. The major requirements of clustering in data mining are scalability, interpretability, the ability to handle high dimensionality, insensitivity, the ability to deal with various types of attributes, the ability to discover clusters with arbitrary shape, and minimal requirements for domain knowledge to determine input parameters (Kantardzic, 2014).
Types of clustering in data mining
There are various types of clustering in data mining, including partition, hierarchical, exclusive, overlapping, and complete clustering. Hierarchical cluster analysis, also known as nested clustering, is an algorithm that groups similar objects into groups called clusters (Perner, Advances in Data Mining, 2013). In its agglomerative (bottom-up) form, hierarchical clustering starts by treating each observation as a separate cluster. It then repeatedly executes two steps: first, it identifies the two clusters that are closest together; second, it merges these two most similar clusters. These two steps continue until all the clusters have been merged into a single cluster. Figure 4 illustrates a hierarchical clustering structure.
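The two repeated steps of hierarchical clustering can be sketched as a naive agglomerative procedure. The sketch makes two assumptions that the paper does not specify:

- It uses single linkage, where the distance between two clusters is the distance between their closest members. Other linkage rules are equally common.
- It measures point distances with Euclidean distance.

Each merge is recorded so that the resulting hierarchy can be inspected.

```python
import math
from itertools import combinations

def single_link(a, b):
    """Cluster distance = distance between the closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Start with one cluster per observation, then repeatedly
    (1) find the two closest clusters and (2) merge them."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Step 1: identify the two clusters that are closest together.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], single_link(clusters[i], clusters[j])))
        # Step 2: merge them, replacing the pair with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for left, right, dist in agglomerate(points):
    print(left, "+", right, "at distance", dist)
```

Reading the recorded merges from first to last traces the hierarchy from individual observations up to a single cluster, which is the structure a dendrogram draws. This naive version recomputes distances on every pass and is intended only to show the two steps.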