Advanced Databases and Applications CP5520: Clustering in Data Mining
Added on 2023/06/07
Report
AI Summary
This research paper provides a comprehensive overview of clustering in data mining. It begins by defining clustering and its role in data mining, emphasizing its application in various fields such as image processing and market research. The paper then delves into the core concepts of data mining, detailing its procedure and the iterative process of knowledge discovery. A significant portion of the paper is dedicated to exploring the diverse types of clustering techniques, including partition, hierarchical, exclusive, and overlapping clustering, with a focus on k-means and CLARA algorithms. Through a detailed literature review, the paper analyzes the significance of clustering, its advantages, and its relevance in extracting valuable insights from large datasets. The research questions address the definition of clustering, types of clustering, data mining, and the rationale behind using clustering in data mining. The paper concludes by summarizing the key findings and emphasizing the importance of clustering as a fundamental technique in data analysis.

Running head: CLUSTERING IN DATA MINING
ADVANCED DATABASE AND APPLICATIONS: CLUSTERING IN DATA
MINING
Prepared by
(Student’s Name)
Prepared for
(Professor’s Name)
(Course Title)
(Date of Submission)

Table of Contents
Abstract
List of figures
Chapter one
  Introduction
  Statement of the problem
  Research questions
Chapter Two
  Literature Review
    An overview of data mining
    Data mining procedure
    Types of clustering in data mining
Chapter three
  Discussion and Analysis
  Why clustering in data mining
Conclusion
References

Abstract
This research paper discusses the idea of clustering in data mining. The paper first gives
an overview of data mining and then covers the types of clustering in data mining. As will be
seen in the paper, a cluster is a group of data objects; in data mining, a cluster is treated as
one group with similar characteristics. In application, clustering is used in image processing,
pattern recognition, market research, and data analysis. The paper is divided into three major
sections. The first section is the introduction, which highlights the three research questions
and the problem statement. The second section is the literature review, which reviews data
mining, the data mining procedure, an overview of clustering, and the types of clustering in data
mining. The third section, chapter three, is the discussion and analysis, which is based on the
literature review. The last part is the conclusion.

List of figures
Figure 1: Data mining
Figure 2: Data mining process
Figure 3: Examples of clustering
Figure 4: Stages of clustering

Chapter one
Introduction
A cluster is defined as a group of objects which belong to the same class. This means
that similar objects are assembled together in one cluster. Another name for clustering is
cluster analysis. Besides these two terms, there are other terms related to clustering, which are
typological analysis, automatic classification, numerical taxonomy, and botryology (Bramer, 2017).
Data mining is said to be the process of sorting through large data sets so as to recognize
patterns and to establish the relationships in a data set via data analysis; it is the procedure of
mining or extracting data relationships from a large amount of data. Others have defined data
mining as a method used by organizations to turn raw data into valuable information.
Organizations use software to look for patterns in their databases. Data mining is adopted by
organizations so as to learn more about their customers (Azzalini & Scarpa, 2012).
An analysis that aims at learning more about the relationships that exist in a database
requires an understanding of two main concepts, clustering and data mining. This research paper
will focus on clustering in data mining. To do this, it will focus on the various types of
clustering in data mining (Gan, 2016).
Statement of the problem
Clustering is one of the most popular concepts in data mining. To extract knowledge from a
database, one first needs to find the similarities between data, which is one of the main mandates
of cluster analysis. Cluster analysis first finds the similarities in a database and then groups similar
data objects into clusters. This means that, to perform data mining, cluster analysis needs to be
done first.
Research questions
This minor research has three major research questions:
1. What is clustering and which are the types of clustering?
2. What is data mining?
3. Why clustering in data mining?
These three research questions will help in uncovering clustering in data mining. In addition, they
will lead to identifying some of the reasons why clustering is the most common and popular
technique in data mining.
Chapter Two
Literature Review
An overview of data mining
To start with, data mining is widely identified with knowledge discovery. It is the exercise
of both traditional and automated data analysis techniques so as to discover previously hidden
relationships between data items. It also involves the analysis of data stored in a data
warehouse. One can also define data mining as the non-trivial extraction of implicit and
potentially useful information from a database (Giusti, Ritter, & Vichi, 2014). Figure one
shows data mining as an iterative knowledge discovery process.
Figure 1: Data mining (Hemlata Shau, n.d)
As viewed from figure one, data mining comprises six major iterative processes, which are
data selection, data cleaning, knowledge representation, data integration, pattern evaluation,
and data transformation. Data cleaning, also referred to as data scrubbing or data cleansing, is
the stage where irrelevant data and noisy data are removed from the collection. Data integration
is where multiple data sources are combined into a common source (Abbass, Sarker, & Newton,
2010). Pattern evaluation is the process by which interesting patterns that represent knowledge
are identified, based on a given measure. The final phase is knowledge representation, where the
useful information is presented to the user. This is where visualization techniques are used to
help users interpret and understand the data mining results (Klösgen, 2002).
Data mining includes four main classes of tasks, which are classification, clustering,
association rule learning, and regression. Clustering, which is discussed later in this chapter, is
a main class in data mining. Other classes use the results gathered in clustering to get finer
details of useful information. For example, classification is the task of generalizing a known
structure so as to apply it to new data. An email program such as Yahoo can classify emails as
inbox or sent, spam or legitimate. Regression, on the other hand, attempts to find a function
that models the data with the least error (Olson, 2015).
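As a small illustration of regression as the search for a function that models the data with the least error, the sketch below fits a straight line by ordinary least squares. The function name `fit_line` and the sample points are assumptions made for this illustration only; they do not come from the cited sources.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line that minimises the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Because the sample points lie exactly on one line, the fitted line passes through all of them; with noisy data the same formulas return the line with the smallest total squared error.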
Data mining procedure
The data mining procedure, or process, is composed of data preparation, data mining,
information expression, and analysis and decision-making. Figure 2 below shows a general process
of data mining.
Figure 2: Data mining process (Tan H., 2012)
From figure two, data preparation is a process which consists of two major procedures, which
are data collection and data collation. As one can see from the figure, data collection is the
initial step of the data mining process. One of the main duties of data collation is to eliminate
noise in the data. In addition, this step is used to eliminate inconsistent data. The data mining
step is the core stage of the overall process. This stage is where the four tasks of data mining
are carried out. From the figure, it has three major steps, which are mining method collection,
data mining algorithm, and data mining (Maloof, 2006). Information expression is the
second-to-last step, where knowledge expression technology is used to express the mined
knowledge for the users. Analysis and decision-making, the last step, is used to analyze the
results of the whole process.
Clustering
Clustering, one of the four classes of data mining tasks, is the procedure of determining
structures and groups in a database. It is used to place elements of data into related
groups. There are various techniques of clustering, which will be discussed later in the chapter,
such as expectation-maximization (EM) clustering and k-means. One of the main objectives of
clustering is to place similar objects in a group which is different from other groups. Grouping
in clustering is done according to customer preference or logical relationship (Aggarwal, 2016).
An example of clustering is shown by the diagram below.
Figure 3: Examples of clustering (Archana, 2015)
There are several major types of clustering, including exclusive, overlapping, and hierarchical.
The exclusive type of cluster analysis is where objects are grouped in an exclusive way. This is done
so that if a certain datum belongs to a definite cluster, it cannot be included in another cluster.
The overlapping type of clustering uses fuzzy sets to classify data (Han, Kamber, & Jian Pei, 2013).
One of the major objectives of clustering is to determine the intrinsic grouping in a set of
unlabeled data. The major requirements of clustering in data mining are scalability,
interpretability, handling of high dimensionality, insensitivity to the order of input records,
dealing with various types of attributes, the ability to discover clusters with an arbitrary shape,
and minimal requirements for domain knowledge to determine input parameters (Kantardzic, 2014).
Types of clustering in data mining
There are various types of clustering in data mining, which are partition, hierarchical,
exclusive, overlapping, and complete clustering. The hierarchical type of cluster analysis is also
known as nested clustering or hierarchical cluster analysis. This is an algorithm that groups
similar objects into groups which are referred to as clusters (Perner, Advances in Data Mining.,
2013). A hierarchical type of clustering starts by treating each observation as a separate
cluster. It then repeatedly executes two steps. The first step is identifying the two clusters
which are closest together. The second step is merging these two most similar clusters. These
two steps continue until all the clusters are merged together. Figure 4 below illustrates the
hierarchical type of structure.
Figure 4: hierarchical type of clustering (Bock, n.d)
The main output of hierarchical clustering is referred to as a dendrogram; it shows the
hierarchical relationship between the clusters.
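The two repeated steps of hierarchical clustering (find the two closest clusters, then merge them) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `agglomerative`, the `target_clusters` stopping parameter, the choice of single linkage as the between-cluster distance, and the sample points are all assumptions, since the sources cited here do not fix a linkage rule. The recorded merge history is the information a dendrogram would display.

```python
import math


def agglomerative(points, target_clusters=1):
    """Bottom-up clustering of coordinate tuples using single linkage.

    Returns (clusters, merges): clusters as lists of point indices, and
    the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]  # every observation starts alone
    merges = []

    def linkage(a, b):
        # Single linkage (an assumption): distance between the two closest members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > target_clusters:
        # Step 1: identify the two clusters that are closest together
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        # Step 2: merge them and record the merge for the dendrogram
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, merges


# Hypothetical data: two tight pairs and one isolated point
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, target_clusters=3)
print(clusters)
```

Leaving `target_clusters` at its default of 1 continues merging until a single cluster remains, which matches the description above; stopping earlier corresponds to cutting the dendrogram at a chosen height.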
Partition clustering is the division of a set of data objects into non-overlapping clusters,
such that each object is in exactly one subset. In partitioning clustering, objects are classified
based on their similarities. Some of the common methods used in this type of clustering are
k-means clustering, the CLARA algorithm, and k-medoids clustering (Tan, Steinbach, & Kumar,
2014).
The k-means method is a partitioning clustering method that is widely used in unsupervised
machine learning to partition a dataset into k groups, that is, k clusters; here k represents the
number of groups, which must be pre-specified by the analyst. It categorizes objects into groups
so that the objects within the same cluster are as similar as possible. The first step when using
the k-means method is to specify the number of clusters, k, to be generated in the final solution.
In the second step, the method randomly chooses k objects from the data set to serve as the
initial centers for the clusters. The selected objects are referred to as centroids or cluster
means. The third step is assigning each observation to its closest centroid, based on the
Euclidean distance between the object and the centroid. The fourth step is to update the centroid
of each of the k clusters by calculating the new mean value of all the data points in the cluster.
The fifth step is to repeat the third and fourth steps, which iteratively minimizes the total
within-cluster sum of squares, until the cluster assignments stop changing (Wu & Kuma, 2009).
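The five k-means steps can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function name `kmeans`, the `max_iter` and `seed` parameters, the rule of keeping the old centroid when a cluster becomes empty, and the sample data are all choices made for this sketch rather than details taken from the cited sources.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k random objects as initial centers
    assignment = None
    for _ in range(max_iter):  # step 5: repeat steps 3 and 4
        # Step 3: assign each observation to its closest centroid (Euclidean distance)
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # assignments stopped changing
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Total within-cluster sum of squares, the quantity being minimised
    wss = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))
    return centroids, assignment, wss


# Hypothetical data: two well-separated groups; k = 2 is step 1 (chosen by the analyst)
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, assignment, wss = kmeans(points, k=2)
print(assignment, wss)
```

Because the initial centers are random, the cluster labels (0 or 1) may differ between seeds, and on less clearly separated data k-means can settle in a local minimum; running it several times with different seeds and keeping the lowest sum of squares is a common precaution.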
Clustering Large Applications (CLARA) was introduced by Kaufman and Rousseeuw in 1990; it
is an extension of k-medoids to large data sets. The CLARA algorithm considers a small sample of
the data with a fixed size. It then applies the Partitioning Around Medoids (PAM) algorithm so as
to generate an optimal set of medoids
for the data sample. The CLARA method repeats the sampling and clustering process a
pre-specified number of times so as to minimize the sampling bias. The method follows four main
steps. First, it randomly splits the data set into subsets of fixed size. Second, it computes the
PAM algorithm on each subset and assigns every observation of the entire data set to its closest
medoid. Third, it calculates the sum or the mean of the dissimilarities of the observations to
their closest medoid. Lastly, it retains the sub-dataset for which the sum or mean is minimal
(King, Cluster analysis and data mining : an introduction, 2015).
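The CLARA steps can be sketched as below. This is an illustrative sketch only: the function names `total_cost`, `pam`, and `clara`, their parameters, and the sample data are assumptions, and the `pam` helper is a simplified swap search standing in for the full PAM algorithm (it starts from the first k sampled objects instead of using PAM's build phase).

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(sample, k):
    """Simplified PAM-style search: keep swapping a medoid with a
    non-medoid whenever the swap lowers the cost on the sample."""
    medoids = list(sample[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                if total_cost(sample, trial) < total_cost(sample, medoids):
                    medoids, improved = trial, True
    return medoids


def clara(points, k, sample_size, n_samples=5, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):  # repeat to reduce sampling bias
        sample = rng.sample(points, sample_size)  # fixed-size random subset
        medoids = pam(sample, k)  # cluster the subset only
        # Evaluate the medoids on the ENTIRE data set (mean dissimilarity)
        cost = total_cost(points, medoids) / len(points)
        if cost < best_cost:  # retain the best subset's medoids
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


# Hypothetical data: two 3x3 grids of points, far apart from each other
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points += [(x + 20.0, y + 20.0) for x in range(3) for y in range(3)]
medoids, cost = clara(points, k=2, sample_size=10, n_samples=3)
print(medoids, cost)
```

The saving comes from running the expensive medoid search on `sample_size` objects rather than on the whole data set, while still judging each candidate set of medoids against all observations.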
k-medoids, on the other hand, is related to the medoidshift algorithm. This partitioning
method of clustering breaks a dataset of n objects up into k clusters. A medoid in this case is an
object within a cluster whose average dissimilarity to all the other objects in the cluster is
minimal. The method requires the user to specify k, the number of clusters to be generated, in
advance (Wu J., 2014).
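To make the notion of a medoid concrete, the short sketch below picks the medoid of a single cluster and compares it with the cluster mean. The function name `medoid` and the sample cluster, which contains one deliberate outlier, are assumptions made for illustration.

```python
import math


def medoid(cluster):
    """Return the member whose total distance to all members is smallest."""
    return min(cluster, key=lambda c: sum(math.dist(c, other) for other in cluster))


# Hypothetical cluster with one outlier at x = 20
cluster = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0), (20.0, 1.0)]
center = medoid(cluster)
mean = tuple(sum(col) / len(cluster) for col in zip(*cluster))
print(center, mean)
```

Unlike the mean, the medoid is always one of the actual data objects, and in this example it stays among the four nearby points while the mean is pulled toward the outlier.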
The overlapping type of clustering is used to reflect the fact that a data object can
simultaneously belong to more than one group. It uses fuzzy sets to cluster data so that each
point may belong to two or more clusters with different degrees of membership. The exclusive
type of clustering, by contrast, assigns each value to a single cluster. Complete clustering
performs a hierarchical type of clustering using a set of dissimilarities on the n objects which
are being clustered (Mirkin, 2005).
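Degrees of membership can be illustrated with the membership formula used in fuzzy c-means, one common overlapping method. The sources cited here do not name a specific algorithm, so the choice of fuzzy c-means, the function name `fuzzy_memberships`, the fuzzifier `m = 2`, and the sample centers are all assumptions for this sketch.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership degree of one point in each cluster (fuzzy c-means style)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:  # the point sits exactly on a center
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); closer centers get larger degrees
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical cluster centers and a point lying closer to the first one
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships((1.0, 0.0), centers)
print(u)
```

Here the point belongs to both clusters at once, with a high degree for the nearer center and a low degree for the farther one; an exclusive method would instead assign it to the first cluster only.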

Chapter three
Discussion and Analysis
The concept of clustering originated with Kroeber and Driver in the 1930s. Clustering is
not one specific algorithm but a general task to be solved. One popular notion of a cluster is a
group with small distances among its members (Larose & Larose, 2014).
In data mining, clustering organizes data into clusters so that the internal structure of
the data can be identified. At times the partitioning itself is the goal, as it can reveal
unforeseen relationships in the data. In addition, clustering prepares the data for other
artificial intelligence techniques. Clustering can also lead to discoveries in the data, such as
recurring patterns and topics and the underlying rules. To cluster data, a similarity or
dissimilarity measure has to be chosen so that the data points can be grouped on that basis. A
similarity measure expresses the degree to which a pair of objects are alike. A dissimilarity
measure, on the other hand, is a distance measure: it gives the distance between data points, or
between a point and a cluster. Distance measures include the Euclidean distance, the squared
Euclidean distance, the cosine distance, and the Tanimoto distance (Eudeka, 2014).
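The four distance measures named above can be written out as follows. This is a minimal sketch for numeric vectors of equal length; the function names are chosen for illustration, and the Tanimoto distance is given in one common form (1 minus the Tanimoto coefficient), which is an assumption since definitions vary between sources.

```python
import math

def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between a and b (undefined for zero vectors).
    return 1.0 - _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

def tanimoto_distance(a, b):
    # 1 minus the Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b).
    ab = _dot(a, b)
    return 1.0 - ab / (_dot(a, a) + _dot(b, b) - ab)

a, b = (1.0, 2.0), (4.0, 6.0)
for measure in (euclidean, squared_euclidean, cosine_distance, tanimoto_distance):
    print(measure.__name__, measure(a, b))
```

The two example vectors point in almost the same direction, so their cosine distance is close to zero even though their Euclidean distance is large; which measure is appropriate depends on whether magnitude or direction matters for the data.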
As highlighted in the previous two chapters, the main goal of clustering is to group
objects that are similar to each other and different from the objects in other groups. Grouping
follows logical relationships or consumer preferences. Whatever the type of clustering, several
requirements apply. First, the output must be interpretable, meaning that the results are
comprehensible and usable by anyone. Second, all clustering types must be able to deal with
erroneous or missing data (Maheshwari,
2015). Third, all clustering types must be able to deal with both high-dimensional and
low-dimensional data. Fourth, as highlighted in the literature review, all clustering types must
be scalable; that is, they must be able to handle very large databases and attributes of
different kinds. Fifth, all clustering types must be able to discover clusters of arbitrary
shape and should not be bound to distance measures only (Zanasi, Brebbia, & Ebecken, 2007).
According to the literature review, clustering has three stages, which are shown in the figure
below.
[Figure 4: Stages of clustering. Raw data -> Clustering algorithms -> Clusters of data]
The application of clustering in data mining involves two major concepts. First, clustering
can be used as a stand-alone tool to gain insight into the data distribution and to observe the
features of each cluster. Second, clustering can be used as a pre-processing step for other
algorithms, such as classification and feature-selection algorithms (Berry & Browne, 2006).
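The three stages (raw data, clustering algorithm, clusters of data) can be traced in a small k-means sketch. It is illustrative only: the function name kmeans, the choice of the first k points as initial centroids, and the sample data are assumptions rather than anything prescribed by the cited sources.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means; the first k points serve as initial centroids (arbitrary choice)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:              # converged: nothing moved
            break
        centroids = updated
    return labels, centroids

# Stage 1: raw data
raw = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Stage 2: clustering algorithm
labels, centroids = kmeans(raw, k=2)
# Stage 3: clusters of data
print(labels, centroids)
```

When clustering is used as a pre-processing step, the returned labels can be attached to each record as an extra attribute before a classifier is trained.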

Why clustering in data mining
As seen in the previous chapters, clustering organizes data into clusters that reveal the
internal structure of the data. First, clustering methods are useful for knowledge discovery in
a database. Second, among the tasks of data mining, clustering is a key task, and it can be
carried out by a number of algorithms. Common families of clustering algorithms are the
partitioning and hierarchical algorithms (Chu & Lin, 2005).

Conclusion
This research paper has shown that clustering organizes data into clusters such that there
is low inter-cluster similarity and high intra-cluster similarity; informally, it finds natural
groupings among objects. The major goal of the data mining process, as seen in this paper, is to
extract information from a very large database and transform it into a usable form. One of the
main points made by this paper is that clustering is an essential tool not only in data mining
but also in data analysis. As highlighted in this paper, clustering can be carried out by a
number of algorithm types: partitioning, hierarchical, overlapping, and exclusive algorithms.
There are four major points to remember about clustering in data mining. First, all data
objects in a database are initially treated as one group. Second, when performing a cluster
analysis, database administrators first partition the set of data into groups. Third, the main
advantage of clustering over other data mining tasks is that it helps single out the useful
features that distinguish the different groups. Lastly, clustering has not yet seen a critical
breakthrough, but with the ongoing development of modern technology, major breakthroughs can be
expected that will lead to wider adoption of clustering in data mining.

References
Abbass, H. A., Sarker, R. A., & Newton, C. S. (2010). Data mining: a heuristic approach. Idea
Group.
Aggarwal, C. C. (2016). Data mining: the textbook. Cham; New York: Springer.
Archana. (2015). 2015. Retrieved from Slideshare:
https://www.slideshare.net/archnaswaminathan/cdm-44314029
Azzalini, A., & Scarpa, B. (2012). Data analysis and data mining: an introduction. Oxford:
Oxford Press.
Berry, M. W., & Browne, M. (2006). Lecture notes in data mining. New York: Hackensack.
Bock, T. (n.d.). What is hierarchical clustering? Retrieved from Displayr:
https://www.displayr.com/what-is-hierarchical-clustering/
Bramer, M. (2017). Principles of data mining. London: Springer.
Chu, W. W., & Lin, T. Y. (2005). Foundations and advances in data mining. Berlin; New York:
Springer.
Cordeiro, R. L., Faloutsos, C., & Júnior, C. T. (2013). Data mining in large sets of complex
data. London: Springer London.
Eudeka. (2014, July 4). K means clustering. Retrieved from Slideshare:
https://www.slideshare.net/EdurekaIN/k-means-clustering
References
Abbass, H. A., Sarker, R. A., & Newton, C. S. (2010). Data mining : a heuristic approach. Idea
Group.
Aggarwal, C. C. (2016). Data mining : the textbook. Cham: New York : Springer.
Archana. (2015). 2015. Retrieved from Slideshare:
https://www.slideshare.net/archnaswaminathan/cdm-44314029
Azzalini, A., & Scarpa, B. (2012). Data Analysis and Data Mining : an Introduction. Oxford:
Oxford Press.
Berry, M. W., & Browne, M. (2006). Lecture notes in data mining. NewYork: Hackensack.
Bock, T. (n.d). What is Hierarchical Clustering? Retrieved from Displayr:
https://www.displayr.com/what-is-hierarchical-clustering/
Bramer, M. (2017). Principles of data mining. London : Springer.
Chu, W. W., & Lin, T. Y. (2005). Foundations and advances in data mining. Berlin: New York :
Springer.
Cordeiro, R. L., Faloutsos, C., & Júnior, C. T. (2013). Data Mining in Large Sets of Complex
Data by Robson L F Cordeiro . London: Springer London.
Eudeka. (2014, July 4th). K means Clustering . Retrieved from Slideshare:
https://www.slideshare.net/EdurekaIN/k-means-clustering
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

CLUSTERING IN DATA MINING 20
Gan, G. (2016). Data clustering in C++: an object-oriented approach. New York.
Giusti, A., Ritter, G., & Vichi, M. (2014). Classification and data mining. Berlin.
Han, J., Kamber, M., & Pei, J. (2013). Data mining: concepts and techniques. Amsterdam:
Amsterdam Press.
Hemlata Shau, s. s. (n.d.). A brief overview on data mining. ITCTEE, 1-18.
Kantardzic, M. (2014). Data mining: concepts, models, methods, and algorithms. Chicago: IEEE
Press.
King, R. S. (2015). Cluster analysis and data mining: an introduction. Virginia.
Klösgen, W. (2002). Handbook of data mining and knowledge discovery. Oxford: Oxford University
Press.
Larose, D. T., & Larose, C. D. (2014). Discovering knowledge in data: an introduction to data
mining. John Wiley & Sons.
Maheshwari, A. K. (2015). Business intelligence and data mining. Chicago: Business Expert
Press.
Maimon, O., & Rokach, L. (2010). Data mining and knowledge discovery handbook. New York:
Springer.
Gan, G. (2016). Data clustering in C++ : an object-oriented approach. NewYork.
Giusti, A., Ritter, G., & Vichi, M. (2014). Classification and data mining. Berlin.
Han, J., Kamber, M., & Jian Pei. (2013). Data mining : concepts and techniques. Amsterdam:
Amsterdam Press.
Hemlata Shau, s. s. (n.d). A bbrief overview on data mining. ITCTEE, 1-18.
Kantardzic, M. (2014). Data mining : concepts, models, methods, and algorithms by Mehmed
Kantardzic. Chicago: IEEE Press.
King, R. S. (2015). Cluster analysis and data mining : an introduction. Virginia.
King, R. S. (2015). Cluster analysis and data mining : an introduction. Virginia.
Klösgen, W. (2002). Handbook of data mining and knowledge discovery by Willi Klösgen .
Oxford: Oxford University Press.
Larose, D. T., & Larose, C. D. (2014). Discovering knowledge in data : an introduction to data
mining. John Wiley & Sons.
Maheshwari, A. K. (2015). Business intelligence and data mining. Chicago: Business Expert
Press.
Maimon, O., & Rokach, L. (2010). Data mining and knowledge discovery handbook by Oded
Maimon . NewYork: Springer.
Maimon, O., & Rokach, L. (2010). Data mining and knowledge discovery handbook by Oded
Maimon. New York: Spring Press.

CLUSTERING IN DATA MINING 21
Maloof, M. A. (2006). Machine learning and data mining for computer... London: Springer.
Mirkin, B. (2005). Clustering for data mining: a data recovery approach. London: Boca Raton.
Olson, D. L. (2015). Descriptive data mining. Singapore: Springer Nature.
Perner, P. (2013). Advances in data mining. London: Springer.
Perner, P. (2014). Machine learning and data mining in pattern recognition: 10th International
Conference, MLDM 2014, St. Petersburg, Russia, July 21-24, 2014. Proceedings.
Springer.
Tan, H. (2012). Knowledge discovery and data mining. Berlin: Springer Berlin Heidelberg.
Tan, P.-N., Steinbach, M., & Kumar, V. (2014). Introduction to data mining. Pearson.
Wu, J. (2014). Advances in k-means clustering: a data mining thinking. London: Springer.
Wu, X., & Kumar, V. (2009). The top ten algorithms in data mining. London: CRC Press.
Zanasi, A., Brebbia, C. A., & Ebecken, N. F. (2007). Data mining VIII: data, text and web
mining and their business applications. Chicago: WIT Press.
Maloof, M. A. (2006). Machine learning and data mining for computer... by Marcus A Maloof .
London: Springer.
Mirkin, B. (2005). Clustering for data mining : a data recovery approach. London: Boca Raton.
Olson, D. L. (2015). Descriptive data mining. Singapore: Springer Nature.
Perner, P. (2013). Advances in Data Mining. London: London: Springer.
Perner, P. (2014). Machine learning and data mining in pattern recognition : 10th International
Conference, MLDM 2014, St. Petersburg, Russia, July 21-24, 2014. Proceedings.
Springer.
Tan, H. (2012). Knowledge Discovery and Data Mining. Berlin: Springer Berlin Heidelberg.
Tan, P.-N., Steinbach, M., & Kumar, V. (2014). Introduction to data mining by Pang-Nin Tan .
Pearson.
Wu, J. (2014). Advances in k-means clustering : a data mining thinking. London: Springer.
Wu, X., & Kuma, V. (2009). The top ten algorithms in data mining. london: CRC Press.
Zanasi, A., Brebbia, C. A., & Ebecken, N. F. (2007). Data mining VIII : data, text and web
mining and their business applications. Chicago: WIT press.
1 out of 21
Related Documents

Your All-in-One AI-Powered Toolkit for Academic Success.
+13062052269
info@desklib.com
Available 24*7 on WhatsApp / Email
Unlock your academic potential
© 2024 | Zucol Services PVT LTD | All rights reserved.