Data Mining Assignment | K-means Algorithm

   

Added on  2020-05-11

K-MEANS ALGORITHM IN DATA MINING
By
Course:
Tutor:
University:
Date:

Executive summary
K-means is one of the most popular clustering algorithms, a standing it owes to its simplicity
and consistency in the field of data mining (Cui 2014). It follows a relatively simple
mechanism to partition a data set into a fixed number of clusters. Processing large-scale data
with the k-means algorithm helps to group related observations without any prior knowledge of
the relationships among them. The algorithm begins by selecting K points as the initial cluster
centers. It is useful for identifying patterns in large volumes of data. Clustering is also
important in data exploration, in detecting anomalies and deciding how to handle them, and in
some aspects of data segmentation. Interpreting the resulting clusters can be difficult, and
certain algorithms can be very useful for this task in particular.
Clustering is fundamental to particular problems such as building regulatory networks,
discovering the subtypes of a disease in medicine, inferring unknown gene functions, and
reducing dimensionality.
The objective of the project is to obtain useful results while giving due credit to what the
data disclose. The k-means clustering algorithm works by deriving solutions without any initial
labelling of the data. It is a computational technique that is valuable in data analysis,
reliable decision making and knowledge discovery. The choice of algorithm determines the overall
outcome of clustering the data in a domain and of processing it. One drawback of k-means is that
you must specify the number of clusters as an input to the algorithm. For this reason, it is
advisable to experiment with different k values in order to identify the value that best suits
the data that you have.
The analysis of data into patterns is accomplished by examining the data points together with
their cluster numbers (Aggarwal 2013).
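The advice above to try several k values can be illustrated with a small sketch. It is not taken from the source: the helper name `wcss_for_k`, the one-dimensional sample data and the evenly spaced initialisation are all assumptions made for illustration. For each k it runs a tiny k-means and reports the within-cluster sum of squares, a common (but not the only) way to compare candidate values of k.

```python
def wcss_for_k(values, k, iterations=20):
    """Run a tiny 1-D k-means and return the within-cluster sum of squares."""
    ordered = sorted(values)
    # Assumed initialisation: k evenly spaced picks from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # Sum of squared distances from each value to its nearest centroid.
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


# Hypothetical data with three visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
scores = {k: wcss_for_k(data, k) for k in range(1, 5)}
print({k: round(s, 2) for k, s in scores.items()})
```

On this sample the score falls steeply up to k = 3 and only slightly afterwards, which is the kind of pattern one would look for when choosing k. Real data rarely shows so clean a break, so the result should be read alongside an inspection of the clusters themselves.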
Introduction
Description of the K-means Algorithm.
Data analysis using algorithmic software is considered very important, especially when dealing
with large volumes of data. Clustering is a method aimed at discovering groups of similar items
in the data, called clusters. Data clustering brings order to the data, and hence further
processing of the data is made easier. It involves organizing data into groups with high
intra-class similarity and low inter-class similarity, and it is essential in areas such as
optical character recognition, biometrics and diagnostic systems; it even extends to military
operations wherever it helps in data extraction. A data clustering algorithm should itself meet
some necessary requirements: it should process data in a uniform manner, it should be able to
handle diverse features, it should distribute data so that the items within a single cluster
are comparable, and it should be able to remove noise and outliers from data sets.
The process is the scrutiny of the observed data sets, aimed at obtaining the relationships
among them. The method is a partitioning technique used to examine information and its results.
K-means is a high-level data analysis algorithm. It cannot be solved with a single closed-form
formula and is instead iterative. K-means is a partitioning approach: the data are partitioned
into groups at each iteration of the algorithm. The main objective of the algorithm is to find
the groups in the data. Data mining places great emphasis on goal identification and on creating
the target data to be collected for a given use in some given order or plan. This forms an
essential part of the process of interpretation and decision making. The observed data are
treated as objects, and grouping is based on their positions and on the distances between input
data points. The objects are separated into K clusters such that the objects in each cluster
remain as close to each other as possible. Each cluster is characterized by its center point;
these center points are generally referred to as centroids. K-means is applicable mainly to
exploratory data analysis, where one must examine the clustering results to determine which
clusters make sense. The distances used in clustering often do not correspond to distances in
three-dimensional space. The available answer to the challenge of reaching the global minimum,
rather than a local one, is the careful and often tedious selection of initial points.
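Although the procedure itself is iterative, the quantity it tries to reduce can be written down compactly. The expression below is the standard k-means objective rather than a formula taken from the source, and the symbols are assumptions introduced here: C_j is the set of points in cluster j, mu_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Each iteration can only lower J or leave it unchanged, but the value it settles on depends on the starting centroids, which is why the choice of initial points matters.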
The coordinates of a centroid are obtained by computing the average of every coordinate over the
points assigned to its cluster. K-means clustering is computed by following a sequence of steps,
some of which are given below.
Steps followed:
Step 1: Set K - Choose the number of desired clusters, k.
Step 2: Initialization - Select k preliminary points to serve as initial estimates of the
centroids.
Step 3: Classification - Examine every object in the data set and assign it to the cluster with
the nearest centroid.
Step 4: Calculation of centroids - Once every object in the data set has been allocated to a
cluster, recompute the k centroids.
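The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`kmeans`, `mean_point`, `squared_distance`), the random choice of initial points, the use of squared Euclidean distance, and the rule of repeating Steps 3 and 4 until the assignments stop changing are all choices made here for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Centroid of a group: the average of every coordinate."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iterations=100, seed=0):
    """Return (centroids, labels) for the given points and cluster count k."""
    rng = random.Random(seed)
    # Step 2: initialization, here k data points picked at random (an assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Step 3: classification, each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the centroids will not move.
        labels = new_labels
        # Step 4: recompute each centroid from the points now assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = mean_point(members)
    return centroids, labels


# Step 1: choose k. The sample points below are hypothetical.
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(sample, k=2)
print(centroids, labels)
```

On this sample the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initial choice, so only the grouping, not the numbering, is meaningful.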