Running head: Quantitative Business Analysis
Quantitative Business Analysis
Name of the student
Name of the University

Table of Contents
K Means Clustering
Hierarchical Clustering
DBSCAN Clustering
Document Page
3Quantitative Business Analysis
K Means Clustering:
Clustering is the act of dividing a set of objects into groups according to some criterion.
It is a common exploratory technique in data analysis used to make sense of the data. It works
by partitioning the data into subgroups whose members share a common property. The measure of
similarity, for example Euclidean distance or correlation, is specific to each task.
The K means algorithm iteratively partitions a dataset into K pre-defined, non-overlapping
clusters, so that every point belongs to exactly one group. The algorithm tries to make the
data points within a cluster as similar to each other as possible while keeping the clusters
as far apart as possible. The optimal solution is reached when the sum of the squared distances
between each cluster's centroid and its data points is at a minimum. The K means algorithm
follows the expectation-maximization approach to form the clusters: the E step assigns each
point to its nearest centroid, and the M step recomputes the centroid of each cluster from the
points assigned to it.
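Stated as a formula (a standard way of writing the objective described above, with \mu_j denoting the centroid of cluster C_j), K means seeks the assignment that minimizes the within-cluster sum of squares:

J = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2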
A common application of clustering analysis is market segmentation, where different
categories of customers are targeted with different offers by the business.
Overall, K means is one of the easiest clustering methods to implement. With a small
k, K means clustering is usually computationally faster than hierarchical clustering and
tends to produce tighter clusters than the other clustering methods.
A few disadvantages of K means clustering are that it is not always feasible to choose K
beforehand; the final results can change when the order of the data is changed; and K means is
sensitive to scale, so normalization or standardization changes the final results.
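To make the iteration concrete, the following is a minimal sketch of the K means loop in Python using only NumPy. The synthetic dataset, the choice of k = 3, and the iteration limit are arbitrary assumptions made purely for illustration.

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # E step: assign every point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # M step: recompute each centroid as the mean of the points assigned to it.
        # (This sketch does not handle the rare case of a cluster becoming empty.)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized, so the loop has converged
        centroids = new_centroids
    return labels, centroids

# Example: 60 two-dimensional points drawn around three centres.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.6, size=(20, 2)) for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = k_means(X, k=3)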
Hierarchical Clustering:
Hierarchical Clustering is a type of clustering which creates a hierarchy of clusters.
There are two types of hierarchical clustering:
Agglomerative Hierarchical Clustering:
In this type of clustering, each point initially forms a cluster on its own. At every
iteration, the most similar clusters are merged, until eventually K clusters remain.
Divisive Hierarchical Clustering Technique: Divisive clustering has fewer uses in the
real world, and its approach is the opposite of the way Agglomerative clustering works. In
this technique, all the points start in a single cluster, and the points that are not similar
are repeatedly split off. Each data point that has been separated out is counted as a unique
cluster, so the process finally ends with n clusters.
Hierarchical clustering is better than K means in that the former outputs a
hierarchy, which is more informative than the flat set of clusters returned by K means. It
becomes much easier to select K by looking at the dendrogram. It is also relatively easy to
implement.
A few drawbacks of hierarchical clustering are that there is no fixed mathematical
objective behind this method of clustering; the methods used to calculate the similarity
between clusters each have their own disadvantages; and the high space and time complexity of
hierarchical clustering makes it cumbersome to use on very large datasets.
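As an illustration of the hierarchy and the dendrogram mentioned above, the following is a minimal sketch using SciPy's agglomerative linkage; the synthetic dataset, the Ward linkage method, and the cut into three clusters are assumptions chosen only for the example.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Small synthetic 2-D dataset drawn around three centres (arbitrary for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Agglomerative clustering: each row of Z records one merge of two clusters.
Z = linkage(X, method="ward")

# Cut the hierarchy to obtain a flat assignment into 3 clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# dendrogram(Z) draws the merge tree (rendered with matplotlib), which helps in choosing K visually.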
DBSCAN Clustering:
Density based spatial clustering of applications with noise (DBSCAN) is another commonly used
clustering technique and a comparatively recent one. Given a collection of points,
DBSCAN groups together points that are close to one another in Euclidean distance, provided a
minimum number of points lies nearby. Points that sit in low-density regions are marked as outliers.
This algorithm requires two main parameters:
Eps: how close points must be to one another to be considered part of the same cluster (the neighbourhood radius).
minPoints: the minimum number of points needed to form a dense region.
The algorithm proceeds roughly as follows (a short sketch in code is given after these steps):
a) Identify all the neighbouring points and the core points.
b) For each core point, create a new cluster if the point is not already part of a cluster.
c) Iteratively find all the density-connected points and put them in the same cluster as
the core point.
d) Repeat the same procedure for the remaining unvisited points in the dataset. The left-over
points that do not belong to any cluster are noise.
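The sketch below uses scikit-learn's DBSCAN implementation on a small synthetic dataset; the eps and min_samples values are arbitrary assumptions that would normally be tuned to the data.

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus a handful of scattered points that should come out as noise.
rng = np.random.default_rng(0)
dense = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [5, 5])])
scattered = rng.uniform(-2, 7, size=(10, 2))
X = np.vstack([dense, scattered])

# eps corresponds to the Eps parameter above, min_samples to minPoints.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Points labelled -1 lie in low-density regions and are reported as outliers (noise).
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))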

A few advantages of DBSCAN are that it avoids some drawbacks of K means clustering, such
as its poor performance on non-spherical data and its sensitivity to outliers. It can cluster
data of any arbitrary shape.
DBSCAN, however, performs poorly on data with widely varying densities.