Extracting Insights from Text Data: SPSS Cluster Analysis Project

Added on  2021/04/17

AI Summary
This project demonstrates the application of cluster analysis in SPSS using text data extracted from online comments. The objective is to identify distinct customer clusters based on their expressed opinions and behaviors. The methodology involves manual analysis and scoring of comments across nine criteria, followed by importing the data into SPSS for K-means cluster analysis. The analysis identified three clusters, each characterized by specific traits: the first cluster shows clear ideas and a present-focused perspective, the second lacks clear ideas and avoids criticism, and the third focuses on personalities and has clear ideas. The project discusses limitations such as a small sample size and recommends targeted products and services for each segment, such as food products for the first, any product for the second, and personal care items for the third. The project highlights the importance of cluster analysis in data-driven marketing and customer segmentation.
Cluster analysis in SPSS from text data
Objectives
The main objective of the current research is to extract data from online comments and identify
clusters within it. Cluster analysis groups customers with the same features or behaviors
together, so that marketing can be customized for each group.
Method
For the current research, text data was collected from people's comments on online posts. After
collection, each comment was manually analyzed and scored on 9 different criteria. These
criteria are:
Clarity of ideas
Emotion
Objectivity
Past perspective
Now perspective
Future perspective
Focus on personalities
Criticism of corporate
Criticism of government
Each comment was given a score between 1 and 7 on each criterion. After coding, the data was
imported from Excel into SPSS and cluster analysis was performed.
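Before importing coded scores into a statistics package, it can help to check that every comment has all nine scores and that each score falls in the 1 to 7 range. The following is a minimal Python sketch of such a check. The field names and the example rows are hypothetical, chosen only to mirror the nine criteria listed above, and it is not part of the original SPSS workflow.

```python
# Hypothetical field names mirroring the nine criteria listed above.
CRITERIA = [
    "clarity_of_ideas", "emotion", "objectivity",
    "past_perspective", "now_perspective", "future_perspective",
    "focus_on_personalities", "criticism_of_corporate",
    "criticism_of_government",
]

def validate_row(row):
    """Return a list of problems found in one coded comment."""
    problems = []
    for name in CRITERIA:
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        score = row[name]
        if not isinstance(score, int) or not 1 <= score <= 7:
            problems.append(f"{name}: {score!r} is not an integer from 1 to 7")
    return problems

# Made-up example rows, not data from the study.
good = dict.fromkeys(CRITERIA, 4)
bad = dict(good, emotion=9)

print(validate_row(good))  # []
print(validate_row(bad))   # one problem reported for "emotion"
```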
Clusters
Cluster analysis was performed in SPSS and a maximum of 3 clusters was identified. K-means
cluster analysis was used, which forms clusters based on the distance of each data point from
the cluster centroids: each point is assigned to the cluster whose centroid is nearest. In this
way, data points with similar patterns are placed in a single group.
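The assign-then-update idea behind K-means can be sketched in a few lines of Python. This is a generic illustration using Euclidean distance, not SPSS's exact procedure. The function names `nearest` and `kmeans` are made up for this example, and the two-criterion points are made-up data rather than scores from the study.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, max_iter=10):
    """Assign each point to its nearest center, then recompute the means."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return labels, centers

# Made-up two-criterion scores, not data from the study.
points = [(1, 2), (2, 1), (6, 7), (7, 6)]
labels, centers = kmeans(points, centers=[(1, 1), (7, 7)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[1.5, 1.5], [6.5, 6.5]]
```

The loop stops as soon as no center moves, which corresponds to the "no or small change in cluster centers" condition that SPSS reports.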
Initial Cluster Centers
                           Cluster
                           1       2       3
Clarity of ideas           1.00    1.00    2.00
Emotion                    2.00    4.00    6.00
Objectivity                3.00    6.00    1.00
Past perspective           3.00    6.00    7.00
Now perspective            1.00    7.00    3.00
Future perspective         4.00    2.00    7.00
Focus on personalities     3.00    7.00    1.00
Criticism of corporate     1.00    7.00    7.00
Criticism of government    1.00    2.00    7.00
Table 1 Results from the cluster analysis
Iteration History (a)
Iteration    Change in Cluster Centers
             1        2        3
1            4.773    7.254    5.287
2            1.710     .397     .905
3            1.212     .000     .251
4             .624     .134     .000
5             .000     .000     .000
a. Convergence achieved due to no or small change in cluster centers. The maximum absolute
coordinate change for any center is .000. The current iteration is 5. The minimum distance
between initial centers is 10.724.
Table 2 Results from cluster analysis
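The reported minimum distance between initial centers can be checked by hand. The sketch below takes the three initial centers from Table 1, with criteria in the order listed there, and computes the pairwise distances. It assumes Euclidean distance, which reproduces the 10.724 figure in the footnote.

```python
import math
from itertools import combinations

# Initial cluster centers from Table 1, criteria in table order.
initial_centers = {
    1: (1, 2, 3, 3, 1, 4, 3, 1, 1),
    2: (1, 4, 6, 6, 7, 2, 7, 7, 2),
    3: (2, 6, 1, 7, 3, 7, 1, 7, 7),
}

# Euclidean distance for every pair of centers.
distances = {
    (a, b): math.dist(initial_centers[a], initial_centers[b])
    for a, b in combinations(initial_centers, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))

closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 3))  # (1, 2) 10.724
```

The closest pair is clusters 1 and 2, whose squared differences sum to 115, and the square root of 115 is about 10.724.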
Final Cluster Centers
                           Cluster
                           1       2       3
Clarity of ideas           2.14    4.11    2.75
Emotion                    3.00    4.23    2.84
Objectivity                3.00    5.89    4.50
Past perspective           5.14    6.51    6.25
Now perspective            2.71    6.46    5.41
Future perspective         6.43    6.66    6.53
Focus on personalities     4.71    6.66    1.69
Criticism of corporate     5.14    6.69    7.00
Criticism of government    4.14    6.69    6.91
Table 3 Results from the final clusters
Number of Cases in each Cluster
Cluster    1    7.000
           2    35.000
           3    32.000
Valid           74.000
Missing         .000
As shown in the tables above, the final clusters were identified after 5 iterations. The
iterations stop once the cluster centers no longer change from one step to the next. The reading
below assumes that a lower score indicates a stronger presence of a trait, which is what the
final cluster centers imply. On that reading, the people in the first cluster have very clear
ideas and are emotional. They lean toward the present perspective rather than the future
perspective. The people in the second cluster, on the other hand, do not have clear ideas and do
not care much about the other criteria. They criticize neither the government nor corporations,
and they do not focus on personalities. Finally, the people in the third cluster are clear on
their ideas and strongly focused on personalities, more so than the people in either of the other
clusters. Like the second cluster, they criticize neither corporations nor the government.
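One way to see which criteria actually separate the clusters is to rank them by the spread of their final cluster means. The Python sketch below does this with the values from the Final Cluster Centers table. The source does not state which end of the 1 to 7 scale means "more", so the sketch only reports which cluster has the lowest mean on each criterion. The helper name `spread` is made up for this example.

```python
# Final cluster means (clusters 1, 2, 3) from the Final Cluster Centers table.
final_centers = {
    "Clarity of ideas":        (2.14, 4.11, 2.75),
    "Emotion":                 (3.00, 4.23, 2.84),
    "Objectivity":             (3.00, 5.89, 4.50),
    "Past perspective":        (5.14, 6.51, 6.25),
    "Now perspective":         (2.71, 6.46, 5.41),
    "Future perspective":      (6.43, 6.66, 6.53),
    "Focus on personalities":  (4.71, 6.66, 1.69),
    "Criticism of corporate":  (5.14, 6.69, 7.00),
    "Criticism of government": (4.14, 6.69, 6.91),
}

def spread(means):
    """Gap between the highest and lowest cluster mean."""
    return max(means) - min(means)

# Criteria with the widest gap between clusters come first.
ranked = sorted(final_centers, key=lambda name: spread(final_centers[name]), reverse=True)
for name in ranked:
    means = final_centers[name]
    lowest = means.index(min(means)) + 1
    print(f"{name:<24} spread={spread(means):.2f}  lowest mean: cluster {lowest}")
```

In this output, focus on personalities shows the widest gap (cluster 3 lowest) and future perspective the narrowest, which suggests the future-perspective score contributes little to telling the clusters apart.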
Limitations
One of the main limitations of the current research is the small sample, with only 75 data
points (the cluster output reports 74 valid cases). Only three clusters were identified; a
larger sample might have supported more. In addition, some of the comments are not very helpful
because they provide little information. Filtering out such comments could give better results.
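The sample-size concern can be made concrete with the case counts from the Number of Cases table. The short Python sketch below only does the arithmetic on those reported counts.

```python
# Case counts per cluster, from the Number of Cases table.
cases = {1: 7, 2: 35, 3: 32}
total = sum(cases.values())  # 74 valid cases

for cluster, n in cases.items():
    print(f"Cluster {cluster}: {n} cases ({n / total:.1%})")
# Cluster 1: 7 cases (9.5%)
# Cluster 2: 35 cases (47.3%)
# Cluster 3: 32 cases (43.2%)
```

With only 7 cases, the first cluster's means rest on very few comments, so its profile should be read with particular caution.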
Segments Developed
Since there are three clusters, three segments have been developed. The first consists of people
with clear ideas who voice criticism: they know what they want and do not hesitate to criticize.
People in this segment focus on the present situation.
The second segment consists of people who are neither clear on their ideas nor inclined to
criticize.
Finally, in the third segment people are very clear on their ideas and focus on personalities.
Products and Services Recommended
Based on the behavior and preferences of the clusters, food products can be recommended to the
first cluster, as its members are very clear on their ideas. The second cluster shows no
distinct preferences, so any product can be offered to it. Finally, the people in the third
cluster focus more on personalities, so personal care products and clothes can be recommended
for that cluster.