Data Analysis and Visualization: Industry Employment Analysis

Added on  2023/01/11

Practical Assignment
AI Summary
This assignment focuses on the analysis of industry employment data from 2009 to 2018. It begins with data processing, followed by an analysis of employment trends across various industries, including real estate, ICT, finance, and public administration. The analysis identifies industries with the highest and lowest employment, as well as those with the highest and lowest growth rates. The assignment further explores the best and worst performing years in terms of employment numbers. Visual analysis is performed using charts to represent employment data, and correlation analysis is conducted to identify relationships between industries and employment. Finally, the assignment utilizes clustering techniques, specifically K-means and hierarchical clustering, to group industries based on employment data, providing interpretations of the results and comparing the clusters generated by the two methods. This comprehensive analysis provides valuable insights into the dynamics of the employment landscape across different sectors.
PROGRAMMING FOR DATA ANALYSIS
1. Data Processing
Industry                   2018     2017     2016     2015     2014     2013     2012     2011     2010     2009
Real Estate              25,200   18,200   22,700   19,100   22,200   18,000   18,800   17,600   14,600   13,500
ICT                      31,500   32,100   31,000   24,000   32,400   26,900   27,200   26,400   27,900   27,800
Finance                  35,500   40,200   34,400   30,800   35,700   32,400   31,100   33,200   29,800   33,800
Agriculture              41,100   58,900   43,200   40,700   42,700   36,800   36,100   36,100   38,200   37,700
Other_Service            81,800   83,200   72,400   77,200   73,300   75,500   72,800   72,400   68,000   64,200
Construction            101,800   90,800  102,700   92,600   97,000   89,300   91,300   90,000   93,200   96,600
Production              165,700  165,100  161,200  166,200  152,900  149,900  137,300  143,600  145,800  144,800
Professional Service    187,100  176,400  162,500  172,300  173,300  164,200  154,400  158,600  149,800  156,700
Retail                  347,600  333,500  360,200  357,700  337,300  345,100  347,300  343,100  344,500  345,400
Public_Administration   434,900  424,500  418,500  423,200  427,600  427,000  421,000  425,600  418,600  415,600
2. Data Analysis
2.1 Which industries employed the most and the fewest workers over the period?
[Figure: chart of the total number of employees in each industry over the period (Real Estate, ICT, Finance, Agriculture, Other_Service, Construction, Production, Professional Service, Retail, Public_Administration). Value axis: No. of employees, 0 to 4,500,000.]
Interpretation: Real Estate employed the fewest workers, around 189,900 in total over the
ten-year period from 2009 to 2018, while Public Administration employed the most, around
4,236,500.
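The totals quoted in this interpretation can be reproduced by summing each industry's row. The following is a minimal sketch in plain Python, assuming the values from the table in section 1; the names `EMPLOYMENT`, `totals`, `lowest` and `highest` are illustrative and do not come from the original assignment.

```python
# Employment per industry, columns ordered 2018 down to 2009 (table in section 1).
EMPLOYMENT = {
    "Real Estate": [25200, 18200, 22700, 19100, 22200, 18000, 18800, 17600, 14600, 13500],
    "ICT": [31500, 32100, 31000, 24000, 32400, 26900, 27200, 26400, 27900, 27800],
    "Finance": [35500, 40200, 34400, 30800, 35700, 32400, 31100, 33200, 29800, 33800],
    "Agriculture": [41100, 58900, 43200, 40700, 42700, 36800, 36100, 36100, 38200, 37700],
    "Other_Service": [81800, 83200, 72400, 77200, 73300, 75500, 72800, 72400, 68000, 64200],
    "Construction": [101800, 90800, 102700, 92600, 97000, 89300, 91300, 90000, 93200, 96600],
    "Production": [165700, 165100, 161200, 166200, 152900, 149900, 137300, 143600, 145800, 144800],
    "Professional Service": [187100, 176400, 162500, 172300, 173300, 164200, 154400, 158600, 149800, 156700],
    "Retail": [347600, 333500, 360200, 357700, 337300, 345100, 347300, 343100, 344500, 345400],
    "Public_Administration": [434900, 424500, 418500, 423200, 427600, 427000, 421000, 425600, 418600, 415600],
}

# Total employment per industry across all ten years.
totals = {industry: sum(values) for industry, values in EMPLOYMENT.items()}

lowest = min(totals, key=totals.get)
highest = max(totals, key=totals.get)

print(f"Lowest:  {lowest} ({totals[lowest]:,})")
print(f"Highest: {highest} ({totals[highest]:,})")
```

Run as written, this reports Real Estate with 189,900 and Public_Administration with 4,236,500, which agrees with the interpretation.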
2.2 Which industry has the highest and lowest overall growth over the period?
Real Estate recorded the highest growth, with employment up 86.67% since 2009; Retail recorded
the lowest, with growth of only 0.64% over the same period.
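Overall growth here is read as the percentage change from the 2009 value to the 2018 value, that is (2018 - 2009) / 2009 x 100. The sketch below assumes that definition and uses only the 2009 and 2018 columns of the table in section 1; `ENDPOINTS` and `growth` are illustrative names.

```python
# (2009, 2018) employment per industry, from the table in section 1.
ENDPOINTS = {
    "Real Estate": (13500, 25200),
    "ICT": (27800, 31500),
    "Finance": (33800, 35500),
    "Agriculture": (37700, 41100),
    "Other_Service": (64200, 81800),
    "Construction": (96600, 101800),
    "Production": (144800, 165700),
    "Professional Service": (156700, 187100),
    "Retail": (345400, 347600),
    "Public_Administration": (415600, 434900),
}

# Percentage change from the first year (2009) to the last year (2018).
growth = {
    industry: (end - start) / start * 100
    for industry, (start, end) in ENDPOINTS.items()
}

# List industries from fastest to slowest growth.
for industry, pct in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print(f"{industry:<22} {pct:6.2f}%")
```

Note that this endpoint-based measure ignores the years in between; an average annual growth rate would be a different, equally defensible reading of the question.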
2.3 Which years are the best and worst performing in terms of total employment?
The highest employment was recorded in 2018, when 1,452,200 people were employed across all
industries; the lowest was in 2010, with 1,330,400 employed (the sum of the 2010 column in
section 1).
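Yearly totals come from summing each column of the table across the ten industries. A minimal sketch follows, assuming the values from section 1; `YEARS`, `EMPLOYMENT` and `year_totals` are illustrative names.

```python
# Column order of the table in section 1: 2018 down to 2009.
YEARS = list(range(2018, 2008, -1))

EMPLOYMENT = {
    "Real Estate": [25200, 18200, 22700, 19100, 22200, 18000, 18800, 17600, 14600, 13500],
    "ICT": [31500, 32100, 31000, 24000, 32400, 26900, 27200, 26400, 27900, 27800],
    "Finance": [35500, 40200, 34400, 30800, 35700, 32400, 31100, 33200, 29800, 33800],
    "Agriculture": [41100, 58900, 43200, 40700, 42700, 36800, 36100, 36100, 38200, 37700],
    "Other_Service": [81800, 83200, 72400, 77200, 73300, 75500, 72800, 72400, 68000, 64200],
    "Construction": [101800, 90800, 102700, 92600, 97000, 89300, 91300, 90000, 93200, 96600],
    "Production": [165700, 165100, 161200, 166200, 152900, 149900, 137300, 143600, 145800, 144800],
    "Professional Service": [187100, 176400, 162500, 172300, 173300, 164200, 154400, 158600, 149800, 156700],
    "Retail": [347600, 333500, 360200, 357700, 337300, 345100, 347300, 343100, 344500, 345400],
    "Public_Administration": [434900, 424500, 418500, 423200, 427600, 427000, 421000, 425600, 418600, 415600],
}

# zip(*rows) transposes the table, so each tuple is one year's column.
year_totals = {
    year: sum(column) for year, column in zip(YEARS, zip(*EMPLOYMENT.values()))
}

best = max(year_totals, key=year_totals.get)
worst = min(year_totals, key=year_totals.get)

for year in sorted(year_totals):
    print(f"{year}: {year_totals[year]:,}")
print(f"Best year: {best}, worst year: {worst}")
```

Printing every year rather than only the extremes makes it easy to see how close the low years are to one another.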
3. Visual Analysis
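[Figure: chart of employment by year for the industries listed in the legend (Real Estate, Finance, Other_Service, Production, Retail). Horizontal axis: 2008 to 2020; vertical axis: 0 to 400,000. The chart title and axis titles were left as the defaults "Chart Title" and "Axis Title".]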
4. Correlation
4.1 Taking the average employment number for each industry over the period, show and identify
the most and least correlated industries
              Industry      Employment
Industry      1
Employment    0.89715974    1
4.2 Make a year-wise correlation for each industry. Are the aforementioned industries also
correlated in each year? Explain your answer.
          Row 1      Row 2
Row 1     8.25
Row 2     -100375    1436192500
Comparison: Comparing the results of 4.1 and 4.2, industry and employment numbers have a strong
positive relationship in 4.1 (a coefficient of about 0.897), with Real Estate the least and
Public Administration the most highly correlated industry. The year-wise result in 4.2, on the
other hand, shows a negative relationship.
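The coefficients above can be checked with a hand-written Pearson correlation. The sketch below is only one possible reading of the analysis: the document does not say how "Industry" was encoded, so the code assumes the industry's position in the table (1 to 10), and the helper `pearson` and all variable names are illustrative. Two observations are worth keeping in mind. A correlation matrix always has 1 on its diagonal, so the 4.2 output, with 8.25 and 1436192500 on the diagonal, has the shape of a covariance matrix instead (8.25 is the population variance of ten consecutive integers). Also, the sign of a year-wise coefficient depends on how the year axis is encoded: calendar years and column positions (2018 first) give coefficients of opposite sign.

```python
import math

YEARS = list(range(2018, 2008, -1))  # column order of the table in section 1

EMPLOYMENT = {
    "Real Estate": [25200, 18200, 22700, 19100, 22200, 18000, 18800, 17600, 14600, 13500],
    "ICT": [31500, 32100, 31000, 24000, 32400, 26900, 27200, 26400, 27900, 27800],
    "Finance": [35500, 40200, 34400, 30800, 35700, 32400, 31100, 33200, 29800, 33800],
    "Agriculture": [41100, 58900, 43200, 40700, 42700, 36800, 36100, 36100, 38200, 37700],
    "Other_Service": [81800, 83200, 72400, 77200, 73300, 75500, 72800, 72400, 68000, 64200],
    "Construction": [101800, 90800, 102700, 92600, 97000, 89300, 91300, 90000, 93200, 96600],
    "Production": [165700, 165100, 161200, 166200, 152900, 149900, 137300, 143600, 145800, 144800],
    "Professional Service": [187100, 176400, 162500, 172300, 173300, 164200, 154400, 158600, 149800, 156700],
    "Retail": [347600, 333500, 360200, 357700, 337300, 345100, 347300, 343100, 344500, 345400],
    "Public_Administration": [434900, 424500, 418500, 423200, 427600, 427000, 421000, 425600, 418600, 415600],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# 4.1-style: table position of each industry (assumed encoding) vs. its mean employment.
averages = [sum(values) / len(values) for values in EMPLOYMENT.values()]
positions = list(range(1, len(averages) + 1))
r_industry = pearson(positions, averages)

# Year-wise: total employment per year against two encodings of the year axis.
totals = [sum(column) for column in zip(*EMPLOYMENT.values())]
r_calendar = pearson(YEARS, totals)                          # 2018, 2017, ..., 2009
r_position = pearson(list(range(1, len(YEARS) + 1)), totals)  # 1 = 2018, ..., 10 = 2009

print(f"industry position vs mean employment: {r_industry:.4f}")
print(f"calendar year vs total employment:    {r_calendar:.4f}")
print(f"column position vs total employment:  {r_position:.4f}")
```

If the original analysis used the same encoding, the first coefficient should land near the reported 0.897; the last two differ only in sign, which is one possible explanation for a negative year-wise result.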
5. Clustering (K means & hierarchical)
5.1 Using the employment data for the best and worst performing years (2.3), undertake a
K-means clustering analysis (K=2 and K=3) and identify which industries cluster together. Write
your own interpretation.
Interpretation: The results show the highest employment in 2018 (number 7 in the output) and
the lowest in 2010 (number 2). K-means clustering is a method of vector quantization,
originally from signal processing, that aims to partition n observations into k clusters in
which each observation belongs to the cluster with the nearest mean (the cluster centre, or
centroid), which serves as a prototype of the cluster. This partitions the data space into
Voronoi cells. The method is popular for cluster analysis in data mining. K-means minimizes
within-cluster variance (squared Euclidean distance), not plain Euclidean distance, which would
be the harder Weber problem: the mean optimizes squared errors, whereas only the geometric
median minimizes Euclidean distances. Better Euclidean solutions can be found, for example,
with k-medians and k-medoids.
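The assign-and-update loop described above can be sketched in plain Python. This is not the tool the assignment used (its output is not shown in the document); it is a minimal Lloyd's-algorithm sketch on the 2018 and 2010 columns from section 1. The deterministic choice of starting centroids is an assumption made so the run is repeatable, and `DATA`, `kmeans` and `clusters` are illustrative names.

```python
import math

# (2018, 2010) employment per industry, from the table in section 1.
DATA = {
    "Real Estate": (25200, 14600),
    "ICT": (31500, 27900),
    "Finance": (35500, 29800),
    "Agriculture": (41100, 38200),
    "Other_Service": (81800, 68000),
    "Construction": (101800, 93200),
    "Production": (165700, 145800),
    "Professional Service": (187100, 149800),
    "Retail": (347600, 344500),
    "Public_Administration": (434900, 418600),
}


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster index per input point."""
    ordered = sorted(points)
    n = len(ordered)
    # Assumed deterministic start: k points spread evenly across the sorted data.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def clusters(k):
    """Group industry names by the cluster K-means assigns them to."""
    names = list(DATA)
    labels = kmeans([DATA[name] for name in names], k)
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(lab, []).append(name)
    return [groups[lab] for lab in sorted(groups)]


for k in (2, 3):
    print(f"K={k}")
    for group in clusters(k):
        print("  ", ", ".join(group))
```

With this starting point, K=2 separates Retail and Public_Administration from the other eight industries. K=3 keeps that pair and splits the rest into a smaller-employer group (Real Estate, ICT, Finance, Agriculture, Other_Service) and a mid-sized group (Construction, Production, Professional Service). K-means is sensitive to initialization, so a randomly seeded run in another tool may group borderline industries differently.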
5.2 Using the same dataset (best and worst performing years), create a hierarchical cluster.
Compare the result with the K-means clusters.
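The document does not show the hierarchical result or name a linkage method, so the following is only a sketch of how such a clustering could be built: bottom-up (agglomerative) merging with complete linkage and Euclidean distance, both of which are assumptions. It uses the 2018 and 2010 columns from section 1; `DATA` and `complete_linkage` are illustrative names.

```python
import math
from itertools import combinations

# (2018, 2010) employment per industry, from the table in section 1.
DATA = {
    "Real Estate": (25200, 14600),
    "ICT": (31500, 27900),
    "Finance": (35500, 29800),
    "Agriculture": (41100, 38200),
    "Other_Service": (81800, 68000),
    "Construction": (101800, 93200),
    "Production": (165700, 145800),
    "Professional Service": (187100, 149800),
    "Retail": (347600, 344500),
    "Public_Administration": (434900, 418600),
}


def complete_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in points]  # start with every industry on its own
    while len(clusters) > n_clusters:
        # Complete linkage: the distance between two clusters is the largest
        # distance between any member of one and any member of the other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: max(
                math.dist(points[x], points[y])
                for x in clusters[pair[0]]
                for y in clusters[pair[1]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


for n in (2, 3):
    print(f"{n} clusters")
    for group in complete_linkage(DATA, n):
        print("  ", ", ".join(group))
```

Cut at two clusters, this tree separates Retail and Public_Administration from the other eight industries. Cut at three, Production and Professional Service form a middle group, and Construction stays with the smaller employers. Other linkage rules (single, average, Ward) can place borderline industries such as Construction differently, which is the kind of difference to look for when comparing against K-means with K=3.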