Analyzing Death Rate Factors: Immunization & Road Injuries

Added on 2020/07/22

AI Summary

This assignment involves analyzing data in R to determine the impact of vaccination (immunization) and road injury on death rates among nations. The study uses clustering methods like k-means and hclust, along with linear regression modeling. Key findings include similar patterns across groups but other factors also contribute significantly to death rate variations.

DATA ANALYSIS

TABLE OF CONTENTS
1 INTRODUCTION
2 Data setup
3 Exploratory data analysis
3.1 One variable analysis
3.2 Two variable analysis
4 Advanced analysis
4.1 Cluster analysis
4.2 Linear regression
5 Conclusion
6 Reflection
REFERENCES

1 INTRODUCTION
Analytic tools are now used at large scale by firms to solve business-related problems. Healthcare firms that operate at
international level currently face the problem of identifying how physical resources should be allocated across nations so that cash
is put to its best use and resources are utilized optimally. Healthcare sector firms such as hospital chains care about this problem.
If resources are not allocated systematically, a firm may fail to cater to the needs of its patients. For the analysis, data is taken
from the World Bank website and analyzed with cluster analysis and regression analysis techniques.
2 Data setup
The data is loaded into R with the code given below, which imports it from a CSV file for analysis. No additional clustering
package is needed, because kmeans and hclust are functions of the stats package that is loaded with R by default; ggplot2 is loaded
later for plotting.
> ssp<-read.table("D:\\Tapan\\A41593 Data analysis\\data.csv",header=T,sep=",")
> str(ssp)
'data.frame': 11 obs. of 4 variables:
$ Countries : Factor w/ 11 levels "Cambodia","China",..: 1 2 4 5 7 8 9 10 11 3 ...
$ Death.rate : num 6.22 7.16 7.16 10.1 4.87 ...
$ Immunization : int 90 99 86 93 99 99 86 99 95 99 ...
$ Mortality.by.road.injury: num 17.4 18.8 15.3 4.7 24 21 20.3 3.6 24.5 5.8 ...
The dimensions of the data set are given below.
> dim(ssp)
[1] 11 4
The results show that the dataset has 11 rows and four columns. Its variables are countries, death rate, immunization and
mortality by road injury. Countries is a categorical variable, and the other three are numeric variables.
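The loading step can also be sketched with read.csv, which is read.table with header = TRUE and sep = "," already set. This is only a sketch: the file written here is a temporary stand-in with three invented rows (not the World Bank figures), and the names toy, path and ssp_toy are hypothetical, used only so that the block runs on its own.

```r
# Toy stand-in for data.csv: all values below are invented for illustration.
toy <- data.frame(
  Countries = c("A", "B", "C"),
  Death.rate = c(6.1, 7.3, 5.2),
  Immunization = c(90L, 99L, 86L),
  Mortality.by.road.injury = c(17.0, 5.5, 21.3)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# read.csv() presets header = TRUE and sep = ",".
# stringsAsFactors = TRUE reproduces the Factor column seen in str(ssp);
# since R 4.0.0 the default is FALSE, which would give a character column.
ssp_toy <- read.csv(path, stringsAsFactors = TRUE)

str(ssp_toy)   # column types
dim(ssp_toy)   # number of rows, number of columns
```

Checking str and dim immediately after import, as the report does, is a quick way to confirm that the header row and the separator were interpreted correctly.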
3 Exploratory data analysis
3.1 One variable analysis
Table 1 Descriptive statistics
> summary(ssp)
     Countries   Death.rate      Immunization   Mortality.by.road.injury
 Cambodia :1   Min.   : 4.600   Min.   :86.00   Min.   : 2.90
 China    :1   1st Qu.: 5.976   1st Qu.:88.00   1st Qu.: 5.25
 Fiji     :1   Median : 6.815   Median :95.00   Median :17.40
 Indonesia:1   Mean   : 6.742   Mean   :93.73   Mean   :14.39
 Japan    :1   3rd Qu.: 7.160   3rd Qu.:99.00   3rd Qu.:20.65
 Kirbati  :1   Max.   :10.100   Max.   :99.00   Max.   :24.50
 (Other)  :5

Interpretation
The mean death rate is 6.74 and the median is 6.82, so the mean lies close to the median. The minimum is 4.60 and the maximum
is 10.10; the maximum lies well above the third quartile (7.16), which suggests that one nation has an unusually high death rate. For
immunization the mean is 93.73 and the median is 95, with a minimum of 86 and a maximum of 99. Immunization coverage is therefore
high in most nations, and at least a quarter of them sit at the maximum of 99. Mortality by road injury has a mean of 14.39 and a
median of 17.40, and it ranges from 2.90 to 24.50, which is the widest range of the three variables. These statistics describe levels
across the 11 nations at one point in time; they do not show whether deaths are rising or falling.
3.2 Two variable analysis
> cor(ssp$Death.rate,ssp$Immunization)
[1] -0.4557288

> cor(ssp$Death.rate,ssp$Mortality.by.road.injury)
[1] -0.2711874
> cov(ssp$Death.rate,ssp$Immunization)
[1] -4.049809
> cov(ssp$Death.rate,ssp$Mortality.by.road.injury)
[1] -3.533984
Interpretation

The correlation between death rate and immunization is -0.46, and the correlation between death rate and mortality by road
injury is -0.27. Both are negative: nations with higher immunization tend to have a lower death rate (a moderate relationship), and
nations with higher road injury mortality also tend to have a slightly lower overall death rate (a weak relationship). The covariance
values have the same signs, -4.05 for immunization and -3.53 for mortality by road injury. Because both relationships are modest,
other factors must account for much of the variation in death rate across these nations.
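Correlation and covariance always share a sign because the Pearson correlation is the covariance divided by the product of the two standard deviations. A minimal sketch of that relationship follows; the vectors death and immun hold invented values for illustration, not the World Bank data.

```r
# Invented values for illustration only (not the World Bank data).
death <- c(5, 6, 7, 8, 9, 10)
immun <- c(99, 95, 97, 90, 92, 88)

r_builtin <- cor(death, immun)

# Pearson correlation = covariance / (sd of x * sd of y)
r_manual <- cov(death, immun) / (sd(death) * sd(immun))

all.equal(r_builtin, r_manual)

# cor.test() adds a p-value and a confidence interval for the correlation,
# which helps judge whether a coefficient from a small sample is reliable.
ct <- cor.test(death, immun)
ct$estimate
ct$p.value
```

Unlike the covariance, the correlation does not depend on the units of the variables, which is why it is the easier of the two to compare across pairs of variables.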
4 Advanced analysis
4.1 Cluster analysis
Clustering groups observations (here, nations) together on the basis of their similarity. Cluster analysis usually starts from a
pattern matrix, from which similarities between patterns are identified and a proximity matrix is built. Clusters are then formed on
the basis of proximity (Tan, Steinbach and Kumar, 2013). K-means clustering is one of the methods most commonly used by data
scientists. Under this method R first picks k data points at random from the data set as initial cluster centres. Each data point is
then assigned to the centre nearest to it. After this stage the mean of each cluster is computed to form a new centroid, and each
data point is reassigned to the nearest of the newly computed centroids. These two steps, recomputing the centroids and reassigning
the points, are repeated until the assignments stop changing (Niknam and Amiri, 2010). Because the starting points are random,
set.seed is called first so that the result can be reproduced.
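The assign-and-update loop described above can be sketched by hand. This is an illustrative sketch of the basic (Lloyd-style) iteration, not the exact routine that kmeans runs by default; the points in x are invented, the helper sq_dist is hypothetical, and the starting centres are fixed rather than random so that the result is deterministic.

```r
# Invented 2-D points forming three loose groups (illustration only).
x <- rbind(c(1, 1), c(1, 2), c(2, 1),
           c(8, 8), c(8, 9), c(9, 8),
           c(1, 8), c(2, 9), c(1, 9))
k <- 3

# Step 1: choose k observations as starting centroids
# (fixed here; kmeans() picks them at random).
centers <- x[c(1, 2, 4), , drop = FALSE]

# Hypothetical helper: squared Euclidean distance from every row of pts
# to every row of ctr, returned as an nrow(pts) x nrow(ctr) matrix.
sq_dist <- function(pts, ctr) {
  sapply(seq_len(nrow(ctr)), function(j) rowSums(sweep(pts, 2, ctr[j, ])^2))
}

for (iter in 1:20) {
  # Step 2: assign each point to its nearest centroid.
  cluster <- apply(sq_dist(x, centers), 1, which.min)

  # Step 3: recompute each centroid as the mean of its assigned points
  # (an empty cluster keeps its previous centroid).
  new_centers <- centers
  for (j in seq_len(k)) {
    members <- x[cluster == j, , drop = FALSE]
    if (nrow(members) > 0) new_centers[j, ] <- colMeans(members)
  }

  # Stop once the centroids no longer move.
  if (isTRUE(all.equal(new_centers, centers))) break
  centers <- new_centers
}

cluster   # 1 1 1 3 3 3 2 2 2
centers

# The built-in function, started from the same centroids, for comparison.
km <- kmeans(x, centers = x[c(1, 2, 4), ], algorithm = "Lloyd")
km$cluster
```

Two of the three starting centroids deliberately sit in the same group, which shows how the repeated update step pulls one of them across to the group that had no centroid.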
> set.seed(1234)
> sdf2<-kmeans(ssp[,2:4],3)
> sdf2
K-means clustering with 3 clusters of sizes 4, 3, 4

Cluster means:
  Death.rate Immunization Mortality.by.road.injury
1   7.137250     94.25000                  4.25000
2   7.210333     87.33333                 17.66667
3   5.994500     98.00000                 22.07500

Clustering vector:
[1] 2 3 2 1 3 3 2 1 3 1 1

Within cluster sum of squares by cluster:
[1] 134.93003  25.34775  36.17565
(between_SS / total_SS = 81.8 %)

Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
> table(ssp$Countries,sdf2$cluster)
            1 2 3
  Cambodia  0 1 0
  China     0 0 1
  Fiji      1 0 0
  Indonesia 0 1 0
  Japan     1 0 0
  Kirbati   1 0 0
  Malaysia  0 0 1
  Mongolia  0 0 1
  Myanmar   0 1 0
  Singapore 1 0 0
  Vietnam   0 0 1
> library(ggplot2)
> ggplot(ssp,aes(Death.rate,Mortality.by.road.injury,color=Countries))+geom_point()
> ggplot(ssp,aes(Death.rate,Immunization,color=Countries))+geom_point()

> gdh<-hclust(dist(ssp[,2:4]))
> print(gdh)
Call:
hclust(d = dist(ssp[, 2:4]))
Cluster method : complete
Distance : euclidean
Number of objects: 11
> plot(gdh,labels=ssp$Countries)
> rect.hclust(gdh,k=3)
> groups<-cutree(gdh,k=3)

Interpretation
The results show three clusters. According to the k-means table, the first cluster contains four nations: Fiji, Japan, Kiribati
(spelled Kirbati in the dataset) and Singapore. The second cluster contains three nations: Myanmar, Cambodia and Indonesia. The third
cluster contains China, Mongolia, Malaysia and Vietnam. Nations within a group are similar to each other in terms of death rate,
immunization and road injury mortality. This means that resources can be allocated in a similar way within each group. Hence, a
resource allocation policy that is followed for one nation of a cluster can be followed for the other nations of the same cluster.
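One way to check whether the hierarchical and k-means groupings agree is to cross-tabulate the two sets of cluster labels. The sketch below uses invented values (not the World Bank data) and hypothetical nation labels N1 to N9. It also standardises the columns with scale, which is an assumption on my part: the report clusters the raw columns, where the variable with the largest spread dominates the Euclidean distance.

```r
# Invented values for illustration only (not the World Bank data).
toy <- data.frame(
  Death.rate = c(5, 5.5, 6, 9, 9.5, 10, 7, 7.2, 7.4),
  Immunization = c(99, 98, 99, 86, 87, 88, 93, 94, 92),
  Mortality.by.road.injury = c(3, 4, 5, 18, 17, 19, 24, 23, 25),
  row.names = paste0("N", 1:9)
)

# Assumption: standardise each column to mean 0 and sd 1 so that no single
# variable dominates the distance. The report itself uses the raw values.
z <- scale(toy)

# Hierarchical clustering with complete linkage, cut into three groups.
hc <- hclust(dist(z), method = "complete")
hc_groups <- cutree(hc, k = 3)

# K-means with several random starts to reduce dependence on the seed.
set.seed(1234)
km <- kmeans(z, centers = 3, nstart = 25)

# Cross-tabulate the two partitions. If every row has a single non-zero
# cell, both methods found the same grouping (labels may be permuted).
tab <- table(hclust = hc_groups, kmeans = km$cluster)
tab
```

Cluster numbers are arbitrary labels, so the comparison should look at which nations are grouped together and not at whether the numbers match.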
4.2 Linear regression
Regression analysis is one of the main methods used to analyze data. It identifies the relationship between a dependent variable
and one or more independent variables (7 types of regression techniques you must know, 2017). The fitted slope estimates how much
the dependent variable changes, on average, for a one-unit change in the independent variable.
H0: Immunization has no significant linear effect on death rate (the slope is zero).
H1: Immunization has a significant linear effect on death rate (the slope is not zero).
> modl<-lm(ssp$Death.rate~ssp$Immunization)
> summary(modl)
Call:
lm(formula = ssp$Death.rate ~ ssp$Immunization)
Residuals:
    Min      1Q  Median      3Q     Max
-1.5027 -0.8889 -0.5181  0.6436  3.2702

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       18.1000     7.4078   2.443   0.0372 *
ssp$Immunization  -0.1212     0.0789  -1.536   0.1589
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.442 on 9 degrees of freedom
Multiple R-squared: 0.2077, Adjusted R-squared: 0.1197
F-statistic: 2.359 on 1 and 9 DF, p-value: 0.1589

Interpretation
The p-value of the immunization coefficient is 0.159 > 0.05, so the null hypothesis cannot be rejected: immunization has no
statistically significant linear effect on the overall death rate of the nations. The estimated slope is -0.12, which means that each
additional unit of immunization is associated with a death rate that is lower by about 0.12, but with only 11 observations this
estimate cannot be distinguished from zero. Multiple R-squared is 0.21, which means that immunization explains about 21% of the
variation in the dependent variable, overall death rate. Adjusted R-squared, which corrects R-squared for the number of predictors in
the model, is 0.12.
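The adjusted R-squared can be reproduced by hand from the figures in the output above, using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below uses only numbers copied from that output; the variable names are my own.

```r
# Figures copied from the summary(modl) output for Death.rate ~ Immunization.
r2      <- 0.2077   # Multiple R-squared
n       <- 11       # observations
p       <- 1        # predictors
t_slope <- -1.536   # t value of the slope

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)      # 0.1197

# Residual degrees of freedom reported by summary().
n - p - 1             # 9

# With a single predictor, the F-statistic equals the squared t value
# of the slope, so both tests give the same p-value (0.1589).
round(t_slope^2, 3)   # 2.359
```

Both results match the reported values (Adjusted R-squared 0.1197, F-statistic 2.359), which confirms that adjusted R-squared is a correction to R-squared and not the effect of adding a new variable.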

H0: Mortality by road injury has no significant linear effect on death rate (the slope is zero).
H1: Mortality by road injury has a significant linear effect on death rate (the slope is not zero).
> modl<-lm(ssp$Death.rate~ssp$Mortality.by.road.injury)
> summary(modl)
Call:
lm(formula = ssp$Death.rate ~ ssp$Mortality.by.road.injury)
Residuals:
    Min      1Q  Median      3Q     Max
-2.6723 -0.4176 -0.2727  0.5491  2.8818

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                   7.44931    0.96028   7.757 2.83e-05 ***
ssp$Mortality.by.road.injury -0.04918    0.05818  -0.845     0.42
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.56 on 9 degrees of freedom
Multiple R-squared: 0.07354, Adjusted R-squared: -0.0294
F-statistic: 0.7144 on 1 and 9 DF, p-value: 0.4199
Interpretation
The p-value is 0.42 > 0.05, which shows that mortality by road injury has no statistically significant linear effect on death
rate. R-squared is 0.07, which means that the independent variable, road injury mortality, explains about 7% of the variation in the
dependent variable, death rate. Adjusted R-squared is negative (-0.03), which indicates that the model explains no more than would
be expected by chance. It can be said that neither vaccinations received by young children nor injuries caused to people in road
accidents have a major measurable impact on the death rate in this sample. This shows that, apart from immunization and injury due
to road accidents, many other factors contribute to the death rate of the nations.
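The slope, its p-value and R-squared can be pulled out of a fitted model programmatically instead of being read off the printed summary. A minimal sketch follows, on invented values (not the World Bank data) and with a hypothetical column name Road. It also shows that in a regression with one predictor, R-squared is the squared Pearson correlation.

```r
# Invented values for illustration only (not the World Bank data).
toy <- data.frame(
  Death.rate = c(5, 6, 7, 8, 9, 10),
  Road = c(20, 12, 18, 9, 15, 6)
)

# Passing data = keeps the variable names short in the summary output.
fit <- lm(Death.rate ~ Road, data = toy)
s <- summary(fit)

# The coefficient table is a matrix indexed by term and column name.
slope   <- coef(s)["Road", "Estimate"]
p_value <- coef(s)["Road", "Pr(>|t|)"]
r2      <- s$r.squared

# In simple linear regression, R-squared is the squared Pearson correlation.
all.equal(r2, cor(toy$Death.rate, toy$Road)^2)

# The same identity holds for the figures reported in this section:
# the correlation of -0.2711874 squared gives the Multiple R-squared.
round((-0.2711874)^2, 5)   # 0.07354
```

This identity ties the regression results back to the correlations in the two variable analysis: a weak correlation necessarily gives a low R-squared.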

5 Conclusion
On the basis of the above discussion it is concluded that the patterns of death rate, immunization and injury due to road
accidents are similar within each of three groups of nations. The first cluster contains Fiji, Japan, Kiribati and Singapore, the
second cluster contains Myanmar, Cambodia and Indonesia, and the third cluster contains China, Mongolia, Malaysia and Vietnam. It is
also concluded that, apart from road accident injury and immunization, a number of other factors contribute to the death rate
across the analyzed nations.
6 Reflection
It was a great experience to carry out data analysis in R through coding. I faced some issues while coding in R. The main issue
arose with the k-means clustering method: each scatter plot shows only two variables at a time, so the grouping of nations was not
clearly visible and did not reflect the overall picture across all variables. To resolve this, the hclust method was used, whose
dendrogram displays the grouping of nations in a single diagram. Both kmeans and hclust were run on the same three numeric columns
(ssp[,2:4]), so both take all relevant variables into account; the difference lies in how the result is displayed. No issue was faced
when applying the linear regression model and descriptive statistics. Overall, this project was a good learning experience.

REFERENCES
Books and journals
Tan, P.N., Steinbach, M. and Kumar, V., 2013. Data mining cluster analysis: basic concepts and algorithms. Introduction to data mining.
Niknam, T. and Amiri, B., 2010. An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Applied Soft Computing, 10(1), pp.183-197.
Online
7 types of regression techniques you must know, 2017. [Online]. Available through: <https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/>. [Accessed on 20th June 2017].