
Data Analysis Report of Causes of Death in Queensland

   

Added on  2023-03-31

13 Pages, 2299 Words
Data Analysis report of Causes of Death in Queensland
Name of the Student:
Name of the University:
Author Note:
Author Name
Data analysis report of Causes of death in queensland
Table of Contents
Introduction
Data setup
Exploratory Data Analysis
  One Variable Analysis
  Two Variable Analysis
Advanced Analysis
  K-means Cluster Analysis
  Linear regression Analysis
Conclusion
Reflections
Reference and Bibliography
Introduction
This report analyses deaths from various causes in Queensland between 1997 and 2017. It aims to draw conclusions that are useful in practice and for further study. A k-means cluster analysis groups the years into subgroups that show significant differences in the average number of deaths from several causes. Linear regressions show how the number of deaths has changed over the years. The data were obtained from an Australian government website. They were cleaned in Microsoft Excel and analysed with the open-source statistical software R.
Data setup
The raw data were not in a form suitable for analysis, so they were first restructured in Excel. The variable names were edited, and the data set was transposed so that each variable (cause of death) could be described across years. The prepared Excel file was then imported into R with the following code.
ucdq <- readxl::read_xlsx(file.choose())  # choose the prepared Excel file and read it into R
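The transposition described above was carried out in Excel. As a hedged alternative, the same reshaping could be done in R after import. The sketch assumes a raw layout with one row per cause and one column per year. The objects `raw`, `counts` and `tidy` and all of their values are hypothetical, made up for illustration, and are not the Queensland data.

```r
# Hypothetical raw layout: one row per cause of death, one column per year.
# All numbers are made up for illustration only.
raw <- data.frame(
  cause  = c("Diseases of the nervous system", "Mental and behavioural disorders"),
  `1997` = c(500, 400),
  `1998` = c(520, 430),
  check.names = FALSE
)

# t() transposes the numeric part so that years become rows and causes columns
counts <- t(as.matrix(raw[, -1]))
colnames(counts) <- raw$cause

# Keep the year as its own column; row.names = NULL gives plain 1, 2, ... row names
tidy <- data.frame(Year = as.integer(rownames(counts)), counts,
                   check.names = FALSE, row.names = NULL)
print(tidy)
```

Doing the reshape in R keeps this step reproducible alongside the rest of the analysis script.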
The following code loads the libraries used in the analysis.
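library(RColorBrewer)  # colour palettes for plots
library(ggplot2)       # grammar-of-graphics plotting
library(stats4)        # S4 statistical functions such as mle()
library(cluster)       # cluster analysis tools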
To deal with missing values, the na.omit function was applied to the variables. Some of these calls are shown below (Little and Rubin 2019).
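# na.omit() returns a copy of each column with missing values dropped. ucdq itself
# is not changed, so assign the result (e.g. breast <- na.omit(ucdq$Breast)) to keep it
na.omit(ucdq$`Cause of death`)
na.omit(ucdq$`Certain infectious and parasitic diseases`)
na.omit(ucdq$`Neoplasms (cancer)`)
na.omit(ucdq$`Trachea, bronchus and lung`)
na.omit(ucdq$`Melanoma of skin`)
na.omit(ucdq$Breast)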
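One caveat on the calls above: na.omit() returns a new object and leaves ucdq unchanged unless the result is assigned. A minimal sketch on a hypothetical data frame `toy` (made-up values, not the Queensland data) contrasts two cases. Cleaning a single column gives a shorter vector. Applying na.omit() to the whole data frame keeps only complete rows.

```r
# Hypothetical toy data with one missing value (made up for illustration)
toy <- data.frame(
  Year   = 1997:2000,
  Breast = c(410, NA, 425, 431)
)

# Dropping NAs from a single column gives a shorter vector;
# toy itself is unchanged because the result is stored elsewhere
breast_clean <- na.omit(toy$Breast)
length(breast_clean)    # 3
sum(is.na(toy$Breast))  # still 1

# Applied to the whole data frame, na.omit() keeps only complete rows,
# so the year with the missing value is dropped from every column
toy_complete <- na.omit(toy)
toy_complete$Year       # 1997 1999 2000
```

Row-wise removal keeps the years aligned across causes. Column-wise removal can leave columns of different lengths that no longer fit back into one data frame.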
Exploratory Data Analysis
One Variable Analysis
Each variable records the number of deaths from one cause in each year. This section analyses two variables individually: diseases of the nervous system, and mental and behavioural disorders. The R code for the summary statistics and boxplot of deaths due to diseases of the nervous system is given below:
summary(ucdq$`Diseases of the nervous system`) #summary statistics
boxplot(ucdq$`Diseases of the nervous system`, col = "red") #boxplot
Table 1: Summary statistics of deaths due to diseases of the nervous system
Table 1 reports the summary statistics, including the minimum and maximum, of deaths due to diseases of the nervous system. The mean is 988.3, the 1st quartile is 697.0 and the 3rd quartile is 1227. A boxplot displays any outliers as separate points, and the boxplot in Figure 1 shows none. The mean implies that, on average, about 988 people in Queensland die each year from diseases of the nervous system.
Figure 1: Box plot for deaths due to diseases of the nervous system
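A boxplot flags a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box, which is R's default `range = 1.5` in boxplot(). The sketch below shows how the summary statistics and the outlier check fit together. It uses a hypothetical vector `deaths` of made-up annual counts, not the Queensland data. Note that boxplot() builds its box from Tukey's hinges (fivenum()). These can differ slightly from the quartiles printed by summary().

```r
# Hypothetical annual death counts (made up for illustration)
deaths <- c(650, 700, 760, 820, 900, 980, 1050, 1120, 1200, 1250, 2100)

summary(deaths)  # min, quartiles, median, mean and max

# boxplot() draws its box from Tukey's hinges, returned by fivenum()
hinges <- fivenum(deaths)  # min, lower hinge, median, upper hinge, max
iqr <- hinges[4] - hinges[2]
lower_fence <- hinges[2] - 1.5 * iqr
upper_fence <- hinges[4] + 1.5 * iqr

# Points outside the fences are drawn individually as outliers
manual_out <- deaths[deaths < lower_fence | deaths > upper_fence]

# boxplot.stats() applies the same rule and is what boxplot() calls internally
builtin_out <- boxplot.stats(deaths)$out
print(manual_out)
print(builtin_out)
```

In this toy vector only 2100 lies beyond the upper fence. An empty result from boxplot.stats(x)$out corresponds to a boxplot with no outlier points, as in Figure 1.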
The R code for the summary statistics and boxplot of deaths due to mental and behavioural disorders is given below:
summary(ucdq$`Mental and behavioural disorders`) #Summary statistics
boxplot(ucdq$`Mental and behavioural disorders`,col = "Green") #Boxplot
Table 2: Summary statistics of deaths due to mental and behavioural disorders
Table 2 reports the summary statistics, including the median and mean, of deaths due to mental and behavioural disorders. The mean is 951, the 1st quartile is 468 and the 3rd quartile is 1342. The boxplot in Figure 2 shows no outliers. The mean implies that, on average, about 951 people die each year from mental and behavioural disorders (Brastein et al. 2018).
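The same summary is produced for each cause, so the step could also be applied to several columns in one call. The following is a hedged sketch. It assumes a data frame laid out like ucdq, with one row per year and one column per cause. The data frame `toy` and its numbers are hypothetical and are not the Queensland figures.

```r
# Hypothetical stand-in for ucdq: one row per year, one column per cause.
# The numbers are made up and are not the Queensland figures.
toy <- data.frame(
  `Diseases of the nervous system`   = c(600, 800, 1000, 1200, 1400),
  `Mental and behavioural disorders` = c(300, 500, 700, 900, 1100),
  check.names = FALSE
)

causes <- c("Diseases of the nervous system", "Mental and behavioural disorders")

# sapply() applies summary() to each selected column and binds the results into
# a matrix: rows are Min., 1st Qu., Median, Mean, 3rd Qu., Max.; columns are causes
cause_stats <- sapply(toy[causes], summary)
print(cause_stats)
```

Putting the statistics side by side in one matrix makes it easier to compare the quartiles of different causes directly.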