Introduction to Data Science: Analysis of Crash Trends in Australia

Data analysis report of the health and population statistics of East Asian and Pacific countries


Added on  2022-11-13

About This Document

This report presents trends in road crashes across age and gender in Australia from 1989 to 2019. Several analyses, including one-variable analysis, two-variable analysis and linear regression, are run to find the relation between the attributes and the crashes.

Introduction to Data Science
Name of the Student:
Name of the University:
Author Note:
INTRODUCTION TO DATA SCIENCE
Table of Contents
Introduction
    Purpose
    Limitation
    Scope
    Methodology
Data setup
Exploratory Data Analysis
    One variable Analysis 1
    One variable Analysis 2
    Two Variable Analysis
        Two variable analysis 1
        Two variable analysis 2
Advanced Analysis
    Clustering
        Concept of k-means Cluster Analysis
        Clustering Analysis
    Linear regression Analysis
Conclusion
Reflections
Reference and Bibliography
AUTHOR NAME 1
# Import the prepared workbook; file.choose() opens a file picker
A <- readxl::read_xlsx(file.choose())
library(ggplot2)  # plotting
Introduction
Purpose
This report presents trends in road crashes across age and gender in
Australia from 1989 to 2019. Several analyses, including one-variable
analysis, two-variable analysis and linear regression, are run to find
the relation between the attributes and the crashes. However, no specific
goal was set for the analysis, so the report tries to present significant
results that can predict crashes across gender, age and speed limit.
Limitation
The data contain a large number of observations, many of them with
missing values, which is not good for an analysis. However, the missing
values are removed. A logistic regression could estimate the probability
of each type of crash, but it is not presented in this paper.
Scope
Two one-variable analyses and two two-variable analyses are presented,
which explain those variables. Moreover, this paper incorporates only a
cluster analysis, to group the significant variables, and a linear
regression, which is described and interpreted. Both models are briefly
discussed in this paper.
Methodology
The data used in the analysis are collected from the Australian
government website, where the data set is named the Australian Road
Deaths Database. The data are imported into R, a statistical tool for
data analysis. Before import, the data are processed in Excel, as
described in the Data setup section below.
Data setup
Here the data are processed in Excel 2016, where the first four rows,
which were not relevant to the study, are removed from the data set.
The data file is then ready to load into the R workspace. The xlsx file
is loaded with the command A <- readxl::read_xlsx(file.choose()).
Two libraries, “ggplot2” and “cluster”, are loaded for data
visualization and cluster analysis.
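As a rough illustration of the row-removal step, the same skipping could be done at import time instead of in Excel. The sketch below uses base R's read.csv on an in-memory, made-up sample so that it runs without the workbook; the preamble lines, column names and values are hypothetical. readxl::read_xlsx also accepts a skip argument, which is what would apply to the real xlsx file.

```r
# Hypothetical raw export: four preamble lines, then header and data.
raw_text <- paste(
  "Database title line",
  "Generated on some date",
  "Notes line",
  "Source: example only",
  "Crash ID,Age,Speed Limit",
  "1,23,60",
  "2,-9,100",
  "3,57,50",
  sep = "\n"
)

# skip = 4 ignores the first four lines, mirroring the manual Excel step.
# check.names = FALSE keeps the space in "Speed Limit".
crashes <- read.csv(text = raw_text, skip = 4, check.names = FALSE)
print(crashes)
```

Skipping rows in code keeps the original download untouched and makes the preparation step repeatable.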
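library(cluster)  # clustering utilities for the advanced analysis
# Keep only rows whose Crash Type is not the missing-value code -9
AA <- subset(A, A$`Crash Type` != -9)
# Drop rows with NA in Bus Involvement (na.omit() alone discards its result)
AA <- AA[!is.na(AA$`Bus Involvement`), ]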
plot(AA$Age, AA$`Speed Limit`, main = "Speed Limit vs Age", xlab = "Age", ylab =
hist(AA$Age,col = rainbow(length(AA$Age)))
The downloaded data file uses the value -9 for missing and unknown
observations. The observations that contain the value -9 are omitted by
using the “subset” function, for example
AA <- subset(A, A$`Crash Type` != -9).
This command assigns to the AA data set the rows whose crash type is
not equal to -9. Rows with missing values in the variable Bus
Involvement are likewise omitted from the AA data set. Now the data set
is clean and the further analysis can be done by using the data.
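The cleaning step can be tried on a small made-up data frame. The sketch below first filters one column at a time, as the report describes, and then shows an alternative that is not the report's method: recode every -9 as NA and drop all incomplete rows in one pass. The column names follow the report; the values, A_demo, AA_demo and clean are hypothetical.

```r
# Toy data; column names follow the report, values are made up.
A_demo <- data.frame(
  `Crash Type`      = c("Single", "-9", "Multiple", "Single", "Multiple"),
  `Bus Involvement` = c("No", "No", NA, "Yes", "No"),
  Age               = c(25, 40, 33, 61, -9),
  check.names = FALSE
)

# Per-column filtering, as in the report.
AA_demo <- subset(A_demo, A_demo$`Crash Type` != -9)
AA_demo <- AA_demo[!is.na(AA_demo$`Bus Involvement`), ]

# Alternative (an assumption, not the report's method):
# recode -9 as NA in every column, then drop incomplete rows.
clean <- A_demo
clean[] <- lapply(clean, function(col) {
  replace(col, !is.na(col) & col == -9, NA)
})
clean <- na.omit(clean)

nrow(AA_demo)  # rows kept by per-column filtering
nrow(clean)    # rows kept after recoding all -9 values
```

In this toy example the per-column approach still keeps a row whose Age is -9, which is why recoding across all columns may be worth considering before any numeric summary.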
Exploratory Data Analysis
One variable Analysis 1
Here the chosen variable is the age of the individual. Figure 1
shows that age is left skewed. The figure is generated with the
“hist” function, which draws the histogram (Glowacz and Glowacz 2016).
The code used is hist(AA$Age, col = rainbow(length(AA$Age))), where the
rainbow function fills the bars of the histogram with colour.
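A reading of skewness from a histogram can be cross-checked numerically. The sketch below is only an illustration on simulated ages, not the crash data: it defines a hypothetical helper, skewness, based on the standard moment formula (positive values indicate a longer right tail, negative values a longer left tail), and sizes the rainbow palette to the number of bars rather than the number of observations.

```r
set.seed(1)
# Hypothetical ages, not the crash data: a gamma shape gives a long right tail.
age_demo <- round(17 + rgamma(500, shape = 2, scale = 12))

# Moment-based sample skewness: > 0 right-skewed, < 0 left-skewed.
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / (mean((x - mean(x))^2))^1.5
}
skewness(age_demo)

# One colour per bar: compute the bins first, then size the palette to match.
h <- hist(age_demo, plot = FALSE)
hist(age_demo, col = rainbow(length(h$counts)),
     main = "Histogram of Age (simulated)", xlab = "Age")
```

Applied to AA$Age, the sign of the statistic would indicate whether the skew direction read from Figure 1 holds.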