Data Analysis report of Road Crashes

   

Added on  2023-03-31

Data Analysis report of Road Crashes from 1989 to 2019
Name of the Student:
Name of the University:
Author Note:
Table of Contents
Introduction
Data setup
Exploratory Data Analysis
   One variable Analysis “AGE”
   One variable Analysis “SPEED LIMIT”
   Two Variable Analysis
      Two variable analysis “GENDER & CRASH TYPE”
      Two variable analysis “CRASH TYPE & ROAD USER”
Advanced Analysis
   Clustering
      Concept of k-means Cluster Analysis
      Clustering Analysis
   Linear regression Analysis
Conclusion
Reflections
Reference and Bibliography
Introduction
This report presents an analysis of road crashes from 1989 to 2019. Two techniques,
cluster analysis and linear regression analysis, are used. The data set is mostly
qualitative and can support more advanced studies, such as predicting the speed limit
at which multiple crashes tend to occur (Cox 2018). The report describes the data using
one-variable and two-variable analysis. The data is available on the Australian
government website. Excel is used to clean the data and R is used to perform the analysis.
Data setup
The data file used in the analysis was copied into a new Excel workbook and saved as
an xlsx file. The data was then edited in Excel 2013, where observations containing the
value -9 were deleted, as this value indicates missing or unknown information. After
cleaning, the xlsx file was loaded into the R workspace for the analysis. The command
used to load the file is given below:
Uploading data file: mydata <- readxl::read_xlsx(file.choose())
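The -9 recoding described above could also be done in R rather than in Excel. The following is a minimal sketch on a made-up data frame; the name toy and all of its values are hypothetical and only stand in for the crash file.

```r
# Toy data standing in for the crash file (values are made up for illustration)
toy <- data.frame(
  Age = c(25, -9, 61, 40),
  `Speed Limit` = c(60, 100, -9, 80),
  Gender = c("Male", "Female", "Male", "-9"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Recode the sentinel value -9 as NA in every column
toy[toy == -9] <- NA

# Count the missing values per column after recoding
na_counts <- colSums(is.na(toy))
print(na_counts)
```

If the file is read with readxl, passing na = "-9" to read_xlsx() should have a similar effect at load time, assuming -9 is never a genuine value in any column.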
Three libraries were loaded for the analysis: “stats”, “RColorBrewer” and “ggplot2”,
which provide statistical functions and data visualisation.
library(stats)
library(RColorBrewer)
library(ggplot2)
Before the analysis, the initial step was to remove the missing values from the data
set, which R automatically assigns as NA. The following commands were used to omit the
NA values from the data set.
# columns used in the analysis
cols <- c("Crash Type", "Bus Involvement", "Articulated Truck Involvement",
          "Speed Limit", "Road User", "Gender", "Age")
# keep only rows with no NA in these columns; the result is assigned so mydata changes
mydata <- mydata[complete.cases(mydata[, cols]), ]
The above commands ensure that later steps are not disturbed by missing values.
The analysis then proceeded to the one-variable analysis and so on.
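One point worth illustrating is that na.omit() returns a new object rather than modifying its input, so rows are only dropped from the data set when the result is assigned. A minimal sketch on a made-up data frame follows; toy, cols and clean are hypothetical names, not part of the original analysis.

```r
# Made-up data for illustration only
toy <- data.frame(
  Age = c(25, NA, 61, 40),
  Gender = c("Male", "Female", NA, "Male"),
  Notes = c(NA, "a", "b", "c"),
  stringsAsFactors = FALSE
)

# na.omit() on one column returns a cleaned copy of that vector;
# the data frame itself is left unchanged
age_only <- na.omit(toy$Age)
rows_before <- nrow(toy)

# Keep the rows that are complete in the analysis columns only,
# and assign the result so the change is kept
cols <- c("Age", "Gender")
clean <- toy[complete.cases(toy[, cols]), ]
rows_after <- nrow(clean)
```

Restricting complete.cases() to the analysis columns avoids discarding rows that are only missing a value in a column the analysis never uses.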
Exploratory Data Analysis
One variable Analysis “AGE”
summary(mydata$Age)
boxplot(mydata$Age,col = "blue")
Figure 1: Histogram of Age
Figure 1 presents the frequency distribution of age; the shape of the distribution
is left skewed. Table 1 shows that the average age of the observed sample is 41.
The quartiles are also shown in the box plot.
Table 1: Summary statistics of age
Figure 2: Box plot of age
Figure 2 and Table 1 present the lower and upper quartiles, which are 23 and 57
respectively. The box plot additionally shows outliers, of which there are none
in the age variable (Wickham 2016).
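The quartile and outlier reading of a box plot can be checked numerically. The sketch below uses a small synthetic age vector, not the crash data, and assumes the usual 1.5 × IQR rule that R's boxplot() applies by default; the names age, q, outliers and bs are hypothetical.

```r
# Synthetic ages for illustration only; not the crash data
age <- c(18, 21, 23, 23, 30, 41, 45, 57, 57, 62, 70, 88)

# Lower and upper quartiles, and the interquartile range
q <- quantile(age, probs = c(0.25, 0.75))
iqr <- IQR(age)

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr
outliers <- age[age < lower_fence | age > upper_fence]

# boxplot.stats() applies the same idea using hinges, which can differ
# slightly from quantile() in small samples
bs <- boxplot.stats(age)
print(outliers)
print(bs$out)
```

An empty result from both checks corresponds to a box plot with no points drawn beyond the whiskers.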