This study analyzes the trends in fatalities in Australia based on states and gender. The findings are crucial in informing government and stakeholders on measures to reduce fatalities.
Analysis of fatalities in Australia
Prepared by Firstname Lastname
University of the Sunshine Coast, Queensland
May-June 2019
1. Introduction

1.1 Authorization and Purpose

This study analyzes trends in fatalities in Australia. The aim is to compare these trends across the states and between genders. The question the study seeks to answer is: how do the trends in fatalities compare between males and females and across the states? The findings are crucial in informing the government and other stakeholders as they design measures to reduce fatalities.

1.2 Limitations

The main limitation of this study is that the collected data cover only one country, Australia. No comparison is therefore made with trends in other countries.

1.3 Scope

Secondary data collected from the World Bank database are used for this study. Before analysis, the dataset is pre-processed. Both univariate and bivariate analyses are performed. More advanced analyses, namely regression analysis and cluster analysis, are also performed.

1.4 Methodology

The study involves analysis of panel data. The data span 2010 through 2018 and cover six states and two territories in Australia.

2. Data setup

The pre-processed data was loaded into R for analysis. The code used to load the data into R is given below:
    fatalities <- read.csv("C:\\Users\\Documents\\fatalities.csv")

The necessary packages, such as the cluster package, were loaded into R. The code is presented below:

    install.packages("cluster")
    library(cluster)

3. Exploratory Data Analysis

3.1 One-variable analysis

3.1.1 One-variable analysis 1

In this section, we present the summary statistics for Age. The code is presented below:

    summary(fatalities$Age)
    boxplot(fatalities$Age, ylab = "Age", main = "Boxplot of age", col = "chartreuse1")

The average age of the victims was 43.77, with a median age of 41.00. Since the mean and the median are close to each other, the distribution appears to be close to a normal distribution. This is supported by the boxplot given below.

Figure 1: Box plot of age
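Comparing the mean with the median is only a rough guide to normality. As a minimal sketch of two further checks, the code below uses simulated ages (the vector `age` and its values are hypothetical, not the study's data) to compute a quartile-based skewness measure and draw a normal Q-Q plot.

```r
# Simulated, right-skewed ages; hypothetical values, not the study's data.
set.seed(123)
age <- round(16 + rgamma(1000, shape = 2, scale = 14))

mean(age)
median(age)

# Quartile-based (Bowley) skewness: 0 for a perfectly symmetric distribution,
# positive when the upper half of the data is more spread out than the lower.
q <- quantile(age, c(0.25, 0.5, 0.75), names = FALSE)
bowley <- (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
bowley

# Normal Q-Q plot: points that bend away from the line suggest non-normality.
qqnorm(age)
qqline(age)
```

On the real data, the same calls could be applied to `fatalities$Age`, assuming that column is numeric.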
3.1.2 One-variable analysis 2

In this section, we present the frequency distribution of the speed limit using a histogram, together with summary statistics for the variable. The code used to generate the summary, with its output, is given below:

    > summary(fatalities$Speed.Limit)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      15.00   60.00   80.00   83.17  100.00  130.00

The average speed limit is 83.17, the median is 80.00, and the highest speed limit is 130.

Figure 2: Histogram for speed limit

As can be seen from the histogram (Figure 2), the majority of the fatalities occurred at speed limits between 90 and 100. Speed limits of 50 to 60 also accounted for a substantial number of fatalities, with more than 2000 cases.

3.2 Two-variable analysis

3.2.1 Two-variable analysis 1
In this section, we test whether there is a significant difference in the average speed limit between male and female victims involved in fatalities. The R code is given as follows:

    # Columns are referenced by bare name, which assumes the data frame was attached beforehand
    t.test(Speed.Limit~Gender)

The results are presented below:

    Welch Two Sample t-test

    data:  Speed.Limit by Gender
    t = 4.0029, df = 5462.1, p-value = 6.341e-05
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.9376411 2.7375703
    sample estimates:
    mean in group Female   mean in group Male
                84.49802             82.66041

As can be seen from the above results, the average speed limit for the male victims was 82.66, while the average speed limit for the female victims was 84.50. The p-value of the t-test was 6.341e-05, which is below the 5% level of significance. We therefore reject the null hypothesis and conclude that there is a significant difference in the average speed limit between male and female victims. The female victims had a significantly higher average speed limit than the male victims.

We also present the relationship between age and the speed limit. A scatter plot is well suited to visualizing the relationship between two variables.
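The code for the scatter plot of speed limit against age is not shown in the report. A minimal sketch of how such a plot might be drawn is given here, using simulated data (the data frame `sim` and its values are hypothetical; the column names `Age` and `Speed.Limit` follow those used elsewhere in the report).

```r
# Simulated stand-in for the fatalities data; values are hypothetical.
set.seed(7)
sim <- data.frame(
  Age = round(runif(300, 16, 90)),
  Speed.Limit = sample(seq(40, 110, by = 10), 300, replace = TRUE)
)

# Scatter plot of speed limit against age
plot(sim$Age, sim$Speed.Limit,
     xlab = "Age", ylab = "Speed limit",
     main = "Speed limit versus age")

# Pearson correlation as a numeric summary of the linear association
r <- cor(sim$Age, sim$Speed.Limit)
r
```

The correlation coefficient complements the plot: a value near 0 indicates little linear association between the two variables.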
Figure 3: A scatter plot of speed limit versus age

3.2.2 Two-variable analysis 2

In this section, we test whether there is a significant difference in the average speed limit across the states. The R code is given as follows:

    model1 <- lm(Speed.Limit~State)
    model1
    summary(model1)
    anova(model1)

The results are presented below:

    > anova(model1)
    Analysis of Variance Table

    Response: Speed.Limit
                 Df  Sum Sq Mean Sq F value    Pr(>F)
    State         7  124900 17842.8  39.405 < 2.2e-16 ***
    Residuals 11015 4987716   452.8
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The above results show that the average speed limit varies significantly across the states (F = 39.405, p < 2.2e-16), with NT having the highest average speed limit among fatalities and ACT the lowest.
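The ANOVA table shows that the states differ overall, but not which states are highest or lowest. As a minimal sketch of how that ranking could be checked, the code below computes per-state means and Tukey pairwise comparisons on simulated data (the data frame `sim`, its four states and its values are hypothetical).

```r
# Simulated stand-in for the fatalities data; values are hypothetical.
set.seed(42)
sim <- data.frame(
  State = factor(rep(c("ACT", "NSW", "NT", "QLD"), each = 100)),
  Speed.Limit = c(rnorm(100, 70, 20), rnorm(100, 80, 20),
                  rnorm(100, 95, 20), rnorm(100, 85, 20))
)

# Mean speed limit per state, sorted from lowest to highest
state_means <- sort(tapply(sim$Speed.Limit, sim$State, mean))
state_means

# Pairwise differences between states with Tukey's HSD adjustment
tukey <- TukeyHSD(aov(Speed.Limit ~ State, data = sim))
tukey$State
```

Each row of `tukey$State` gives the estimated difference between two states, its confidence interval and an adjusted p-value.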
4. Advanced analysis

4.1 Clustering

4.1.1 Brief explanation of k-means and clustering

In this section, we present the cluster analysis performed on the data (Filipovych, et al., 2011). Cluster analysis refers to the grouping of observations that have similar attributes (Frey & Dueck, 2017). The k-means algorithm partitions the observations into k groups by assigning each observation to the cluster with the nearest centre.

4.1.2 Clustering Analysis

The R code for this section is given as:

    mydata <- na.omit(fatalities)
    # Scale the NA-free data (not the raw data frame); scale() requires numeric columns
    mydata <- scale(mydata)
    fit <- kmeans(mydata, 3)

We performed a cluster analysis, and the results showed that a significant relationship exists between speed limit and state. The data were grouped into three clusters based on the states.
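The report does not show how the fitted clusters were inspected. A minimal sketch on simulated numeric data follows (the data frame `sim`, its columns and its values are hypothetical). It fixes the random seed and uses several random starts, since k-means results depend on the initial centres.

```r
# Simulated numeric data; columns and values are hypothetical.
set.seed(2019)
sim <- data.frame(
  Age = c(rnorm(100, 25, 5), rnorm(100, 45, 5), rnorm(100, 70, 5)),
  Speed.Limit = c(rnorm(100, 60, 8), rnorm(100, 100, 8), rnorm(100, 80, 8))
)

# Centre and scale each column so no variable dominates the distances
scaled <- scale(sim)

# Three clusters; 25 random starts, keeping the best solution
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Cluster sizes
table(fit$cluster)

# Cluster means on the original (unscaled) variables
aggregate(sim, by = list(cluster = fit$cluster), FUN = mean)
```

Reporting the cluster means on the original scale makes the groups easier to interpret than the scaled centres in `fit$centers`.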
4.2 Linear regression

4.2.1 Brief definition of linear regression

In this section, we present a regression analysis for predicting the speed limit (Aldrich, 2015). Regression analysis is a technique that models the relationship between a dependent variable and one or more independent (explanatory) variables (Tofallis, 2009). The simple linear regression equation is of the form

    Y = β0 + β1X

where Y is the dependent (response) variable, β0 is the intercept, β1 is the coefficient of X, and X is the independent (explanatory) variable (Mahdavi Damghani, 2012).

4.2.2 Linear Regression 1

In this section, we analyze the relationship between speed limit and state (Nikolić, et al., 2012). State in this case is a dummy variable (Székely & Rizzo, 2009). The R code is given below:

    model1 <- lm(Speed.Limit~State)
    model1
    summary(model1)

    > summary(model1)

    Call:
    lm(formula = Speed.Limit ~ State)

    Residuals:
        Min      1Q  Median      3Q     Max
    -76.301 -22.139  -0.539  17.861  33.699

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   72.903      2.207  33.039  < 2e-16 ***
    StateNSW       7.636      2.238   3.412 0.000647 ***
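To read a coefficient table of this kind: with R's default treatment contrasts, the intercept is the mean of the reference level of the factor (by default the first level, which here is presumably ACT), and each State coefficient is the difference between that state's mean and the reference mean. A minimal sketch on simulated data illustrates this (the data frame `sim`, its three states and its values are hypothetical).

```r
# Simulated stand-in for the fatalities data; values are hypothetical.
set.seed(1)
sim <- data.frame(
  State = factor(rep(c("ACT", "NSW", "NT"), each = 50)),
  Speed.Limit = c(rnorm(50, 70, 20), rnorm(50, 80, 20), rnorm(50, 95, 20))
)

fit <- lm(Speed.Limit ~ State, data = sim)
group_means <- tapply(sim$Speed.Limit, sim$State, mean)

# Intercept equals the mean of the reference level (ACT, the first level)
coef(fit)["(Intercept)"]
group_means["ACT"]

# StateNSW equals mean(NSW) - mean(ACT)
coef(fit)["StateNSW"]
group_means["NSW"] - group_means["ACT"]
```

Each pair of printed values agrees, which is why a one-factor regression and a comparison of group means carry the same information.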
The results of the analysis presented above show that the model is fit to predict the speed limit with the dummy variable State as the independent variable. The value of R-squared is 0.0244; this implies that 2.44% of the variation in the dependent variable (speed limit) is explained by the dummy variable State in the model.

4.2.3 Linear Regression 2

In this section, we test whether there is a significant relationship between speed limit and the Christmas period. The R code for this section is given as follows:

    model2 <- lm(Speed.Limit~Christmas.Period)
    model2
    summary(model2)

The results of the analysis are presented below:

    > summary(model2)

    Call:
    lm(formula = Speed.Limit ~ Christmas.Period)

    Residuals:
        Min      1Q  Median      3Q     Max
    -68.071 -23.071  -3.071  16.929  46.929

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)          83.0706     0.2085 398.496   <2e-16 ***
    Christmas.PeriodYes   2.9464     1.1632   2.533   0.0113 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 21.53 on 11021 degrees of freedom
    Multiple R-squared:

The results of the analysis presented above show that the model is fit to predict the speed limit with the dummy variable Christmas period as the independent variable (p = 0.0113). The value of R-squared is 0.00058; this implies that 0.058% of the variation in the dependent variable (speed limit) is explained by the dummy variable Christmas period in the model.
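Both R-squared values can be cross-checked against the reported outputs. For the State model, R-squared is the State sum of squares divided by the total sum of squares in the ANOVA table. For a model with a single predictor coefficient, such as the Christmas period model, R-squared equals t² / (t² + residual degrees of freedom). The sketch below applies these identities to the figures quoted in the report; the variable names are illustrative only.

```r
# Figures copied from the ANOVA table for Speed.Limit ~ State
ss_state <- 124900      # Sum Sq for State
ss_resid <- 4987716     # Sum Sq for Residuals
r2_state <- ss_state / (ss_state + ss_resid)
round(r2_state, 4)      # 0.0244

# Figures copied from the summary of Speed.Limit ~ Christmas.Period
t_xmas  <- 2.533        # t value for Christmas.PeriodYes
df_xmas <- 11021        # residual degrees of freedom
r2_xmas <- t_xmas^2 / (t_xmas^2 + df_xmas)
round(r2_xmas, 5)       # 0.00058
```

Both results match the R-squared values stated in the text.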
5. Conclusion

This study aimed to analyze the trends in fatalities in Australia. The results showed that significant differences exist in fatalities across the states and territories of Australia, as well as between the gender groups. Female victims had a significantly higher average speed limit than male victims.

6. Reflection

The study used various statistical techniques learnt in class to draw conclusions about the data. Several statistical and machine learning algorithms were applied. The study shows the importance of these techniques in supporting meaningful decisions based on the dataset.

References

Aldrich, J., 2015. Fisher and Regression. Statistical Science, 20(4), pp. 401–417.
Filipovych, R., Resnick, S. M. & Davatzikos, C., 2011. Semi-supervised Cluster Analysis of Imaging Data. Journal of Neuro Image, 54(3), pp. 2185–2197.
Frey, B. J. & Dueck, D., 2017. Clustering by Passing Messages Between Data Points. Journal of Science, 315(5814), pp. 972–976.
Mahdavi Damghani, B., 2012. The Misleading Value of Measured Correlation. Wilmott, 1(6), pp. 64–73.
Meilă, M., 2013. Comparing Clusterings by the Variation of Information: Learning Theory and Kernel Machines. Lecture Notes in Computer Science, Volume 2777, pp. 173–187.
Nikolić, D., Muresan, R., Feng, W. & Singer, W., 2012. Scaled correlation analysis: a better way to compute a cross-correlogram. European Journal of Neuroscience, 35(5), pp. 1–21.
Székely, G. J. & Rizzo, M. L., 2009. Brownian distance covariance. Annals of Applied Statistics, 3(4), pp. 1233–1303.
Tofallis, C., 2009. Least Squares Percentage Regression. Journal of Modern Applied Statistical Methods, 7(5), pp. 526–534.