Statistical Modelling and Analysis of NSW Public Transportation


Added on  2023/06/04

AI Summary
This report provides a comprehensive statistical analysis of the New South Wales (NSW) public transportation system, utilizing data from the NSW government and applying statistical modeling techniques. The analysis focuses on various aspects of the transportation system, including the usage of different modes of transport (bus, train, light rail, and ferry), opal tap on and tap off data, and overall public transport usage patterns. The report evaluates the hypothesis related to public transport usage and includes a discussion on the limitations of the data and potential biases. Key findings indicate that bus and train are the most preferred modes of transport, and recommendations are made to the NSW government for improving the efficiency and frequency of these services. The report also emphasizes the importance of using larger sample populations in future research to avoid biased results. References to relevant research articles and statistical resources are included to support the analysis and conclusions.
Running head: STATISTICS 1
1)
Introduction:
a)
The main objective of this assignment is to test skill in examining data, both the dataset provided by our lecturer and the data collected by me. The assignment applies the statistical modelling techniques learned through the trimester to particular business problems. In this report, various aspects of the New South Wales (NSW) government public transportation system are evaluated by applying relevant statistical theories and concepts. The analysis does not only involve hypothesis tests but also checks the conditions needed to validate their conclusions. The NSW government provides various modes of transportation, including bus, train, light rail and ferry. We have been allocated data on these modes, obtained from the official New South Wales site, to analyse various factors. Public transport is among the most significant services provided by the NSW government for the smooth and effective movement of people (Ben Barnes, 2013). However, every government has to take care to keep improving service quality, and providing better services requires efficient revenue generation. This assignment focuses on the New South Wales transportation system to address specific business problems: the analysis of Opal tap on and tap off, the total usage of public transportation, whether the New South Wales government should consider developing an underground subway between train stations, and which mode of transportation generates the most revenue for the New South Wales government.

b)
Dataset 1 is not original data: it is a subset of a sample data file from New South Wales transport. It is therefore secondary data, extracted from the original data for research purposes; the information was collected by the government and was initially gathered for other, related research work. Dataset 1 contains information on the modes of transport preferred by people of New South Wales and is based on the New South Wales Long Term Transport Master Plan of December 2012. According to the dataset, New South Wales public transport is made up of four basic modes: bus, train, ferry and light rail. The dataset comprises the variables mode of transport, date, tap (on or off), time of travel, location and count; gender is also listed among its variables. The date gives the day of travel, and the dates covered run from 8th to 14th August 2016. Time is important in the analysis since it enables travellers to plan their journeys. Dataset 1 comprises 1000 samples.

The data were possibly collected through observation and interviews. Because actual participants were involved in the survey, they were either interviewed or given questionnaires to fill in. Observation was probably the key method, since the survey required close attention in capturing and recording a number of important aspects.

c)
Dataset 2 comprises only two variables: mode of transportation and gender. Gender represents the demographic aspect. There are four modes of transport in dataset 2: bus, ferry, train and light rail. Dataset 2 comprises a sample of 25 travellers, of whom 14 are female and 11 are male. It is a primary dataset, as the data was collected directly from actual travellers in New South Wales. Dataset 2 was collected through observation: the researcher (in this case, me) conducted the study by observing and recording the factors under consideration. However, on critical examination, dataset 2 is biased for the following reasons:

i. The total sample in the dataset comprises only 25 cases, which is relatively small and so cannot reliably be used in the analysis. The minimum sample should be 30 cases/items.

ii. The dataset comprises only categorical variables, gender and mode of transport, which limits the statistical analysis that can be applied to it.

Section 2(a)

The variable mode is a single categorical variable, so only one numerical summary can be used: a table of counts and proportions.

Numerical summary

Mode          Count   Proportion
Bus             483        0.483
Ferry            38        0.038
Light rail       16        0.016
Train           463        0.463
Grand total    1000        1.000

From the numerical summary table, it is evident that bus has the highest proportion, 0.483.

Graphical Summary

Pie chart
[Pie chart: Total NSW people using public transport during 8th to 14th August 2016. Bus 48%, ferry 4%, light rail 2%, train 46%.]

From the above pie chart it is clear that the largest share of people in the sample, 48%, use the bus. Train is the second most used mode, at 46%. Ferry and light rail have the smallest shares, at 4% and 2% respectively.
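
As a rough illustration of how the proportions in the numerical summary follow from the counts, the short Python sketch below recomputes them. The counts are the ones reported for dataset 1; the variable and function names are only illustrative and are not part of the original analysis.

```python
# Counts per mode as reported in the numerical summary for dataset 1 (n = 1000).
counts = {"Bus": 483, "Ferry": 38, "Light rail": 16, "Train": 463}


def mode_proportions(mode_counts):
    """Return each mode's share of the total number of observations."""
    total = sum(mode_counts.values())
    return {mode: count / total for mode, count in mode_counts.items()}


proportions = mode_proportions(counts)

# Print the modes from most to least used, with the rounded pie chart percentage.
for mode, share in sorted(proportions.items(), key=lambda item: item[1], reverse=True):
    print(f"{mode:<10} {counts[mode]:>5} {share:.3f} ({share:.0%})")
```

Rounding each proportion to a whole percentage gives the figures shown on the pie chart.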

Section 2b:

To test the hypothesis, we follow the five steps given below, where p is the proportion of people who use the bus:

Step 1. Stating the hypotheses

H0: p = 0.5

H1: p > 0.5

Step 2. Checking if the conditions are satisfied

$np_0 = 1000 \times 0.5 = 500 \ge 10$

$n(1 - p_0) = 1000 \times (1 - 0.5) = 500 \ge 10$

As 500 is greater than 10, the conditions are satisfied. Therefore, the p-value can be computed as the area in the tail(s) of a standard normal distribution beyond z.

Step 3. Computing the test statistic

$$\text{test statistic} = \frac{\text{sample statistic} - \text{null value}}{SE}$$

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.483 - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{1000}}} = \frac{-0.017}{0.0158} = -1.08$$

Step 4. Comparison

P-value = P(Z > -1.08) = 0.8599

Step 5. Conclusion

As the p-value (0.8599) > 0.05 (assumed α), we do not reject the null hypothesis H0. Equivalently, the test statistic (-1.08) is less than the critical value 1.645, so we do not reject H0. Hence there is no significant evidence that p is greater than 0.5.
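
The five steps can also be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_proportion_z_test` is an illustrative choice rather than anything from the original analysis, and the inputs are the bus count (483 of 1000) and the null value 0.5 from Step 1.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z-test of H0: p = p0 against H1: p > p0.

    Returns the z statistic and the p-value P(Z > z).
    """
    # Step 2: the normal approximation needs n*p0 >= 10 and n*(1 - p0) >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("Normal approximation conditions are not satisfied")

    # Step 3: z = (sample proportion - null value) / standard error under H0.
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error

    # Step 4: area in the upper tail of the standard normal beyond z.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = one_proportion_z_test(successes=483, n=1000, p0=0.5)
alpha = 0.05  # assumed significance level, as in Step 5
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

Because the sketch keeps z unrounded, its p-value can differ in the third decimal place from one read off a z-table at -1.08.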

Section 3a

The variable location is categorical and the variable count is quantitative. With one categorical and one quantitative variable, a box plot is the appropriate graphical summary. The numerical summary is given below.

Numerical Summary

The numerical summary gives the statistics for three stations: Parramatta, Gosford and Blacktown. The overall sample size across the stations is 15. Parramatta station has the highest mean, standard deviation, median, Q1 and Q3.
Box plot

Conclusion

3b

Step-1

H0: all means are equal

H1: at least two means are different

Step-2

Condition check

1. All sample sizes are 30 (satisfied)
2. All standard deviations are similar (yes)
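
A comparison of means of the kind set up in Steps 1 and 2 could be sketched as a one-way ANOVA in Python, as below. The station names are those of the report, but the counts are invented placeholders, not the report's data. The "largest standard deviation no more than about twice the smallest" check is a common rule of thumb assumed here, not something the report states. The sketch stops at the F statistic and its degrees of freedom, since the standard library has no F distribution for the p-value.

```python
from statistics import mean, stdev

# HYPOTHETICAL tap counts per station: placeholders only, not the report's data.
samples = {
    "Parramatta": [120, 135, 150, 142, 128],
    "Gosford": [60, 72, 65, 80, 70],
    "Blacktown": [95, 110, 102, 88, 105],
}


def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(group) for group in groups]
    return max(sds) / min(sds)


def anova_f(groups):
    """One-way ANOVA: return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = mean([x for group in groups for x in group])

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


groups = list(samples.values())
f_stat, df_between, df_within = anova_f(groups)
print(f"SD ratio (assumed rule of thumb: below about 2): {sd_ratio(groups):.2f}")
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F statistic relative to the F distribution with these degrees of freedom would count as evidence against H0 that all means are equal.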
Step-3
4a

Numerical summary
Discussion and Conclusion

Based on the results of the analyses in the sections above, the main findings can be summarized as follows, and a few recommendations can be made to the New South Wales government.

From section one above, it is clear that bus and train are the main modes of transport preferred by people of New South Wales. The two modes are used most frequently by both genders. On the other hand, light rail was the least used mode, with a low count for both genders. Hence I would recommend that the New South Wales government consider improving the efficiency of both the bus and train systems by increasing service frequency or by extending the network.

By gender, males use the train and the bus more than females, whereas females use the ferry and light rail more than males. Based on the hypothesis test in section 2, there is no significant evidence that bus, the most used mode, accounts for more than half (50%) of public transport use in New South Wales.

The findings further suggest that both genders prefer bus and train to the rest of the available transport. The bus and the train make up about 48% and 46% of total New South Wales public transport use respectively, which makes them the most preferred means of transport. I would therefore also recommend that the New South Wales government consider investing more in trains and buses.

Moreover, I would recommend that a larger sample be used in future research, as a small sample of the population tends to yield biased information, which leads to wrong assertions and conclusions about the aspect under consideration.

References

Barnes, B. (2013). New South Wales Centre for Road Safety. Transport for NSW. Conference: Intelligent Vehicles Symposium (IV). Retrieved from https://www.researchgate.com

Bruce, P. C. (2014). Introductory Statistics and Analysis [e-book]. New Jersey: John Wiley & Sons.

Diggle, P. J. (2015). Statistics: A Data Science for the 21st Century. Journal of the Royal Statistical Society. Retrieved from http://moodle.koi.edu.au

Garry, B. (2018). NSW Long Term Transport Masterplan. Smart Infrastructure Facility: University of Wollongong. Retrieved from http://www.transport.nsw.gov.au

Hanne, R. A. M., Kposowa, A. J., & Riddle, M. D. (2013). Basic Statistics for Social Science. San Francisco: Jossey-Bass (Wiley).

Jarman, K. H. (2015). Beyond Basic Statistics [e-book]. New Jersey: John Wiley & Sons.

Lock, R. H., Lock, P. F., Lock Morgan, K., Lock, E. F., & Lock, D. F. (2013). Statistics: Unlocking the Power of Data. Wiley & Sons.