Statistical Modelling and Analysis of NSW Public Transportation
VerifiedAdded on 2023/06/04
Report
AI Summary
This report provides a comprehensive statistical analysis of the New South Wales (NSW) public transportation system, utilizing data from the NSW government and applying statistical modeling techniques. The analysis focuses on various aspects of the transportation system, including the usage of different modes of transport (bus, train, light rail, and ferry), opal tap on and tap off data, and overall public transport usage patterns. The report evaluates the hypothesis related to public transport usage and includes a discussion on the limitations of the data and potential biases. Key findings indicate that bus and train are the most preferred modes of transport, and recommendations are made to the NSW government for improving the efficiency and frequency of these services. The report also emphasizes the importance of using larger sample populations in future research to avoid biased results. References to relevant research articles and statistical resources are included to support the analysis and conclusions.

Running head: STATISTICS 1
1) Introduction:
a) The main objective of this assignment is to test skill in examining data, using both the dataset
provided by our lecturer and a dataset that I collected myself. The assignment applies the
statistical modelling techniques learned through the trimester to specific business problems.
In this report, various aspects of the New South Wales (NSW) government public transportation
system are evaluated using relevant statistical theories and concepts. The analysis does not
only run hypothesis tests but also checks the conditions needed to validate their conclusions.
The NSW government provides several modes of transport, including bus, train, light rail and
ferry. We have been allocated data on these modes, obtained from the official New South Wales
site, to analyse various factors. Public transport is among the most significant services
provided by the NSW government for the smooth and effective movement of people (Ben Barnes,
2013). However, every government has to take care to keep improving service quality, and
efficient revenue generation is important for providing better services. This assignment
focuses on the NSW transportation system to address specific business problems: the analysis
of Opal tap-on and tap-off data, the total usage of public transport, whether the NSW
government should consider developing an underground subway between train stations, and which
mode of transport generates the most revenue for the NSW government.
b) Dataset 1 is not original data, because it is a subset of a sample data file from New South
Wales transport. It is secondary data: it was extracted from data that the government
initially collected for other related research work, and it originates from the New South
Wales Long Term Transport Master Plan of December 2012. Dataset 1 contains information on
the modes of transport preferred by people in New South Wales. According to the dataset, NSW
public transport is made up of four basic modes: bus, train, ferry and light rail. The
dataset comprises these variables: mode of transport, date, tap, time of travel, location
and count. The date variable gives the day of travel, and the dates presented run from 8th to
14th of August 2016. The variable "time" is also recorded. Time will be important in the
analysis, since it enables travellers to plan their journeys. Furthermore, Dataset 1
comprises 1000 samples. The dataset contains variables such as mode of transport, gender,
time of the tap (on or off), location and count. The data were possibly collected through
observation and interviews: because actual participants were involved in the survey, they
were either interviewed or given questionnaires to fill in. Observation was probably the key
method, since the survey required close attention in capturing and recording the important
aspects.
c) Dataset 2: Dataset 2 comprises only two variables, mode of transport and gender.
Gender in this dataset represents the demographic aspect.
Dataset 2 covers four modes of transport: bus, ferry, train and light rail. It comprises a
sample of 25 travellers, of whom 14 are female and 11 are male. Dataset 2 is a primary
dataset, as the data were collected directly from actual travellers in New South Wales. The
data were collected through observation: the researcher (in this case, me) conducted the
study by observing and recording the factors under consideration. However, on critical
examination, Dataset 2 is biased for the following reasons:
i. The sample comprises only 25 cases, which is relatively small and therefore cannot
reliably be used in the analysis. The minimum sample should be 30 cases/items.
ii. The dataset comprises only categorical variables (gender and mode of transport),
which limits the statistical analysis that can be applied to it.
Section 2(a)
The variable mode is a single categorical variable, so one numerical summary, a table of counts
and proportions, is used.
Numerical summary
Mode          Count   Proportion
Bus             483        0.483
Ferry            38        0.038
Light rail       16        0.016
Train           463        0.463
Grand total    1000        1.000
From the numerical summary table, it is evident that bus has the highest proportion, 0.483.
Graphical Summary
Pie chart
[Pie chart: Total NSW people using public transport during 8th to 14th August 2016. Bus 48%, ferry 4%, light rail 2%, train 46%.]
From the above pie chart, it is clear that the largest share of NSW public transport users, 48%,
travel by bus. Train is the second most used mode, at 46%. Ferry and light rail have the smallest
shares, at 4% and 2% respectively.
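The proportions and pie chart percentages reported for the variable mode can be recomputed from the counts. The following is a minimal sketch in Python; the counts are the ones given in the numerical summary, while the variable names (`counts`, `proportions`) are illustrative and not part of the original analysis.

```python
# Counts per mode of transport, taken from the numerical summary (n = 1000)
counts = {"Bus": 483, "Ferry": 38, "Light rail": 16, "Train": 463}

total = sum(counts.values())

# Proportion of each mode = count / grand total
proportions = {mode: count / total for mode, count in counts.items()}

# Print count, proportion, and the rounded percentage used on the pie chart
for mode, share in proportions.items():
    print(f"{mode:<10} {counts[mode]:>5} {share:.3f} {share:.0%}")

# Mode with the largest share
most_used = max(proportions, key=proportions.get)
print("Most used mode:", most_used)
```

Rounding each proportion to a whole percentage gives the shares shown on the pie chart.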
Section 2b:
To test the hypothesis, we follow the 5 steps given below, where p is the proportion of
travellers using the bus:
Step 1. Stating the hypotheses
H0: p = 0.5
H1: p > 0.5
Step 2. Checking if the conditions are satisfied
n × p0 = 1000 × 0.5 = 500 ≥ 10
n × (1 − p0) = 1000 × (1 − 0.5) = 500 ≥ 10
As 500 is greater than 10, both conditions are satisfied. Therefore, the p-value can be
computed as the area in the tail(s) of a standard normal distribution beyond z.
Step 3: Computing the test statistic
Test statistic = (statistic − null) / SE
\[
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}
  = \frac{0.483 - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{1000}}}
  = \frac{-0.017}{0.0158}
  = -1.08
\]
Step 4: Comparison
P-value = P(Z > −1.08)
= 0.8599
Step 5: Conclusion
As the p-value (0.8599) > 0.05 (assumed significance level α), we do not reject the null hypothesis H0.
Equivalently, the test statistic (−1.08) is less than the critical value 1.645, so we do not reject H0; there is no significant evidence that p is greater than 0.5.
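The five steps can be checked numerically. The following is a minimal sketch of an upper-tail one-proportion z-test in Python, using only the standard library. The function name `one_proportion_z_test` is hypothetical, and the p-value is taken as the upper-tail area P(Z > z), which is the tail that matches H1: p > 0.5.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Upper-tail one-proportion z-test (H0: p = p0, H1: p > p0)."""
    # Step 2: expected successes and failures under H0 must both be >= 10
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation conditions are not satisfied")

    # Step 3: z = (p_hat - p0) / SE, with SE computed under H0
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se

    # Step 4: upper-tail area beyond z under the standard normal curve,
    # using the normal CDF written in terms of the error function
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    p_value = 1 - cdf
    return z, p_value


# Values from the report: 483 bus trips out of n = 1000, null value 0.5
z, p_value = one_proportion_z_test(483, 1000, 0.5)
alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 3), reject_h0)
```

Because the sketch uses the unrounded z, its p-value can differ in the third decimal place from a table lookup at z = −1.08; the decision at α = 0.05 is the same.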
Section 3a
The variable location is categorical and the variable count is quantitative. With one categorical
and one quantitative variable, a box plot is the appropriate graphical representation. The
numerical summary is given below.
Numerical Summary
The numerical summary gives the statistics for three stations: Parramatta, Gosford and Blacktown.
The overall sample size across all stations is 15. Parramatta station has the highest mean,
standard deviation, median, Q1 and Q3.
Box plot
Conclusion
Section 3b
Step-1
H0: all means are equal
H1: at least two means are different
Step-2
Condition check
1. All sample sizes are ≥ 30 (satisfied)
2. All standard deviations are similar (yes)
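Hypotheses of this form (all means equal against at least two different) are typically tested with a one-way ANOVA F statistic. The following is a minimal sketch of that calculation in Python. It assumes a one-way ANOVA is the intended test, and the three groups are placeholder numbers chosen only to make the arithmetic easy to follow; they are not the station counts from the dataset. The function name `one_way_anova_f` is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder data (NOT from the report): three groups of three observations
placeholder_groups = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
f_stat, df_between, df_within = one_way_anova_f(placeholder_groups)
print(f_stat, df_between, df_within)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom would count as evidence against H0.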
Step-3
Section 4a
Numerical summary
Discussion and Conclusion
Based on the results of the analyses in the sections above, the main findings can be summarised
as follows, and a few recommendations to the New South Wales government can be made with
reference to them.
From the analysis above, it is clear that bus and train are the main modes of transport
preferred by people in New South Wales. Both genders use these two modes most frequently. On the
other hand, light rail was the least used mode, with low counts from both genders. Hence I would
recommend that the New South Wales government consider improving the efficiency of both the bus
and train systems by increasing service frequency or by extending the network.
Males prefer the train more than females do, and males also use the bus more than females.
Females, in turn, use the ferry and light rail more than males. Based on the hypothesis test in
Section 2, it can be concluded that there is no significant evidence that any single mode of
public transport accounts for more than half (50%) of public transport use in New South Wales.
The findings further suggest that both genders prefer these two modes to the rest of the
available transport. The bus and the train make up about 48% and 46% of total New South Wales
public transport use respectively, which makes them the most preferred means of transport. I
would also recommend that the New South Wales government consider investing more in trains and
buses.
Moreover, I would recommend that a larger sample be used in future research, as a small sample
of the population tends to yield biased information, which can lead to wrong assertions and
conclusions about the aspect under consideration.
References
Barnes, B. (2013). New South Wales Centre for Road Safety. Transport for NSW. Conference:
Intelligent Vehicles Symposium (IV). Retrieved from https://www.researchgate.com
Bruce, P. C. (2014). Introductory Statistics and Analysis [e-book]. New Jersey: John Wiley &
Sons.
Diggle, P. J. (2015). Statistics: A Data Science for the 21st Century. Journal of the Royal
Statistical Society. Retrieved from http://moodle.koi.edu.au
Garry, B. (2018). NSW Long Term Transport Masterplan. Smart Infrastructure Facility:
University of Wollongong. Retrieved from http://www.transport.nsw.gov.au
Hanne, R. A. M., Kposowa, A. J., & Riddle, M. D. (2013). Basic Statistics for Social Science.
San Francisco: Jossey-Bass (Wiley).
Jarman, K. H. (2015). Beyond Basic Statistics [e-book]. New Jersey: John Wiley & Sons.
Lock, R. H., Lock, P. F., Lock Morgan, K., Lock, E. F., & Lock, D. F. (2013). Statistics:
Unlocking the Power of Data. Wiley & Sons.