Statistical Modelling for Public Transport Infrastructure


Added on 2023/06/05



STATISTICAL MODELLING
STUDENT ID:


Section 1: Introduction
a) Due to growth in city size and the underlying population, public transport infrastructure
requires maintenance and capital investment in order to provide people with mobility options
that are efficient yet affordable. One way to improve efficiency and accessibility is to alter
timetables and overhaul routes. This widens coverage to more people and minimises travel time
for a larger segment of the population. These exercises are carried out after relevant market
research into travellers' usage patterns, and specialised agencies provide key input in this
regard (Mayers, 2017). The findings then form the basis of the changes introduced. These
changes may not benefit everyone, but they aim to maximise the efficiency and utility of the
transport network while providing wider coverage, especially to remote locations. This report
analyses several datasets using statistical tools in order to understand the behaviour and
preferences of travellers and to offer advice for future surveys.
b) For the given dataset, it needs to be determined whether the data is primary or secondary.
For a dataset to be primary, the underlying data must be collected by the researcher directly
from the subjects or respondents. This is not the case here. The university did not collect
the data itself; it sourced the data and provided it to us. As a result, the given dataset is
classed as secondary rather than primary (Eriksson and Kovalainen, 2015). The dataset contains
six variables and a sample size of 1000 observations. The variables are briefly described as
follows.
• Mode – indicates the public transport means used for a given trip and is a categorical
variable. Since the responses have no natural order, the variable is measured on a nominal
scale.
• Date – indicates the date of travel for the given trip. Since the responses can be arranged
in chronological order, the variable is measured on at least an ordinal scale.
• Tap – represents one of two states for the given trip, tap on or tap off, and is a categorical
variable. Since the responses have no natural order, the variable is measured on a nominal
scale.
• Time – represents the time associated with the trip and is a quantitative variable measured
on an interval scale.
• Count – records the corresponding frequency and is a numerical (quantitative) variable. Its
measurement scale is ratio, as an absolute zero is defined.
• Location – identifies the station at which the tap on or tap off takes place at the given
time. It is a categorical variable and, since the responses have no natural order, it is
measured on a nominal scale.
Each case (observation) is defined by a combination of the variables above. Differences across
these variables reflect the behaviour and preferences of travellers.
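The measurement scales listed above map naturally onto data types. Below is a minimal sketch of one Dataset 1 row under that reading. The class and field names (TripRecord, travel_date, tap_time and so on) are hypothetical and not taken from the dataset itself. The nominal variables become Enums, which cannot be ranked, and the ratio-scale count is rejected if negative.

```python
from dataclasses import dataclass
from datetime import date, time
from enum import Enum


class Mode(Enum):
    """Nominal: categories with no natural order (Enum members do not support <)."""
    TRAIN = "train"
    BUS = "bus"
    LIGHT_RAIL = "light rail"
    FERRY = "ferry"


class Tap(Enum):
    """Nominal: tap on or tap off."""
    ON = "tap on"
    OFF = "tap off"


@dataclass(frozen=True)
class TripRecord:
    """One row of Dataset 1 (hypothetical field names, for illustration)."""
    mode: Mode          # nominal
    travel_date: date   # chronological order is meaningful
    tap: Tap            # nominal
    tap_time: time      # interval, as classified in the text
    location: str       # nominal: station name
    count: int          # ratio: an absolute zero is defined

    def __post_init__(self) -> None:
        # A ratio-scale frequency cannot be negative.
        if self.count < 0:
            raise ValueError("count must be non-negative")


# Usage with invented values, purely for illustration
row = TripRecord(Mode.TRAIN, date(2016, 7, 25), Tap.ON, time(8, 0), "Parramatta", 42)
print(row.mode.value, row.count)
```

Comparing two modes, for example `Mode.TRAIN < Mode.BUS`, raises a TypeError. This reflects the point that nominal categories carry no order.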
c) For dataset 2, 30 respondents were selected and the relevant information was collected from
them. The data covers only two variables: the gender of the respondent and the public transport
mode used. Dataset 2 is primary data, since it was not taken from another primary or secondary
source but was collected by me using a survey (Hair et al., 2015). However, the fact that this
dataset is primary does not imply that it is more accurate than dataset 1, for two reasons.
First, respondents were selected by convenience sampling, a non-probability technique that does
not support a faithful representation of the population. Second, a sample size of 30 is too
small to represent the population accurately. The results obtained from analysing this data may
therefore lack reliability (Eriksson and Kovalainen, 2015). With regard to data type and
measurement scale, gender is a categorical variable measured on a nominal scale, since its
categories have no natural order. The same applies to mode of transport (Hillier, 2016).
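Comparing margins of error gives a rough sense of why 30 respondents is a small sample. The sketch below uses the standard worst-case 95% margin of error for a proportion, z·sqrt(p(1 − p)/n) with z = 1.96 and p = 0.5, at n = 30 and at Dataset 1's n = 1000. The formula assumes simple random sampling. It therefore only quantifies sampling variability and says nothing about the additional bias that convenience sampling can introduce.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    p = 0.5 gives the widest interval, the usual choice when the true share is unknown.
    Assumes simple random sampling.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (30, 1000):
    print(f"n = {n:4d}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

This gives roughly ±17.9 percentage points at n = 30, against about ±3.1 at n = 1000.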
Section 2: Single Variable Analysis – Dataset 1
a) The public transport usage summary for Dataset 1 is provided as follows.

The corresponding graphical summary of the information presented in the above table is
exhibited as follows.
The summary of mode for dataset 1 shows that train is the most commonly used public transport
mode, with the highest frequency of the four modes. Bus is also used frequently, only slightly
less than train. The other two modes (light rail and ferry) have limited ridership and are not
popular. From the government's perspective, it is therefore important to invest in expanding
train and bus infrastructure so that increasing traffic can be handled without congestion. It
also makes sense for the government to explore ways of increasing the use of the two less
frequently used modes, thereby reducing the current load on train and bus infrastructure.
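A frequency table of this kind lists, for each mode, the number of trips and that number as a share of the total. The sketch below uses Python's `collections.Counter` and assumes that each trip record carries one mode label. The labels in the usage example are invented and are not the Dataset 1 counts. Applied to Dataset 1, the train row would show 484 and 48.4%.

```python
from collections import Counter


def frequency_table(labels):
    """Return {category: (frequency, relative frequency)}, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: (freq, freq / total) for category, freq in counts.most_common()}


# Invented labels for illustration only (not the Dataset 1 counts)
modes = ["train"] * 5 + ["bus"] * 3 + ["ferry", "light rail"]
for category, (freq, share) in frequency_table(modes).items():
    print(f"{category:<10} {freq:>3} {share:6.1%}")
```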


b) The relevant statistical technique here is a one-sided hypothesis test for a single
proportion. The question is whether trains carry more than 50% of trips, so the hypotheses are
H0: p ≤ 0.5 against H1: p > 0.5, where p is the population proportion of trips made by train.
The sample proportion is computed from the sample size of 1000, of which 484 trips involved
trains, giving p̂ = 484/1000 = 0.484. The hypothesis test output generated in Excel is shown
below.
The p value approach has been used for the test. As seen in the output above, the p value is
0.8517. This exceeds any conventional significance level (such as 5%). The available evidence is
therefore not sufficient to reject the null hypothesis, and the alternative hypothesis cannot be
accepted (Flick, 2015).
Hence, the sample data does not provide statistical support for the claim that trains capture
more than 50% of the market. This is in line with expectations. The sample share of train trips,
48.4%, is itself below 50%, and train and bus are quite close in popularity and usage. Bus is
close behind, and light rail and ferry also take some share, so no single mode would be expected
to exceed 50%.
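The Excel output is not reproduced here, and the exact formula used is not stated. The sketch below shows how a p value for the one-sided test of H0: p ≤ 0.5 against H1: p > 0.5 can be computed from 484 successes in 1000 trips. It uses three variants: the normal approximation with and without a continuity correction, and the exact binomial tail. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def right_tailed_proportion_test(successes, n, p0=0.5, continuity=True):
    """Normal-approximation test of H0: p <= p0 against H1: p > p0.

    Returns (z, p_value). With continuity=True the count is replaced by
    successes - 0.5, the usual correction when approximating P(X >= successes).
    """
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    x = successes - 0.5 if continuity else successes
    z = (x / n - p0) / se
    return z, 1 - normal_cdf(z)


def exact_binomial_upper_tail(successes, n, p0=0.5):
    """Exact P(X >= successes) for X ~ Binomial(n, p0), with terms built in log space."""
    log_terms = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p0) + (n - k) * math.log(1 - p0)
        for k in range(successes, n + 1)
    )
    return sum(math.exp(t) for t in log_terms)


# Figures from Section 2: 484 train trips out of 1000
z_cc, p_cc = right_tailed_proportion_test(484, 1000)
z_plain, p_plain = right_tailed_proportion_test(484, 1000, continuity=False)
p_exact = exact_binomial_upper_tail(484, 1000)
print(f"continuity-corrected: z = {z_cc:.3f}, p = {p_cc:.4f}")
print(f"no correction:        z = {z_plain:.3f}, p = {p_plain:.4f}")
print(f"exact binomial:       p = {p_exact:.4f}")
```

The continuity-corrected and exact versions come out close to the 0.8517 reported above, while the uncorrected approximation is slightly lower (about 0.844). This suggests, but does not prove, that the reported figure includes such a correction. Either way, the conclusion not to reject H0 is unchanged.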
Section 3: Analysis of Two Variables – Dataset 1
a) The numerical summary for the train mode at the selected stations is presented below in
tabular format.

The corresponding graphical summary of the information presented in the above table is
exhibited as follows.
Both the numerical and the graphical summaries show that Parramatta train station handles the
most traffic. Moreover, the values for Parramatta are substantially higher than those for the
other selected train stations.
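One way to build a station summary like this is to filter the records to the train mode and add up the count field for each location. The table itself is not reproduced here. The sketch below therefore assumes that the summary totals the Count variable per station. The dictionary keys, the "Station B" and "Station C" names and all the numbers are invented for illustration.

```python
from collections import defaultdict


def total_by_location(records, mode="train"):
    """Sum the count field per location for one mode, largest total first."""
    totals = defaultdict(int)
    for rec in records:
        if rec["mode"] == mode:
            totals[rec["location"]] += rec["count"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented records for illustration only (not Dataset 1 values)
records = [
    {"mode": "train", "location": "Parramatta", "count": 120},
    {"mode": "train", "location": "Station B", "count": 45},
    {"mode": "bus", "location": "Parramatta", "count": 300},
    {"mode": "train", "location": "Parramatta", "count": 80},
    {"mode": "train", "location": "Station C", "count": 60},
]
for location, total in total_by_location(records):
    print(location, total)
```

The bus record is excluded by the mode filter, so only train taps contribute to each station's total.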
b) The relevant statistical technique here is a hypothesis test, with the hypotheses summarised
below. The significance level is taken as 0.05 (5%). The test statistic is F, since the test is
an analysis of variance (ANOVA). ANOVA compares group means using the ratio of between-group to
within-group variance. The ANOVA output for the sample data is shown below.

The p value for the ANOVA test is 0.66, which exceeds the significance level of 0.05. The sample
data therefore does not provide sufficient evidence to reject the null hypothesis (Eriksson and
Kovalainen, 2015). The relevant conclusion is that the proportions of tap on and tap off
travellers do not differ significantly.
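The exact groups behind the reported F test are not reproduced here. The sketch below therefore uses two made-up groups, labelled as hypothetical tap-on and tap-off counts, to show how a one-way ANOVA F statistic is formed. It divides the between-group mean square by the within-group mean square. The p value is the right-tail area of the F distribution with (df_between, df_within) degrees of freedom. Excel's F.DIST.RT function returns that area.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented counts for illustration only (not the Dataset 1 figures)
tap_on = [10, 12, 14]
tap_off = [12, 14, 16]
f_stat, df1, df2 = one_way_anova_f(tap_on, tap_off)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

In this invented example, F = 1.5 with (1, 4) degrees of freedom. In Excel, F.DIST.RT(1.5, 1, 4) would then give the p value to compare against 0.05.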
c) Based on the above analysis, Parramatta station appears to be the best choice for connecting
the proposed underground train line. It would allow maximum use of the new connection and hence
justify the government's investment. It would also allow better service and reduce
overcrowding in peak hours.
Section 4: Analysis of Dataset 2
For the primary dataset 2, the numerical summary of transport mode preferences by gender is
shown below.
The corresponding graphical summary of the information presented in the above table is
exhibited as follows.


The summary shows no particular gender preference for light rail and bus. However, the two
genders differ starkly for train and ferry. More than 50% of the female respondents in the sample
travel by train, compared with 25% of the male respondents. Conclusions drawn from this summary
and the underlying primary data must be treated with caution, because dataset 2 may well be
biased and unrepresentative of the underlying population. The small sample size and the use of a
non-probability sampling method to obtain the respondents give rise to this risk.
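Percentages such as "more than 50% of female respondents" are row percentages: each gender's counts are divided by that gender's total rather than by all 30 respondents. The sketch below shows the calculation. The gender-by-mode counts are invented for illustration and are not the survey's actual counts, which are not reproduced here.

```python
def row_percentages(table):
    """Convert {row: {column: count}} counts into percentages within each row."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * count / row_total for col, count in cells.items()}
    return result


# Invented gender-by-mode counts for 30 respondents (illustration only)
survey = {
    "female": {"train": 8, "bus": 4, "light rail": 1, "ferry": 1},
    "male": {"train": 4, "bus": 5, "light rail": 2, "ferry": 5},
}
for gender, shares in row_percentages(survey).items():
    print(gender, {mode: round(pct, 1) for mode, pct in shares.items()})
```

In this invented example, each row holds only 14 or 16 respondents. A single respondent therefore shifts a row percentage by roughly 6 to 7 percentage points, which is one concrete reason for the caution expressed above.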
Section 5: Discussion & Conclusion
The above discussion indicates that train is the most common transport mode in the sample data
provided from an external source. However, bus is also prominent, with usage only slightly below
that of train, while ferry and light rail account for only a small share of passengers. For the
new underground train line proposed by the government, Parramatta railway station seems a
suitable choice for the connection. It can function as a hub and thereby cater to a higher
number of travellers. Dataset 2 highlights gender preferences in public transport use among the
NSW respondents surveyed. Females tend to prefer train, while males show no particular preference
for any mode. However, since the data could be biased, more research is needed on gender
preferences.
Future research on the topic should consider the time factor. Similar data should be collected in
different months so that common trends in preferences and usage can emerge. The capital
expenditure involved in building any additional infrastructure is sizeable. Consideration should
therefore be given to factors such as seasonal trends and any discounts associated with a given
mode of public transport. In addition, further research is needed to understand the precise
reasons for preferences among the available public transport modes. The relevant authority can
then introduce suitable changes to maximise efficiency.

References
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed.
London: Sage Publications.
Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research
project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P. and Page, M. J. (2015) Essentials
of business research methods. 2nd ed. New York: Routledge.
Hillier, F. (2016) Introduction to operations research. 6th ed. New York: McGraw Hill
Publications.
Mayers, L. (2017) Greater Sydney and NSW public transport undergo state's 'largest'
timetable overhaul ever. [online] Available at:
http://www.abc.net.au/news/2017-11-26/new-sydney-and-nsw-public-transport-timetable-launched/9194538
(Accessed 21 September 2018).