
Analysis of Transport Data in NSW: Statistics Study Material

   

Added on  2023-06-04

University
Statistics
by
Your Name
Date
<Your Name> 2018 1 of

Section 1
Introduction
The manner in which individuals travel to their places of work and learning institutions
influences their physical activity levels. For this reason, travel surveys are carried out to assist in
planning physical activity and travel-mode promotions in institutions and other places in Australia
that require travelling (Rissel, Mulley and Ding, 2013). This paper therefore analyses
statistical data to determine the most commonly used mode of transport and to provide
recommendations on areas where improvements or new developments should be made.
Datasets
Dataset 1 is secondary data, since it was collected from a secondary source: the Transport for NSW
Open Data website. It is a subset of the data “Opal Tap on and Tap off location - 8th to 14th August
2016” (Opendata.transport.nsw.gov.au, 2016).
It has five variables: mode, tap, loc, count and date. Mode is a categorical variable with the cases
bus, train, ferry and light rail, indicating the type of public transport used. Tap is a
categorical variable with the cases on and off, indicating whether a record is a tap on or a tap off. Loc is
a categorical variable whose cases are train stations and postcodes. Count is a numerical
variable giving the number of taps recorded. Date is a quantitative continuous
variable indicating when the tap took place (Bruce, 2015).
Dataset 2 is primary data collected from a one-on-one survey of 160 individuals (Fowler,
2009). This dataset has three variables. Date is a quantitative continuous variable indicating
the date when the response was collected. Gender is a categorical variable with two cases, male or
female, indicating the sex of the person interviewed. Mode is a categorical variable whose cases
indicate the mode of transport used (Bruce, 2015).
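The variable descriptions above can be read as a record schema. The following is a minimal Python sketch of one row of each dataset. The class names, the enum string values and the single date per row are assumptions made for illustration. They are not the layout of the published Opal file or of the survey sheet.

```python
import datetime
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Public transport modes in Dataset 1 (string values are assumed)."""
    BUS = "bus"
    TRAIN = "train"
    FERRY = "ferry"
    LIGHT_RAIL = "light rail"


class Tap(Enum):
    """Whether a row counts tap ons or tap offs."""
    ON = "on"
    OFF = "off"


@dataclass(frozen=True)
class OpalRecord:
    """One row of Dataset 1: mode, tap, loc, count and date."""
    mode: Mode
    tap: Tap
    loc: str              # train station name or postcode, stored as text
    count: int            # number of taps recorded for this row
    date: datetime.date


@dataclass(frozen=True)
class SurveyResponse:
    """One row of Dataset 2: date, gender and mode (hypothetical field names)."""
    date: datetime.date
    gender: str           # "male" or "female"
    mode: str             # mode of transport reported by the respondent


# Hypothetical rows for illustration only; not taken from either dataset.
row = OpalRecord(Mode.BUS, Tap.ON, "2000", 120, datetime.date(2016, 8, 8))
reply = SurveyResponse(datetime.date(2016, 8, 8), "female", "train")
```

Using enums for mode and tap keeps the categorical variables restricted to their listed cases. The sketch stores loc as free text because it mixes station names and postcodes.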

Section 2
Single Variable Analysis in Dataset 1
The mode of transport most commonly used by people in NSW between 8 and 14 August 2016 is
determined using the sum of counts and the proportion of the total as summary statistics. The
sum of counts is the total count for a given mode of transport, while the proportion is that sum
expressed as a fraction of the total count across all modes. The summary statistics are shown in
the table below:
Table 1: Summary statistics
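The sum-of-counts and proportion calculation described above can be sketched in Python as follows. The function name and the (mode, count) input format are assumptions. The counts are made up for illustration and are not the figures summarised in Table 1.

```python
from collections import defaultdict


def summarise_by_mode(records):
    """Return {mode: (sum_of_counts, proportion_of_total)}.

    `records` is an iterable of (mode, count) pairs, e.g. one pair per
    Opal row after dropping the tap, loc and date columns.
    """
    totals = defaultdict(int)
    for mode, count in records:
        totals[mode] += count
    grand_total = sum(totals.values())
    return {mode: (total, total / grand_total) for mode, total in totals.items()}


# Hypothetical counts for illustration only; NOT the Opal figures.
rows = [("bus", 30), ("bus", 20), ("train", 35), ("ferry", 10), ("light rail", 5)]
summary = summarise_by_mode(rows)

# Print modes from most to least used.
for mode, (total, share) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{mode:<10} {total:>4} {share:.2f}")
```

With pandas, an equivalent table is `df.groupby("mode")["count"].sum()`, divided by its own `.sum()` to obtain the proportions.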
Buses were clearly the most commonly used mode of transport, followed by trains, then ferries
and lastly light rail. The summary statistics are visualised using a pie chart. A pie chart
represents data as a circle divided into sectors whose sizes are proportional to the
quantities being represented (Rumsey, 2007). In this case, each sector shows a mode of
transport's share of the total count as a percentage. The chart is shown below:
Fig 1: Pie Chart
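Each sector's angle is its proportion of the full 360 degrees, so a pie chart can be derived directly from the proportions. Here is a minimal sketch using hypothetical proportions rather than the values behind Fig 1. The function name is illustrative.

```python
def pie_angles(shares):
    """Map each label to its sector angle in degrees (proportion x 360).

    Shares are rescaled by their total so the angles sum to 360 even if
    the proportions were rounded.
    """
    total = sum(shares.values())
    return {label: 360 * share / total for label, share in shares.items()}


# Hypothetical proportions for illustration only.
shares = {"bus": 0.50, "train": 0.35, "ferry": 0.10, "light rail": 0.05}
angles = pie_angles(shares)
for label, angle in angles.items():
    print(f"{label:<10} {angle:6.1f} degrees")
```

Plotting libraries usually do this step internally. For example, matplotlib's `pyplot.pie` accepts the raw counts or proportions together with a `labels` argument.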

To determine whether more than 50% of the population used the mode with the highest
proportion as their preferred mode of transport, the sample proportion is tested against 50%.
The sample size is 1000, and the highest sample proportion, for buses, is 0.48.
The process of formulating and testing the hypothesis follows the steps below:
Step 1: State the null and alternative hypotheses.
The null hypothesis: $H_0: p = 0.5$
The alternative hypothesis: $H_1: p \neq 0.5$
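Step 2: Check whether the conditions for the test are met (the expected numbers of successes and failures under the null hypothesis must each be at least 10).
$$ np = 1000 \times 0.5 = 500 \geq 10 $$
$$ n(1-p) = 1000 \times (1-0.5) = 500 \geq 10 $$
Both conditions are met.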
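The two checks in Step 2 are the usual large-sample (success/failure) conditions for the normal approximation to a sample proportion. Here is a small sketch. The function name and the `threshold` parameter are illustrative choices.

```python
def normal_approx_ok(n, p0, threshold=10):
    """True if both n*p0 and n*(1 - p0) are at least `threshold`."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


print(normal_approx_ok(1000, 0.5))  # True: both expected counts equal 500
print(normal_approx_ok(15, 0.5))    # False: 15 * 0.5 = 7.5 < 10
```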
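Step 3: Determine the z test statistic, where $\hat{p}$ is the sample proportion, $p$ is the proportion under the null hypothesis and $n$ is the sample size.
$$ Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} $$
$$ Z = \frac{0.48 - 0.5}{\sqrt{\dfrac{0.5(1-0.5)}{1000}}} = \frac{-0.02}{0.0158} = -1.26 \text{ (2 dp)} $$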
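The arithmetic in Step 3 can be reproduced with the standard library. Like the formula above, this sketch uses the standard error under the null hypothesis. The function name is illustrative.

```python
import math


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, with standard error sqrt(p0 * (1 - p0) / n)."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


z = one_proportion_z(0.48, 0.5, 1000)
print(round(z, 4))  # -1.2649, i.e. -1.26 to 2 dp
```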
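Step 4: Develop a decision rule.
Using the conventional significance level of 0.05, the decision rule for this two-sided test is to fail to reject
the null hypothesis when the z statistic lies within the range -1.96 to 1.96, or equivalently when the
p-value exceeds 0.05 (Lock et al., 2013). Here Z = -1.26 lies within this range. The lower-tail area is
P(Z < -1.26) ≈ 0.104, so the two-sided p-value is about 2 × 0.104 = 0.208, which is greater than 0.05.
Since the z statistic is within the required range, we fail to reject the null hypothesis.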
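Step 4 can be checked numerically. The sketch below computes the two-sided p-value from the standard normal CDF, written with `math.erf`, and applies both forms of the decision rule. It uses the rounded statistic Z = -1.26 from Step 3. The helper names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def two_sided_z_test(z, alpha=0.05):
    """Return (p_value, reject_h0) for a two-sided z test at level alpha."""
    p_value = 2 * normal_cdf(-abs(z))
    return p_value, p_value < alpha


z = -1.26
p_value, reject = two_sided_z_test(z)
print(f"lower-tail area P(Z < {z}) = {normal_cdf(z):.3f}")  # 0.104
print(f"two-sided p-value = {p_value:.3f}")                # 0.208
print("within -1.96 to 1.96:", -1.96 <= z <= 1.96)         # True
print("reject H0" if reject else "fail to reject H0")      # fail to reject H0
```

Python 3.8 and later also provide `statistics.NormalDist().cdf`, which gives the same CDF without writing out the `erf` formula.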
