Data Analysis for Train Station Usage
Added on 2020/10/22
Numeracy and Data Analysis
Table of Contents
INTRODUCTION
MAIN BODY
1. Representation of data in tabular form
2. Data representation in charts
3. Calculations of mean, median, mode, standard deviation and range
4. Calculating values of m, c and station usage indicator
CONCLUSION
REFERENCES
INTRODUCTION
Data analysis can be defined as the process of identifying, collecting, scheduling and
evaluating information in order to answer specific queries. With its help, researchers
obtain results for their questions (Chen and Yang, 2015). Its main objective is to
discover useful information and to support decision making with appropriate arguments. The
present report covers the representation of data in tabular form and in charts, and the
calculation of the mean, median, mode, range and standard deviation. In addition, the values
of m and c and the forecast station usage are calculated.
MAIN BODY
1. Representation of data in tabular form
The data presented in the following table relate to Durham station (Train station
usage in London, 2019).
Years Station usage
2009 29
2010 26
2011 54
2012 33
2013 7
2014 6
2015 6
2016 162
2017 302
2018 269
2. Data representation in charts
Column chart:
The above chart shows the station usage of Durham over ten years. Usage fluctuates from
year to year. In the first year (2009) usage was 29, and by the end of the tenth year (2018)
it had increased to 269.
Line chart:
The above line graph depicts the station usage of Durham for the ten-year period from
2009 to 2018. It shows that the figures change from year to year. Usage reached its peak
of 302 in the 9th year (2017).
3. Calculations of mean, median, mode, standard deviation and range
Years Station usage
2009 29
2010 26
2011 54
2012 33
2013 7
2014 6
2015 6
2016 162
2017 302
2018 269
∑ X 894
Mean 89.400
Mode 6
Median 31
Range 296
Maximum 302
Minimum 6
Mean: The mean is the average of all the values in a data set. To calculate it, the values
are first added together and the result is then divided by the total number of values. For
example, if the total of the values is 1000 and the number of observations is 10, the mean
is 100. The mean is also known as the central value of a set of numbers (Figueres-Esteban,
Hughes and Van Gulijk, 2015). It is calculated as follows:
Formula of mean: ∑ X / N
= 894 / 10
= 89.4
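The mean calculation above can be reproduced with a short Python sketch. The list name usage is an assumption made for illustration; the values are copied from the table in this report.

```python
# Station usage for Durham, 2009 to 2018, copied from the table in this report.
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

total = sum(usage)   # sum of X
n = len(usage)       # number of observations, N
mean = total / n     # mean = sum of X / N

print(total, n, mean)
```

Running it should give the same total (894), number of observations (10) and mean (89.4) as the manual working.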
Mode: The value which appears most often in a set of data is known as the mode. When
no value is repeated, the data set has no mode. In the above table the mode is 6, because
it appears twice (in 2014 and 2015) while every other value appears only once.
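As an illustration, the mode can be found by counting how often each value occurs. This is a minimal sketch using Counter from Python's standard library; the variable names are assumptions, not part of the original report.

```python
from collections import Counter

# Station usage for Durham, 2009 to 2018, copied from the table in this report.
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

counts = Counter(usage)              # value -> number of occurrences
highest = max(counts.values())       # the largest number of occurrences

# Following the definition above: if no value repeats, there is no mode.
if highest > 1:
    modes = [value for value, count in counts.items() if count == highest]
else:
    modes = []

print(modes, highest)
```

For this data set the only repeated value is 6, which occurs twice.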
Median: It is the middle or central value of a set of data once the values are arranged in
order. It separates the series into two parts, an upper one and a lower one. When the series
has an odd number of values, the middle value is the median; when it has an even number of
values, the median is the average of the two middle values (Gatobu, Arocha and
Hoffman-Goetz, 2016). The steps followed to calculate the median are as follows:
Values in ascending order: 6, 6, 7, 26, 29, 33, 54, 162, 269, 302
Position of median: (N + 1) / 2
= (10 + 1) / 2
= 5.5th observation, which lies between the 5th value (29) and the 6th value (33)
Median = (29 + 33) / 2
= 31
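The same steps can be sketched in Python: sort the values, then take the middle value, or the average of the two middle values when the count is even. The variable names are illustrative assumptions.

```python
# Station usage for Durham, 2009 to 2018, copied from the table in this report.
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

ordered = sorted(usage)   # the median is defined on the ordered values
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    # Odd number of values: the single middle value is the median.
    median = ordered[mid]
else:
    # Even number of values: average the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2

print(ordered)
print(median)
```

With ten values the two middle values are 29 and 33, so the sketch arrives at the same median of 31.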
Range: The difference between the highest and lowest values of a data series is known as
the range. It is calculated as follows:
Formula of range: Max - Min
= 302 - 6
= 296
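A minimal Python sketch of the range calculation, assuming the same list of values as in the table (the names are illustrative):

```python
# Station usage for Durham, 2009 to 2018, copied from the table in this report.
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

maximum = max(usage)              # highest value in the series
minimum = min(usage)              # lowest value in the series
data_range = maximum - minimum    # range = Max - Min

print(maximum, minimum, data_range)
```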
Standard deviation: It is a statistical measure of the dispersion of a data set relative
to its mean, and is calculated as the square root of the variance. The steps followed to
calculate it are as follows:
Year    Station usage (x)    x - mean    (x - mean)²
1       29                   -60.4       3648.16
2       26                   -63.4       4019.56
3       54                   -35.4       1253.16
4       33                   -56.4       3180.96
5       7                    -82.4       6789.76
6       6                    -83.4       6955.56
7       6                    -83.4       6955.56
8       162                  72.6        5270.76
9       302                  212.6       45198.76
10      269                  179.6       32256.16
Total                                    115528.40 (approximately 115528)
Formula of standard deviation: √(variance)
Variance = ∑(x - mean)² / N
= 115528 / 10
= 11552.8, or approximately 11553
Standard deviation = √11552.8
= 107.48, or approximately 107
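The calculation above divides by N, which corresponds to the population variance. A short Python sketch of the same steps follows; the variable names are assumptions for illustration. A sample standard deviation, which divides by N - 1 instead, would give a somewhat larger figure.

```python
import math

# Station usage for Durham, 2009 to 2018, copied from the table in this report.
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

n = len(usage)
mean = sum(usage) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in usage]
sum_sq = sum(squared_deviations)

variance = sum_sq / n            # population variance (divide by N)
std_dev = math.sqrt(variance)    # standard deviation = square root of variance

print(round(sum_sq, 2), round(variance, 2), round(std_dev, 2))
```

The rounded results should agree with the manual working: a sum of squared deviations of about 115528, a variance of about 11553 and a standard deviation of about 107.48.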
4. Calculating values of m, c and station usage indicator
In order to calculate the station usage indicator, the year is taken as x (numbered from 1
for 2009 to 10 for 2018) and station usage is taken as y.
1. Table:
Year (x)    Station usage (y)    x²     xy
1           29                   1      29
2           26                   4      52
3           54                   9      162
4           33                   16     132
5           7                    25     35
6           6                    36     36
7           6                    49     42
8           162                  64     1296
9           302                  81     2718
10          269                  100    2690
∑x = 55     ∑y = 894             ∑x² = 385    ∑xy = 7192
These totals are used to determine the values of m and c in y = mx + c through the following steps:
2. Value of m: m = (N∑xy - ∑x∑y) / (N∑x² - (∑x)²)
= (10*7192 - 55*894) / (10*385 - (55)²)
= (71920 - 49170) / (3850 - 3025)
= 22750 / 825
= 27.58, or approximately 28
3. Value of c: c = (∑y - m∑x) / N
= (894 - 28*55) / 10
= -646 / 10
= -64.6, or approximately -65
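The formulas for m and c can be checked with a small Python sketch. All names are illustrative assumptions. The sketch computes c twice: once from the unrounded slope, and once from the slope rounded to 28 as in the working above, since rounding m first changes the intercept noticeably.

```python
# Year index (2009 = 1, ..., 2018 = 10) and station usage from the table.
years = list(range(1, 11))
usage = [29, 26, 54, 33, 7, 6, 6, 162, 302, 269]

n = len(years)
sum_x = sum(years)
sum_y = sum(usage)
sum_x2 = sum(x * x for x in years)
sum_xy = sum(x * y for x, y in zip(years, usage))

# Slope: m = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept from the unrounded slope: c = (sum(y) - m*sum(x)) / N
c = (sum_y - m * sum_x) / n

# Intercept as computed in the report, using the slope rounded to 28.
m_rounded = round(m)
c_from_rounded_m = (sum_y - m_rounded * sum_x) / n

print(round(m, 2), round(c, 2), m_rounded, c_from_rounded_m)
```

With the unrounded slope the intercept comes out near -62.27, whereas the rounded slope of 28 gives the -64.6 used in this report.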
4. Station usage in year 12: y = mx + c
y = 28*12 + (-65)
= 271, so the station usage in year 12 will be 271.
5. Station usage in year 15: y = mx + c
y = 28*15 + (-65)
= 355, so the station usage in the 15th year will be 355.
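The two forecasts can be sketched as a small Python function. The function name predict is hypothetical. The first pair of calls uses the rounded coefficients from this report (m = 28, c = -65); the second pair uses the unrounded least-squares values derived from the same totals, shown only to indicate how much the rounding affects the forecasts.

```python
def predict(year_index, m, c):
    """Return the forecast station usage y = m*x + c for a given year index."""
    return m * year_index + c

# Rounded coefficients as used in the report.
usage_year_12 = predict(12, 28, -65)
usage_year_15 = predict(15, 28, -65)

# Unrounded coefficients from the totals N = 10, sum(x) = 55, sum(y) = 894,
# sum(x^2) = 385 and sum(xy) = 7192.
m_exact = (10 * 7192 - 55 * 894) / (10 * 385 - 55 ** 2)
c_exact = (894 - m_exact * 55) / 10
exact_year_12 = predict(12, m_exact, c_exact)
exact_year_15 = predict(15, m_exact, c_exact)

print(usage_year_12, usage_year_15)
print(round(exact_year_12, 2), round(exact_year_15, 2))
```

The rounded coefficients reproduce the forecasts of 271 and 355, while the unrounded ones give slightly lower values of roughly 268.6 and 351.4.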
CONCLUSION
From the above project report it can be concluded that data analysis is the process of
gathering and evaluating information in order to find the most suitable answers to queries.
For this purpose, different measures are analysed, such as the mean, mode, median, range and
standard deviation. With their help, the average, minimum, maximum and central values of a
data set are measured. Data analysis helps to resolve the issues faced by individuals while
studying numeracy.
REFERENCES
Books and Journals:
Chen, Y. and Yang, Z. J., 2015. Message formats, numeracy, risk perceptions of alcohol-
attributable cancer, and intentions for binge drinking among college students. Journal of
drug education. 45(1). pp.37-55.
Figueres-Esteban, M., Hughes, P. and Van Gulijk, C., 2015, September. The role of data
visualization in railway big data risk analysis. In Proceedings of the 25th European
Safety and Reliability Conference, ESREL 2015 (pp. 2877-2882). CRC Press/Balkema.
Gatobu, S. K., Arocha, J. F. and Hoffman-Goetz, L., 2016. Numeracy, health numeracy, and
older immigrants’ primary language: an observation-oriented exploration. Basic and
Applied Social Psychology. 38(4). pp.185-199.
Marks, G. N., 2015. School sector differences in student achievement in Australian primary and
secondary schools: A longitudinal analysis. Journal of School Choice. 9(2). pp.219-238.
Online
Train station usage in London. 2019. [Online]. Available through:
<https://data.london.gov.uk/dataset/train-station-usage>