Requirement Analysis and Modelling: Rock 'n' Roll Marathon Analysis

Added on  2020/05/28

AI Summary
This project undertakes a requirement analysis and modeling of marathon data, focusing on the Rock 'n' Roll Marathon series of 2010. The analysis employs linear regression models to explore the relationships between various factors and outcomes. The study uses data from both current and potential cities to determine the factors that influence marathon success and hotel performance. The project involves data cleaning, variable selection, and the development of multiple linear regression models to identify key predictors such as population, hotel occupancy, and the number of marathon finishers. The results show moderate to strong associations between dependent and independent variables, providing insights into the growth of hotel businesses and participant performance. The analysis also highlights the impact of city demographics, costs, and the presence of running clubs on marathon success. The project concludes with an annotated bibliography of relevant research papers.
Running head: REQUIREMENT ANALYSIS AND MODELLING
Requirement Analysis and Modelling
Name of the Student:
Name of the University:
Author’s note:
Table of Contents
Introduction
Methods
Data Description
Data Cleaning
Dependent Variable
Independent Variable
Result
Discussion
Annotated Bibliography
Introduction:
The data were compiled in 2010 by an international firm hired by Competitor Group, the
parent company of the Rock ‘n’ Roll Marathon series. This report summarises the Rock ‘n’
Roll Marathon data for 2010.
The Rock ‘n’ Roll Marathon series combines entertainment with running, with live
bands performing at every mile along the course and a post-race headliner concert. The
Marathon is organised primarily as a tourism-driven event, with about 65% of runners
visiting from out of state. There are currently 14 races across the USA.
The data analysis is carried out with the help of MS-Excel. The linear regression
models show the association between the dependent variable and the independent variables
in each of the two datasets.
Methods:
Data Description:
The research report covers more than 320,000 competitors, more than 697,000 expo
attendees and more than 1 million spectators; almost 65% of competitors travel to the event
market. The event demographics include competitors from all 50 states and 106 countries.
Among the Marathon competitors, 58% are female and 42% are male, and 84% are between
24 and 44 years of age. About 67% have an income above $60K and 38% have an income
above $100K.
We apply the linear regression model developed from the Current cities data to the
Potential cities data to determine which US cities the Rock ‘n’ Roll Marathon series
should expand into.
Data Cleaning:
The Current cities data contain some missing values. Hence, we deleted the following
variables-
1. Marathon Registrations
2. Relay Registrations
The Potential cities data contain many missing values. Therefore, we deleted the
following variables-
1. Annual Leisure Visitors
2. Percentage of Leisure Visitors
3. Total Visitors
4. Total Room Nights Available
5. 2009 Booked Rooms Nights
6. Total Hotel Revenue
7. RevPar
8. Fortune 1000 companies
9. Visitors per Fortune 1000 company
10. Severe Weather Risk
11. Total Local Attractions
12. Average Local Age
13. Average Local HHI
14. Years Event Existed
The record for “Raleigh” was also removed because it has many missing values.
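The cleaning steps described above can be sketched in Python. This is only an illustration under stated assumptions, since the analysis itself was carried out in MS-Excel: the toy records, the use of `None` for a missing value, the function names and the 50% threshold are all hypothetical and do not come from the report.

```python
# Toy records standing in for the Potential cities sheet (all values are made up).
rows = [
    {"City": "A", "Elevation": 500, "Total Visitors": None, "Total Sports Teams": 3},
    {"City": "B", "Elevation": 20, "Total Visitors": None, "Total Sports Teams": 2},
    {"City": "C", "Elevation": None, "Total Visitors": None, "Total Sports Teams": None},
    {"City": "D", "Elevation": 900, "Total Visitors": 1200000, "Total Sports Teams": 4},
]


def missing_share(values):
    """Fraction of entries in `values` that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def clean(rows, max_missing=0.5):
    """Drop sparse variables first, then drop cities that still have gaps."""
    columns = list(rows[0].keys())
    # Step 1: delete variables whose share of missing values exceeds the threshold.
    kept_columns = [
        c for c in columns if missing_share([r[c] for r in rows]) <= max_missing
    ]
    # Step 2: delete cities that still have a missing value in a kept variable.
    return [
        {c: r[c] for c in kept_columns}
        for r in rows
        if all(r[c] is not None for c in kept_columns)
    ]


cleaned = clean(rows)
```

In this toy input, "Total Visitors" is dropped as a variable and city "C" is dropped as a record, mirroring the way the report removes sparse variables and then the Raleigh record.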
Dependent Variable:
In the Current cities data, we identified four dependent variables, one for each of
four linear regression models: Hotel Occupancy Rate, Booked Room Nights, Total Local
Attractions and Marathon Finishers per 1000 Residents.
In the Potential cities data, the dependent (response) variable is “Marathon
Finishers per 1000 Residents”.
Independent Variable:
The independent variables of the four linear regression models of the Current cities
(groups A to D, one group per model) are-
A.
1. Total City Population
2. Total Room Nights Available
3. Average Daily Hotel Rate
4. Total Metro Area Population
B.
1. Half and Full Marathons
2. Full Marathons Only
3. Half Marathons Only
4. Annual Leisure Visitors
5. Annual Visitors
6. Percent Leisure Visitors
7. Total Hotel Rooms
C.
1. Average Annual Temperature
2. Average Event Month High Temperature
3. Average Event Month Low Temperature
4. Average Event Daily Temperature
5. Annual Rain (in)
D.
1. Cost of Living
2. Running Clubs in City
3. Running Stores in Metro Area
The independent variables of the Potential cities dataset are-
1. Physical Health Ranking
2. Healthy Behaviours
3. Average Health City Rank
4. Total Running Stores in Metro
5. Proximity to Next Closest “Rock n Roll”
6. Total Sports Teams
7. Total Marathons and Half Marathons
8. Running Clubs in City
9. Running Clubs in Metro Area
10. Average Event Monthly Temp
11. Event Month High Temp
12. Event Month Low Temp
13. Elevation
14. Total Running Events
15. Overall Well-being Rank
16. Marathon Finishers in Metro Area
The indexes and rankings included in the dataset are Severe Weather Risk, Overall Physical
Well-being, Physical Health Ranking, Healthy Behaviours, Average Health City Rank and
Cost of Living.
Result:
The linear regression model is-
Y = β0 + β1*X1 + β2*X2 + β3*X3 + … + βn*Xn
Here, Y = the value of the dependent or response variable
Xi = the value of the i-th independent or predictor variable (i = 1, …, n)
β0 = the intercept, and βi = the coefficient of Xi in the regression model
The linear regression model helps to establish the linear relationship between a single
dependent variable and one or more independent variables.
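As a minimal sketch of how the coefficients β0, …, βn of such a model can be estimated, the following Python code solves the least-squares normal equations directly. It is not the procedure used in this report (the models were fitted in MS-Excel); the function names and the toy data are hypothetical, and the toy response is constructed as exactly 1 + 2*X1 + 3*X2 so the expected coefficients are known.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Return [b0, b1, ..., bn] that minimise the sum of squared residuals."""
    design = [[1.0] + [float(v) for v in row] for row in xs]  # intercept column first
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: ys equals 1 + 2*x1 + 3*x2 exactly.
xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
ys = [9, 8, 22, 18, 35]
coefficients = fit_ols(xs, ys)
```

Because the toy response is an exact linear function of the predictors, `coefficients` recovers the intercept and slopes up to floating-point error. With real data the same calculation yields the fitted values reported in a "Coefficients" column.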
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.651918527
R Square           0.424997765
Adjusted R Square  0.169441217
Standard Error     0.077836857
Observations       14

ANOVA
Source      df  SS           MS        F         Significance F
Regression  4   0.040302334  0.010076  1.663028  0.241422271
Residual    9   0.054527186  0.006059
Total       13  0.09482952

Variable                     Coefficients  Standard Error  t Stat    P-value   Lower 95%    Upper 95%
Intercept                    0.402599088   0.237080236     1.698155  0.123704  -0.13371366  0.9389118
Total city population        -6.47746E-08  4.93601E-08     -1.31229  0.221903  -1.7643E-07  4.689E-08
Total Room Nights Available  4.58716E-09   2.00765E-09     2.284843  0.048179  4.55446E-11  9.129E-09
Average Daily Hotel Rate     0.001184731   0.001995456     0.593715  0.567323  -0.0033293   0.0056988
Total metro area population  1.39476E-08   1.38297E-08     1.008526  0.339546  -1.7337E-08  4.523E-08
In the first linear regression model, the dependent variable is “Hotel Occupancy
Rate” and the independent variables are Total City Population, Total Room Nights
Available, Average Daily Hotel Rate and Total Metro Area Population. The value of R Square
is 0.4250, the F-statistic is 1.6630 and its p-value is 0.2414 (> 0.05). Therefore, we fail
to reject the null hypothesis of no significant association between the dependent variable
and the independent variables at the 95% confidence level. The value of R Square indicates
a moderate association between the dependent and independent variables of the linear
regression model. Individually, only Total Room Nights Available is significantly
associated with the dependent variable (p-value 0.0482).
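The summary statistics quoted for this model can be cross-checked from the ANOVA sums of squares alone. The sketch below uses the standard definitions of R Square, Adjusted R Square and the F-statistic; the function name is hypothetical, and the input numbers are the ones printed in the ANOVA table for the Hotel Occupancy Rate model.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Return (R Square, Adjusted R Square, F) from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual  # equals observations - 1
    r_square = ss_regression / ss_total
    # Adjusted R Square penalises R Square for the number of predictors used.
    adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual
    # F is the ratio of the regression mean square to the residual mean square.
    f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_square, adjusted_r_square, f_statistic


# Values from the ANOVA table of the Hotel Occupancy Rate model.
r_square, adjusted_r_square, f_statistic = anova_summary(
    ss_regression=0.040302334, ss_residual=0.054527186,
    df_regression=4, df_residual=9,
)
```

The large gap between R Square and Adjusted R Square reflects the small sample (14 observations) relative to the four predictors.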
[Figure: Normal Probability Plot of Hotel Occupancy Rate against Sample Percentile]
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.99172282
R Square           0.983514152
Adjusted R Square  0.723210497
Standard Error     1888306.754
Observations       14

ANOVA
Source      df  SS           MS        F            Significance F
Regression  7   1.70178E+15  2.43E+14  95.45293824  1.00085E-05
Residual    8   2.85256E+13  3.57E+12
Total       15  1.73031E+15

Variable                  Coefficients  Standard Error  t Stat    P-value      Lower 95%     Upper 95%
Intercept                 3782830.508   6122342.738     0.617873  0.55383676   -10335317.15  17900978.17
Half and Full Marathons   -293846.2347  1268786.959     -0.2316   0.822665584  -3219674.207  2631981.738
Full Marathon Only        0             0               65535     #NUM!        0             0
Half Marathon Only        0             0               65535     #NUM!        0             0
Annual Leisure Visitors   0.842746501   0.376207736     2.240109  0.055415984  -0.024790093  1.710283095
Annual Visitors           -0.724308312  0.28062788      -2.58103  0.032564239  -1.371437363  -0.077179262
Percent Leisure Visitors  -6557478.052  7375399.083     -0.8891   0.399892726  -23565178.82  10450222.72
Total Hotel Rooms         285.8072859   24.31693936     11.75342  2.51099E-06  229.7323232   341.8822485
In the second linear regression model, the dependent variable is “Booked Room
Nights” and the independent variables are Half & Full Marathons, Full Marathons Only, Half
Marathons Only, Annual Leisure Visitors, Annual Visitors, Percent Leisure Visitors and
Total Hotel Rooms. The value of R Square is 0.9835, the F-statistic is 95.4529 and its
p-value is 1.00085E-05 (about 0.00001, < 0.05). Therefore, we reject the null hypothesis of
no significant association between the dependent variable and the independent variables at
the 95% confidence level. The value of R Square indicates a very strong association between
the dependent and independent variables of the linear regression model. The output reports
zero coefficients and #NUM! p-values for Full Marathons Only and Half Marathons Only, so
these two variables contribute nothing to the fitted model.
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.731691302
R Square           0.535372162
Adjusted R Square  0.217759789
Standard Error     13.90707982
Observations       14

ANOVA
Source      df  SS           MS        F         Significance F
Regression  5   2005.695321  401.1391  2.592585  0.111150061
Residual    9   1740.661821  193.4069
Total       14  3746.357143

Variable                        Coefficients  Standard Error  t Stat    P-value   Lower 95%    Upper 95%
Intercept                       140.3296858   93.1270253      1.506863  0.166113  -70.3382811  350.9977
Average Annual Temp             -0.35635222   0.677322365     -0.52612  0.611522  -1.88856186  1.175857
Average Event Month Low Temp    1.98925079    0.761320303     2.612896  0.028137  0.267024617  3.711477
Average Event Month Daily Temp  0             0               65535     #NUM!     0            0
Annual Rain (in)                -0.462577706  0.218187493     -2.12009  0.063028  -0.9561521   0.030997
Average Event Month High Temp   -2.354548163  1.267054263     -1.85829  0.096075  -5.22082403  0.511728
In the third linear regression model, the dependent variable is “Total Local
Attractions” and the independent variables are Average Annual Temp, Average Event Month
High Temp, Average Event Month Low Temp, Average Event Month Daily Temp and Annual Rain
(in). The value of R Square is 0.5354, the F-statistic is 2.5926 and its p-value is 0.1112
(> 0.05). Therefore, we fail to reject the null hypothesis of no significant association
between the dependent variable and the independent variables at the 95% confidence level.
The value of R Square indicates a moderate association between the dependent and
independent variables of the linear regression model. Individually, only Average Event
Month Low Temp has a p-value below 0.05 (0.0281).
[Figure: Normal Probability Plot of Total Local Attractions against Sample Percentile]
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.73132008
R Square           0.53482906
Adjusted R Square  0.39527778
Standard Error     0.57741057
Observations       14

ANOVA
Source      df  SS           MS           F         Significance F
Regression  3   3.833291774  1.277763925  3.832491  0.046075306
Residual    10  3.334029655  0.333402965
Total       13  7.167321429

Variable                      Coefficients  Standard Error  t Stat        P-value   Lower 95%    Upper 95%
Intercept                     -1.8962829    1.198223216     -1.582579027  0.144599  -4.56609062  0.7735248
Cost of Living                0.03034223    0.011003859     2.757417507   0.020219  0.005824108  0.0548604
Running Clubs in City         -0.0127459    0.027710115     -0.459972444  0.655375  -0.07448787  0.0489961
Running Stores in Metro Area  0.0420842     0.020853436     2.018094313   0.071206  -0.00438015  0.0885486
In the fourth linear regression model, the dependent variable is “Marathon Finishers
per 1000 Residents” and the independent variables are Cost of Living, Running Clubs in City
and Running Stores in Metro Area. The value of R Square is 0.5348, the F-statistic is
3.8325 and its p-value is 0.0461 (< 0.05). Therefore, we reject the null hypothesis of no
significant association between the dependent variable and the independent variables at
the 95% confidence level. The value of R Square indicates a moderate association between
the dependent and independent variables of the linear regression model.
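To illustrate how a fitted model of this kind can be applied to a candidate city, the sketch below plugs predictor values into the fitted equation. The coefficients are the ones reported in the output for the Marathon Finishers per 1000 Residents model; the function name and the input values for the example city are hypothetical and are not taken from the Potential cities data.

```python
# Coefficients from the "Marathon Finishers per 1000 Residents" output (Current cities).
INTERCEPT = -1.8962829
COEFFICIENTS = {
    "Cost of Living": 0.03034223,
    "Running Clubs in City": -0.0127459,
    "Running Stores in Metro Area": 0.0420842,
}


def predict_finishers_per_1000(city):
    """Apply the fitted linear equation to one city's predictor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * city[name] for name in COEFFICIENTS)


# Hypothetical candidate city (illustrative values only).
example_city = {
    "Cost of Living": 100,
    "Running Clubs in City": 5,
    "Running Stores in Metro Area": 10,
}
predicted = predict_finishers_per_1000(example_city)
```

Ranking candidate cities by such predictions is one possible way to compare them, but only Cost of Living is individually significant in this model, so the predictions should be read with caution.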
[Figure: Normal Probability Plot of Marathon Finishers per 1000 Residents against Sample Percentile (Current cities)]
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.857522136
R Square           0.735344214
Adjusted R Square  0.486256415
Standard Error     0.81472606
Observations       34

ANOVA
Source      df  SS           MS        F            Significance F
Regression  16  31.35316753  1.959573  2.952148666  0.016519041
Residual    17  11.28423541  0.663779
Total       33  42.63740294

Variable                            Coefficients  Standard Error  t Stat    P-value      Lower 95%    Upper 95%
Intercept                           3.106975436   #NUM!           #NUM!     #NUM!        #NUM!        #NUM!
Physical Health Ranking             -7.57746E+12  4.95868E+12     -1.52812  0.144874842  -1.8039E+13  2.884E+12
Healthy Behaviors                   -7.57746E+12  4.95868E+12     -1.52812  0.144874842  -1.8039E+13  2.884E+12
Average Health City Rank            1.51549E+13   9.91736E+12     1.528121  0.144874842  -5.7689E+12  3.608E+13
Total Running Stores in Metro       0.124930057   0.076827354     1.626114  0.122317944  -0.03716149  0.2870216
Proximity to Next Closest RnR       0.000883765   0.000661927     1.33514   0.199435058  -0.00051278  0.0022803
Total Sports Teams                  -0.044036044  0.153315731     -0.28722  0.777412016  -0.36750396  0.2794319
Total Marathons and Half Marathons  -0.070599284  0.103707595     -0.68075  0.505194901  -0.28940318  0.1482046
Running Clubs in City               -0.087390682  0.047087709     -1.85591  0.080886526  -0.18673706  0.0119557
Running Clubs in Metro Area         -0.007402606  0.008116044     -0.9121   0.374472017  -0.02452596  0.0097207
Average Event Monthly Temp          1.99433E+14   1.20118E+14     1.660312  0.115182789  -5.3993E+13  4.529E+14
Event Month High Temp               -9.97166E+13  6.00589E+13     -1.66031  0.115182789  -2.2643E+14  2.7E+13
Event Month Low Temp                -9.97166E+13  6.00589E+13     -1.66031  0.115182789  -2.2643E+14  2.7E+13
Elevation                           0.000257642   0.000197969     1.301429  0.21047361   -0.00016004  0.0006753
Total Running Events                0.006572144   0.006031223     1.089687  0.291057119  -0.00615262  0.0192969
Overall Well-Being Rank             -0.004064882  0.003419839     -1.18862  0.250929842  -0.01128011  0.0031503
Marathon Finishers in Metro Area    0.000104219   4.12515E-05     2.52644   0.021737315  1.71863E-05  0.0001913
For the Potential cities model, the values of the βi’s indicate that Physical Health
Ranking, Healthy Behaviours, Total Sports Teams, Total Marathons & Half Marathons, Running
Clubs in City, Running Clubs in Metro Area, Event Month High Temp, Event Month Low Temp
and Overall Well-Being Rank are negatively associated with the response variable Marathon
Finishers per 1000 Residents. The other variables, namely Average Health City Rank, Total
Running Stores in Metro, Proximity to Next Closest “Rock n Roll”, Average Event Monthly
Temp, Elevation, Total Running Events and Marathon Finishers in Metro Area, are positively
associated with the dependent or response variable.
The value of R Square is 0.73534. It is also known as the “coefficient of
determination”. Therefore, we can say that 73.53% of the variation in the dependent
variable is explained by the independent variables. The value of R Square indicates that
the association between the dependent variable and the independent variables is strong;
R Square by itself does not indicate the direction of the association.
The F-statistic of the linear regression model is 2.9521. The p-value of the linear
regression model is 0.0165 (< 0.05). Therefore, we reject the null hypothesis of no
significant association between the dependent and independent variables at the 95%
confidence level. The individual p-value of Marathon Finishers in Metro Area (0.0217)
indicates that this factor is significantly associated with the dependent variable. No
other individual independent variable is significantly associated with the dependent
variable at the 95% confidence level, as their p-values are greater than 0.05.
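The significance of an individual predictor can be checked from its coefficient and standard error alone. The sketch below does this for Marathon Finishers in Metro Area, using the coefficient and standard error from the regression output. The function name is hypothetical, and the critical value 2.109816 (two-sided 5% level, 17 residual degrees of freedom) is taken from standard Student's t tables rather than from the report.

```python
# Two-sided 5% critical value of Student's t with 17 degrees of freedom
# (assumption: taken from standard t tables, not from the report).
T_CRITICAL_DF17 = 2.109816


def t_stat_and_interval(coefficient, standard_error, t_critical):
    """Return the t statistic and the confidence interval for one coefficient."""
    t_stat = coefficient / standard_error
    half_width = t_critical * standard_error
    return t_stat, coefficient - half_width, coefficient + half_width


# Coefficient and standard error of "Marathon Finishers in Metro Area".
t_stat, lower, upper = t_stat_and_interval(0.000104219, 4.12515e-05, T_CRITICAL_DF17)

# A predictor is significant at the 5% level when |t| exceeds the critical value,
# which is equivalent to the 95% interval excluding zero.
significant = abs(t_stat) > T_CRITICAL_DF17
```

The resulting interval lies entirely above zero, which agrees with the Lower 95% and Upper 95% columns of the output for this variable.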
The Normal Probability Plot shows the sample percentiles of the distribution.
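The sample percentiles on the horizontal axis of these plots appear to follow the midpoint rule 100 × (i − 0.5) / n for the i-th smallest of n observations. This is inferred from the axis values (for example 3.5714… for n = 14 and 1.4706… for n = 34) and is not stated in the report, so the sketch below should be read as an assumption about how the plot is built; the function names are hypothetical.

```python
def sample_percentiles(n):
    """Midpoint percentiles 100 * (i - 0.5) / n for i = 1..n (assumed plotting rule)."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


def normal_probability_points(values):
    """Pair each sorted observation with its sample percentile for plotting."""
    return list(zip(sample_percentiles(len(values)), sorted(values)))


percentiles_current = sample_percentiles(14)    # Current cities: 14 observations
percentiles_potential = sample_percentiles(34)  # Potential cities: 34 observations
```

If the plotted points fall close to a straight line, the response variable is roughly evenly spread over its range; strong curvature points to skewness or outliers.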
[Figure: Normal Probability Plot of Marathon Finishers per 1000 Residents against Sample Percentile (Potential cities)]
Discussion:
We observe moderate to strong associations in all the regression models. This report
measures the association between the dependent and independent variables in each of the
linear regression models. After fitting the linear regression models, we found that the
success of “Rock ‘n’ Roll” lies in the growth of the hotel business and participants’
strong performance in the Marathon. The effect of various factors on Marathon and hotel
performance lies in the contributions of the predictor variables. These predictor
variables depend on the engagement of stakeholders. The Current cities data cover 14
cities (the number of observations in each Current cities regression) and the Potential
cities data cover 35 cities of the USA (34 after the removal of Raleigh). Marathon
Finishers per 1000, Booked Room Nights, Hotel Occupancy Rate and Total Local Attractions
are the dependent variables of the linear regression models. Marathon Finishers per 1000
is the variable common to both the Current and Potential cities datasets. However, the
Potential cities dataset involves more independent variables for this dependent variable
than the Current cities dataset. The linking variable (Marathon Finishers per 1000) is
taken as the dependent variable for both datasets.
The independent variables moderately and linearly explain the dependent variable in
the Current cities’ Marathon dataset. However, the independent variables strongly and
linearly explain the dependent variable in the Potential cities’ Marathon dataset. The
population of cities, hotel costs, Total Sports Teams, Total Running Clubs and Healthy
Behaviours are the factors identified as correlating with Marathon Finishers per 1000.