Predicting NASCAR Driver Winnings with Regression Analysis

Predicting Winnings for NASCAR Drivers
1. Suppose you wanted to predict Winnings ($) using only the number of poles won
(Poles), the number of wins (Wins), the number of top five finishes (Top 5), or the
number of top ten finishes (Top 10). Which of these four variables provides the best
single predictor of Winnings ($)?
To determine the best single predictor of Winnings ($) among the number of poles won (Poles),
the number of wins (Wins), the number of top-five finishes (Top 5), and the number of top-ten
finishes (Top 10), a correlation matrix was developed (Li & Ji, 2005).
Figure 1: Correlation matrix
The correlation analysis shows that the best single predictor is the number of top-ten finishes
(Top 10), because the Top 10 variable has the highest correlation with Winnings ($) among the
four candidate variables.
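As a rough illustration of how this correlation screen could be reproduced, the Python sketch below computes the same pairwise correlations. It assumes the driver data sit in a CSV file named nascar.csv with columns Poles, Wins, Top 5, Top 10, and Winnings ($); the file name and column labels are illustrative and not taken from the original workbook.

```python
# Minimal sketch of the correlation check, assuming illustrative file/column names.
import pandas as pd

df = pd.read_csv("nascar.csv")

# Pairwise Pearson correlations among the candidate predictors and Winnings ($)
corr = df[["Poles", "Wins", "Top 5", "Top 10", "Winnings ($)"]].corr()

# Correlation of each candidate predictor with Winnings ($), largest first;
# the variable at the top of this list is the best single predictor
print(corr["Winnings ($)"].drop("Winnings ($)").sort_values(ascending=False))
```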
2. Develop an estimated regression equation that can be used to predict Winnings ($)
given the number of poles won (Poles), the number of wins (Wins), the number of top
five finishes (Top 5), and the number of top ten finishes (Top 10). Test for individual
significance and discuss your findings and conclusions.
Figure 2: Regression Model
From the regression model, the estimated regression equation is:
Winnings ($) = 3,140,367.09 - 12,938.92 Poles + 13,544.81 Wins + 71,629.39 Top 5 + 117,070.58 Top 10
Holding all other factors constant, the base winnings are $3,140,367.09. The intercept is
statistically significant at the 0.05 level of significance.
The number of poles won has a negative coefficient: holding all other factors constant, a
one-unit increase in the number of poles decreases winnings by $12,938.92. However, the Poles
variable is not statistically significant at the 0.05 level.
Holding all other factors constant, a one-unit increase in the number of wins increases
winnings by $13,544.81. Like Poles, the Wins variable is not statistically significant at the
0.05 level.
A one-unit increase in the number of top-five finishes increases winnings by $71,629.39,
ceteris paribus. However, the Top 5 variable is not statistically significant at the 0.05 level.
On the other hand, a one-unit increase in the number of top-ten finishes increases winnings by
$117,070.58, and the Top 10 variable is statistically significant at the 0.05 level.
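A hedged sketch of how this four-predictor regression and its individual t-tests could be reproduced with statsmodels is shown below, under the same illustrative file and column names as before.

```python
# Sketch of the four-predictor OLS regression; file and column names are assumptions.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("nascar.csv")

# Design matrix with an intercept term plus the four candidate predictors
X = sm.add_constant(df[["Poles", "Wins", "Top 5", "Top 10"]])
y = df["Winnings ($)"]

model = sm.OLS(y, X).fit()

# Coefficients, t statistics, and p-values for the individual significance
# tests at the 0.05 level
print(model.summary())
```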
3. Create two new independent variables: Top 2-5 and Top 6-10. Top 2-5 represents
the number of times the driver finished between second and fifth place, and Top 6-10
represents the number of times the driver finished between sixth and tenth place.
Develop an estimated regression equation that can be used to predict Winnings ($)
using Poles, Wins, Top 2-5, and Top 6-10. Test for individual significance and
discuss your findings and conclusions.
Figure 3: Regression Model
Document Page
SUMMARY OUTPUT

Regression Statistics
Multiple R             0.91
R Square               0.82
Adjusted R Square      0.80
Standard Error         581,382.20
Observations           35

ANOVA
             df    SS             MS          F        Significance F
Regression    4    4.63473E+13    1.16E+13    34.28    0.00
Residual     30    1.01402E+13    3.38E+11
Total        34    5.64875E+13

             Coefficients    Standard Error    t Stat    P-value
Intercept    3,140,367.09    184,229.02         17.05    0.00
Poles          -12,938.92    107,205.08         -0.12    0.90
Wins           202,244.78     90,225.87          2.24    0.03
Top 2-5        188,699.97     34,586.32          5.46    0.00
Top 6-10       117,070.58     33,432.88          3.50    0.00
From the regression model, the estimated regression equation is:
Winnings ($) = 3,140,367.09 - 12,938.92 Poles + 202,244.78 Wins + 188,699.97 Top 2-5 + 117,070.58 Top 6-10
As in the previous regression, the base winnings and the effect of poles won remain the same.
However, holding all other factors constant, a one-unit increase in the number of wins now
increases winnings by $202,244.78, and the Wins variable becomes statistically significant at
the 0.05 level.
A one-unit increase in the number of Top 2-5 finishes increases winnings by $188,699.97,
ceteris paribus. This variable is also statistically significant at the 0.05 level.
Finally, a one-unit increase in the number of Top 6-10 finishes increases winnings by
$117,070.58, and the Top 6-10 variable is statistically significant at the 0.05 level.
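The construction of the two new variables and the re-specified regression can be sketched as follows, again with illustrative column names. The sketch assumes Top 5 and Top 10 are cumulative counts (a win also counts as a top-five and a top-ten finish), so the new variables follow by subtraction.

```python
# Sketch of the re-specified model with the constructed Top 2-5 and Top 6-10 variables.
# Assumes illustrative column names and cumulative Top 5 / Top 10 counts.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("nascar.csv")

# A top-five finish that is not a win falls in places 2-5; a top-ten finish
# that is not a top-five finish falls in places 6-10.
df["Top 2-5"] = df["Top 5"] - df["Wins"]
df["Top 6-10"] = df["Top 10"] - df["Top 5"]

X = sm.add_constant(df[["Poles", "Wins", "Top 2-5", "Top 6-10"]])
model = sm.OLS(df["Winnings ($)"], X).fit()

print(model.params)        # estimated coefficients
print(model.pvalues)       # individual significance tests
print(model.rsquared_adj)  # adjusted R-squared for model comparison
```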
4. Based upon the results of your analysis, what estimated regression equation would
you recommend using to predict Winnings ($)? Provide an interpretation of the
estimated regression coefficients for this equation.
Both regression models have an adjusted R-squared of 80%, so in each model about 80% of the
variability in Winnings ($) is explained by the included predictors. On the basis of adjusted
R-squared alone, the two equations cannot be distinguished (Bates et al., 2014). However, the
second regression equation is recommended because more of its variables are statistically
significant than in the first model.
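As a worked illustration of the recommended equation, the short sketch below turns the fitted coefficients into a point prediction for a hypothetical driver; the input values are made up purely for the example.

```python
# Point prediction from the recommended (second) regression equation.
def predict_winnings(poles, wins, top_2_5, top_6_10):
    """Estimated Winnings ($) given a driver's season results."""
    return (3_140_367.09
            - 12_938.92 * poles
            + 202_244.78 * wins
            + 188_699.97 * top_2_5
            + 117_070.58 * top_6_10)

# Hypothetical driver: 1 pole, 2 wins, 4 finishes in places 2-5, 8 in places 6-10
print(f"${predict_winnings(1, 2, 4, 8):,.2f}")  # $5,223,282.25
```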
References
Bates, D., Maechler, M., Bolker, B. and Walker, S., 2014. lme4: Linear mixed-effects models
using Eigen and S4. R package version, 1(7), pp.1-23.
Li, J. and Ji, L., 2005. Adjusting multiple testing in multilocus analyses using the eigenvalues of
a correlation matrix. Heredity, 95(3), p.221.