CO5124 - Data Analysis and Modelling: Werner Enterprises Prediction
Added on 2023/05/28
AI Summary
This report investigates the predictability of Werner Enterprises, Inc.'s share price using historical data from 2014-2016. Multiple linear regression, analysis of variance (ANOVA), the coefficient of determination (R-squared), residual analysis, and the Variance Inflation Factor (VIF) are employed to analyze the relationship between historical price variations and future share price changes. The VIF indicates no significant multicollinearity among the independent variables. Residual analysis suggests an approximately normal distribution, supporting the hypothesis tests. ANOVA reveals a relationship between the dependent and independent variables, although the low coefficient of determination (14.76%) indicates limited explanatory power. The predictions deviate from the actual values, but the actual values fall within the prediction limits. The report concludes that precise prediction is challenging because of the low R-squared value, and that improved accuracy could be achieved with a higher coefficient of determination.

Running head: DATA ANALYSIS AND DECISION MODELLING 1
Data Analysis and Decision Modelling
Predicting the Share Price of Werner Enterprises, Inc.
Student Name
University Name
Contents
1.0 Executive Summary
2.0 Description of Data
3.0 Variance Inflation Factor (VIF)
4.0 Residual Analysis
5.0 Analysis of Variance (ANOVA) Table
6.0 Coefficient of Determination (R-Squared)
7.0 Hypothesis Tests on Each Input
8.0 Coefficients
9.0 Prediction of Tomorrow’s Share Price
10.0 Conclusion
11.0 Appendix – VIF Values
12.0 References

1.0 Executive Summary
The objective of this report is to examine whether historical data can be used to
predict the future change in the share price of Werner Enterprises, Inc. The
enterprise is a transportation and logistics company that was established on 14
September 1982. It engages in interstate and intrastate commerce, transporting
truckload shipments of general commodities, and operates through two
major segments, namely Truckloads and Werner Logistics. To meet this objective,
the analysis uses company data collected between 2014 and 2016. The focus is on
the relationship between one day’s price variation and the next day’s variation in the
company’s share price. Multiple linear regression, including analysis of variance, the
coefficient of determination, residual analysis and the Variance Inflation Factor, has been
performed to reach a conclusion.
2.0 Description of Data
The data were collected for 382 days between 2014 and 2016. The data set has twelve
variables: the first variable is the date, the next ten are used as independent variables, and
the twelfth variable is used as the dependent variable. The variables indicate the variations
in prices of financial assets for the days on which the data were collected. Some of the
variables and their purposes are listed below:
Date: the year in which the data was collected
Aluminum_Vel1: the change in the price of aluminum, lagged by one day
Copper_Vel1: the change in the price of copper, lagged by one day
US_Gasoline_Vel1: the change in the price of gasoline, lagged by one day
West_Texas_Vel1: the change in the price of West Texas Intermediate oil, lagged by one day
SPDR_XL1: the U.S. industrial confidence for industrial-oriented firms
CA-Dollar_Vel1: the exchange rate between the U.S. and Canadian dollars, lagged by one day
SP 500: the Standard & Poor’s 500 index of stock prices
The data also include interaction variables, each of which is the product of two
variables. These variables are:
Year x WERN
30year x Copper_Vel1
Aluminum_Vel1 x Aluminum_Vel1
Aluminum_Vel1 x West_Texas_Vel1
Baltic_Vel1 x Copper_Vel1
SPDR_XL1 x West_Texas_Vel1
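As a minimal sketch of how such interaction variables can be formed, the snippet below multiplies two columns element by element. The sample values are hypothetical and are not taken from the report's data set; the helper name `interaction` is likewise an illustrative assumption.

```python
def interaction(column_a, column_b):
    """Return the element-wise product of two equally long columns."""
    if len(column_a) != len(column_b):
        raise ValueError("columns must have the same length")
    return [a * b for a, b in zip(column_a, column_b)]


# Hypothetical daily changes, for illustration only (not the report's data).
aluminum_vel1 = [0.5, -0.2, 0.1]
west_texas_vel1 = [0.4, 0.3, -0.6]

# Product of two different inputs, as in "Aluminum_Vel1 x West_Texas_Vel1".
aluminum_x_west_texas = interaction(aluminum_vel1, west_texas_vel1)

# Product of an input with itself, as in "Aluminum_Vel1 x Aluminum_Vel1".
aluminum_squared = interaction(aluminum_vel1, aluminum_vel1)
```

Multiplying a variable by itself gives a squared term, which lets the linear model pick up a curved response to that input.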
The variable in the last column of the data is the dependent variable that
is to be predicted. The initial percentage variations in price have been sorted and
ranked, and each rank divided by the total number of rows in the data, so that the
values vary from 0 to 1. A value of 0 is the maximum decrease in price, the median 0.5
indicates no change in price, and 1 indicates the maximum increase in price.
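The ranking described above might be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used to prepare the data: tied values are given their average rank, and each rank is divided by the number of rows, so the smallest value maps to 1/n rather than exactly 0. The function name `rank_scale` and the sample changes are hypothetical.

```python
def rank_scale(changes):
    """Map each value to (average rank) / n, so results lie in (0, 1]."""
    n = len(changes)
    order = sorted(range(n), key=lambda i: changes[i])
    scaled = [0.0] * n
    pos = 0
    while pos < n:
        # Find the run of tied values starting at this sorted position.
        end = pos
        while end + 1 < n and changes[order[end + 1]] == changes[order[pos]]:
            end += 1
        # Ranks are 1-based; tied values share the average of their ranks.
        avg_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            scaled[order[k]] = avg_rank / n
        pos = end + 1
    return scaled


# Hypothetical percentage changes; two days with no change in price.
changes = [0.03, 0.0, -0.02, 0.0, 0.01]
scaled = rank_scale(changes)
```

In this small example the two unchanged days share the middle ranks and are both mapped to 0.5.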
3.0 Variance Inflation Factor (VIF)
The variance inflation factor (VIF) quantifies how much the variance of an
estimated regression coefficient is inflated by correlation among the independent
variables. It is used to test for multicollinearity in the data set. Multicollinearity
is defined as the existence of a high correlation between two or more
independent variables. Its presence makes it difficult to fit a reliable regression
model to the data (Hinton, 2014).
The VIFs were determined for each of the independent variables using the PHStat
Excel plug-in and are given in the appendix. Since all of these values are less
than five, there is little or no multicollinearity among the independent variables, and
therefore none of them needs to be dropped when building the prediction
model.
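For readers without PHStat, the calculation can be sketched in plain Python. The sketch relies on the standard identity that the VIF of input j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing input j on the other inputs, and that these values are the diagonal entries of the inverse of the inputs' correlation matrix. The function names and the sample columns are illustrative assumptions, not part of the report.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each input: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(k)]


# Hypothetical inputs: the two columns are uncorrelated, so each VIF is 1.
print(vif([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]]))
```

A VIF of 1 means an input is uncorrelated with the others; the rule of thumb applied in this report treats values below five as acceptable.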
4.0 Residual Analysis
Normal probability plots are used in statistics to identify any observable
departure from normality in a set of values, including kurtosis, skewness and outliers.
Here the plot is used to check for outliers in the residuals. The plot is shown
below:
Figure 1: Normal Probability Plot
The plot is slightly non-linear at the 50th percentile because of the trading days
on which the share price was stagnant, all of which have a rank equivalent to 0.5.
Apart from this, the plot can be taken to be linear, indicating that there are no observable
outliers. It is therefore reasonable to conclude that the residuals are approximately
normally distributed and that the hypothesis tests performed can be relied upon.
Regression analysis assumes that the standardized residuals are normally
distributed. A histogram of the standardized residuals is used to
visualize their distribution.
Figure 2: Frequency Histogram for Standard Residuals
The histogram indicates a normal distribution with only a little skewness to the
right. The skewness is assumed to be negligible, and a conclusion of normal distribution is
reached. This means that the hypothesis tests and predictions carried out can be
regarded as reliable.
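The two checks in this section can be sketched numerically. The snippet below pairs sorted residuals with theoretical normal quantiles (the coordinates of a normal probability plot) and computes a simple skewness measure. The plotting position (i + 0.5) / n and the population form of skewness are choices made for this illustration, and the residual values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    return [(standard_normal.inv_cdf((i + 0.5) / n), r)
            for i, r in enumerate(ordered)]


def skewness(values):
    """Population skewness: mean of cubed standardized values."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical residuals, symmetric about zero for illustration.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = normal_plot_points(residuals)
skew = skewness(residuals)
```

If the points lie close to a straight line and the skewness is near zero, the normality assumption is plausible; a positive skewness corresponds to the slight right skew noted in the histogram.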
5.0 Analysis of Variance (ANOVA) Table
The analysis of variance (ANOVA) table is used in statistics to perform a
hypothesis test of whether or not there exists a relationship between the
dependent and the independent variables. The null hypothesis is that there is no linear
relationship between the variables, while the alternative hypothesis is that there is a linear
relationship between the variables (Foster, Barkus & Yavorsky, 2016). The null
hypothesis is rejected when the value of Significance F is less than the overall
significance level, and is not rejected otherwise. In this case the value of
Significance F is 3.75916 × 10⁻⁹.
Figure 3: ANOVA Table
This value is less than the overall significance level of 0.05, so there is
sufficient evidence in favor of the alternative hypothesis and the null hypothesis can
be rejected. It is therefore reasonable to conclude that there exists a relationship between
the dependent variable and at least one of the independent variables.
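The quantities in an ANOVA table can be sketched as below. The decomposition SSR = SST - SSE holds for a least-squares fit that includes an intercept, which is assumed here. Significance F is the upper-tail probability of the F statistic; it is not computed in this sketch because the Python standard library has no F distribution. The function name and the numbers are hypothetical.

```python
def anova_summary(actual, fitted, n_inputs):
    """Sums of squares, degrees of freedom and F statistic for a fit."""
    n = len(actual)
    mean_y = sum(actual) / n
    sst = sum((y - mean_y) ** 2 for y in actual)            # total
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # residual
    ssr = sst - sse                                          # regression
    df_regression = n_inputs
    df_residual = n - n_inputs - 1
    f_stat = (ssr / df_regression) / (sse / df_residual)
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "df_regression": df_regression, "df_residual": df_residual,
        "F": f_stat,
    }


# Hypothetical observed and fitted values with a single input.
table = anova_summary([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0], n_inputs=1)
```

A large F statistic means the regression explains much more variation per degree of freedom than is left in the residuals, which is what produces a small Significance F.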
6.0 Coefficient of Determination (R-Squared)
The coefficient of determination, or R-squared value, measures the proportion of
the variation in the dependent variable that is explained by the independent
variables. The regression output below shows the coefficient of determination.
Figure 4: Coefficient of determination
The coefficient of determination in this case is 14.76%, indicating that only
14.76% of the variation of the dependent variable (future change in price) around the
mean is explained by the independent variables. The value is low, and therefore the
regression model explains only a small part of the future change in share price
for Werner Enterprises.
If the future change in share price were completely unrelated to the inputs, the
coefficient of determination would be close to zero. In this case the value is clearly
above zero, which suggests that the stock market is not perfectly random and that
predictions can still be made.
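As a sketch of the arithmetic, R-squared is one minus the ratio of the residual to the total sum of squares, and the overall F statistic can be recovered from it. The last line plugs in the report's figures of 14.76% and 382 rows; the count of ten inputs is an assumption taken from the data description, and the fitted model may have used a different number of terms.

```python
def r_squared(actual, fitted):
    """Share of the variation around the mean explained by the fit."""
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return 1.0 - sse / sst


def f_from_r_squared(r2, n_rows, n_inputs):
    """Overall F statistic implied by R-squared for a model with intercept."""
    return (r2 / n_inputs) / ((1.0 - r2) / (n_rows - n_inputs - 1))


# Hypothetical observed and fitted values.
r2_example = r_squared([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0])

# Report figures: R-squared 0.1476 and 382 rows; n_inputs=10 is an assumption.
f_report = f_from_r_squared(0.1476, n_rows=382, n_inputs=10)
```

With several hundred rows, even a low R-squared can give an F statistic large enough to reject the null hypothesis, which is why a significant ANOVA result and weak explanatory power can occur together.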
7.0 Hypothesis Tests on Each Input
We look at the respective p-values of the independent variables and compare them
against the overall significance level of the model. If the p-value of an
independent variable is less than the overall significance level, we conclude that the
variable is statistically significant and can be used in the model; otherwise the variable is
said to be statistically insignificant and is dropped from the model.
Figure 5: The Regression Table with P-Values
From the table, every explanatory variable has a p-value less than the overall
significance level of the model; they are therefore all statistically significant and none
of them needs to be dropped from the regression model.
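The screening rule can be sketched as follows. The p-value is approximated here with the standard normal distribution in place of the t distribution, which is reasonable only when the residual degrees of freedom are large, as with several hundred rows. The coefficient estimates and standard errors are hypothetical and are not the values from the regression table.

```python
from statistics import NormalDist


def two_sided_p_value(coefficient, standard_error):
    """Approximate two-sided p-value using the normal distribution."""
    t_stat = coefficient / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def significant_inputs(estimates, alpha=0.05):
    """Names of the inputs whose p-value is below the significance level."""
    return [name for name, (coef, se) in estimates.items()
            if two_sided_p_value(coef, se) < alpha]


# Hypothetical (coefficient, standard error) pairs for two made-up inputs.
estimates = {"input_a": (0.57, 0.15), "input_b": (0.05, 0.10)}
kept = significant_inputs(estimates)
```

In this made-up example only `input_a` passes the 0.05 threshold, so `input_b` would be dropped under the rule described above.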
8.0 Coefficients
The coefficients of the independent variables above indicate how each of them affects
the dependent variable when all other independent variables are kept constant.
The largest positive coefficient is for the interaction variable Aluminum_Vel1 x
West_Texas_Vel1. The coefficient is 0.5711, meaning that the predicted value of the
dependent variable (the ranked future change in share price) increases by 0.5711 for
a unit increase in this interaction variable. On the
other hand, the largest negative coefficient is for the interaction variable SPDR_XL1 x
West_Texas_Vel1. The coefficient is -0.487, meaning that the predicted value of the
dependent variable decreases by 0.487 for a unit increase in this interaction variable.
None of the coefficients is equal to or very close to zero;
therefore, every independent variable is related to the dependent variable to some extent
and none is a candidate for deletion from the regression model.
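The meaning of a coefficient can be illustrated with a short sketch. The two coefficients below are the ones quoted in this section; the intercept is a hypothetical placeholder, and the other inputs of the fitted model are left out for brevity, so the predicted values themselves are not meaningful, only the difference between them.

```python
def predict(intercept, coefficients, inputs):
    """Linear prediction: intercept plus coefficient times input, summed."""
    return intercept + sum(coefficients[name] * inputs[name]
                           for name in coefficients)


coefficients = {
    "Aluminum_Vel1 x West_Texas_Vel1": 0.5711,  # from the report
    "SPDR_XL1 x West_Texas_Vel1": -0.487,       # from the report
}
intercept = 0.5  # hypothetical placeholder, not the fitted intercept

baseline = {name: 0.0 for name in coefficients}
bumped = dict(baseline)
bumped["Aluminum_Vel1 x West_Texas_Vel1"] += 1.0  # unit increase in one input

# Change in prediction when one input rises by one unit, others held fixed.
effect = (predict(intercept, coefficients, bumped)
          - predict(intercept, coefficients, baseline))
```

The difference equals the coefficient of the input that was changed, which is what "holding all other variables constant" means in practice.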
9.0 Prediction of Tomorrow’s Share Price
To predict the future change in share price, the ANOVA table and the confidence
and prediction interval estimates are used. Here, the historical data is used to derive a
prediction of the variation in the share price, which is then compared with the actual
change in share price, as shown below:
Figure 6: Prediction Table
The predicted values of the share price variable differ noticeably from the
actual values. The last two rows give the limits between which the
actual value is expected to fall. The actual values
fall within these limits on every sampled day, indicating
that the interval estimates of the model are consistent with the observed data.
On all the sampled days the difference between the actual and the predicted
value is within a maximum deviation of ±0.3. With the value of the coefficient of
determination being so low, the prediction interval is wide and does not
guarantee a consistent prediction of whether the share price will go up or down.
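The comparison described above can be sketched as a simple check. All values below are hypothetical and stand in for the rows of the prediction table; they are not the report's figures.

```python
def check_predictions(actual, predicted, lower, upper):
    """Flag which actual values lie inside their limits, and the worst miss."""
    inside = [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]
    max_deviation = max(abs(a - p) for a, p in zip(actual, predicted))
    return inside, max_deviation


# Hypothetical ranked values (0 to 1) for three sampled days.
actual = [0.30, 0.75, 0.52]
predicted = [0.50, 0.55, 0.48]
lower = [0.00, 0.05, 0.00]
upper = [1.00, 1.00, 0.98]

inside, max_deviation = check_predictions(actual, predicted, lower, upper)
```

When the limits span almost the whole 0 to 1 range, as in this made-up example, every actual value falls inside them, yet the interval says little about whether the price will rise or fall.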
10.0 Conclusion
Multiple linear regression has been applied to historical data to predict the future
change in the share price of Werner Enterprises, Inc. The VIF indicated that there was
no significant multicollinearity among the independent variables. Residual analysis
indicated that the residuals were approximately normally distributed and that the
hypothesis tests were therefore valid. The coefficient of determination indicated that only
14.76% of the variation of the dependent variable
around the mean was explained by the independent variables. The analysis of variance
indicated that there was a relationship between the dependent and the independent variables.
The prediction performed indicated that there was a margin of difference between the
actual change in share price and the predicted value. This difference could be
attributed to the value of the coefficient of determination being so low. If the
value of the coefficient of determination were higher, more accurate and reliable
prediction would be achievable.
11.0 Appendix – VIF Values





