ANOVA and Regression Analysis

Added on  2020/02/05

Report
AI Summary
This assignment focuses on analyzing data using ANOVA and regression techniques. The provided output includes an ANOVA table summarizing the variance between groups and within groups, along with F-statistic and significance level. A regression analysis is also conducted, showing coefficients for the independent variable X8, standardized beta values, t-statistics, and p-values. The results indicate a statistically significant relationship between X8 and the dependent variable, as evidenced by the low p-value. Interpretation of these results and their implications in the context of the problem are crucial aspects of completing this assignment.

Q.3 Use the Excel file “Time_Series.xlsx”. Develop a time series model based
on the data. Provide the process of your model development.
Time series Plot
Gretl software (WEKA) output file
MODEL-1 ARIMA
Source: computed data, Gretl software (WEKA) output file

Normality test
Doornik-Hansen test = 53.7725, with p-value 2.10594e-012
Shapiro-Wilk W = 0.954728, with p-value 1.49004e-011
Lilliefors test = 0.0579438, with p-value ~= 0
Jarque-Bera test = 31.2605, with p-value 1.62884e-007
Process: pointwise interval estimation
Frequency distribution for hour, obs 1-521
number of bins = 23, mean = 261, sd = 150.544
interval midpt frequency rel. cum.
< 23.636 11.818 23 4.41% 4.41% *
23.636 - 47.273 35.455 24 4.61% 9.02% *
47.273 - 70.909 59.091 23 4.41% 13.44% *
70.909 - 94.545 82.727 24 4.61% 18.04% *
94.545 - 118.18 106.36 24 4.61% 22.65% *
118.18 - 141.82 130.00 23 4.41% 27.06% *
141.82 - 165.45 153.64 24 4.61% 31.67% *
165.45 - 189.09 177.27 24 4.61% 36.28% *
189.09 - 212.73 200.91 23 4.41% 40.69% *
212.73 - 236.36 224.55 24 4.61% 45.30% *
236.36 - 260.00 248.18 23 4.41% 49.71% *
260.00 - 283.64 271.82 24 4.61% 54.32% *
283.64 - 307.27 295.45 24 4.61% 58.93% *
307.27 - 330.91 319.09 23 4.41% 63.34% *
330.91 - 354.55 342.73 24 4.61% 67.95% *
354.55 - 378.18 366.36 24 4.61% 72.55% *
378.18 - 401.82 390.00 23 4.41% 76.97% *
401.82 - 425.45 413.64 24 4.61% 81.57% *
425.45 - 449.09 437.27 24 4.61% 86.18% *
449.09 - 472.73 460.91 23 4.41% 90.60% *
472.73 - 496.36 484.55 24 4.61% 95.20% *
496.36 - 520.00 508.18 23 4.41% 99.62% *
>= 520.00 531.82 2 0.38% 100.00%
Test for null hypothesis of normal distribution:
Chi-square(2) = 53.773 with p-value 0.00000
Source : WEKA gretl output file – Test statistics of normality
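The reported mean (261) and standard deviation (150.544) over 521 observations are what one would get if `hour` were simply the index 1, 2, ..., 521. That is an assumption, not something the output states. Under it, the Jarque-Bera statistic can be recomputed by hand as a rough check on the value reported above:

```python
import math

def jarque_bera(xs):
    """Jarque-Bera statistic built from population (biased) moment estimates."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)

# Assumption (not confirmed by the source): hour is the plain index 1..521.
hour = list(range(1, 522))
n = len(hour)
mean = sum(hour) / n
sample_sd = math.sqrt(sum((x - mean) ** 2 for x in hour) / (n - 1))
jb = jarque_bera(hour)

print(f"mean={mean:.3f} sd={sample_sd:.3f} JB={jb:.4f}")
```

A flat, index-like series has almost no skewness but a kurtosis far below 3, which is why every normality test in the output rejects the null.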
Q.1
A) Develop a linear regression model with variable cnt as the dependent
variable and the other variables as independent variables (you may
select some of them). Provide the process of your model development in
detail.
Linear Regression model
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .628a   .395       .393                141.624
a. Predictors: (Constant), windspeed, temp, holiday, weathersit, yr, hr, workingday, mnth, hum, season, dteday
ANOVAa
Model                Sum of Squares   df     Mean Square   F         Sig.
1       Regression   52213813.305     11     4746710.300   236.657   .000b
        Residual     79988564.339     3988   20057.313
        Total        132202377.644    3999
a. Dependent Variable: cnt
b. Predictors: (Constant), windspeed, temp, holiday, weathersit, yr, hr, workingday, mnth, hum, season, dteday
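The summary figures of the linear model follow arithmetically from the ANOVA sums of squares. As a quick consistency check (a sketch that uses only the numbers printed in the table; the variable names are illustrative):

```python
import math

# Figures copied from the ANOVA table.
ss_regression, df_regression = 52213813.305, 11
ss_residual, df_residual = 79988564.339, 3988
ss_total = ss_regression + ss_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual          # F = MSR / MSE
r_squared = ss_regression / ss_total          # share of variance explained
n = df_regression + df_residual + 1           # number of observations
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)            # std. error of the estimate

print(f"F={f_stat:.3f} R2={r_squared:.3f} adjR2={adj_r_squared:.3f} SE={std_error:.3f}")
```

The recomputed values agree with the Model Summary and ANOVA tables to the printed precision.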
Coefficientsa
Model 1        Unstandardized B   Std. Error   Standardized Beta   t          Sig.
(Constant)     -61260.745         40369.191                        -1.518     .129
dteday         4.534E-006         .000         .409                1.517      .129
season         15.671             4.688        .091                3.343      .001
yr             -52.244            94.382       -.143               -.554      .580
mnth           -8.955             7.838        -.159               -1.143     .253
hr             7.817              .350         .294                22.352     .000
holiday        -24.598            14.719       -.021               -1.671     .095
workingday     3.880              4.974        .010                .780       .435
weathersit     -8.112             3.995        -.028               -2.031     .042
temp           255.929            13.361       .272                19.155     .000
hum            -192.200           14.314       -.205               -13.428    .000
windspeed      27.834             20.261       .018                1.374      .170
a. Dependent Variable: cnt
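Each t value in the Coefficients table is the unstandardized coefficient divided by its standard error. The sketch below recomputes them and flags predictors whose |t| exceeds 1.96, an approximate two-sided 5% cutoff that is reasonable here only because the residual degrees of freedom are large (3988). dteday is left out because its standard error is printed as .000, so the ratio cannot be recovered from the table.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
# dteday is omitted: its standard error is rounded to .000 in the output.
coefficients = {
    "season": (15.671, 4.688),
    "yr": (-52.244, 94.382),
    "mnth": (-8.955, 7.838),
    "hr": (7.817, 0.350),
    "holiday": (-24.598, 14.719),
    "workingday": (3.880, 4.974),
    "weathersit": (-8.112, 3.995),
    "temp": (255.929, 13.361),
    "hum": (-192.200, 14.314),
    "windspeed": (27.834, 20.261),
}

CRITICAL_T = 1.96  # approximation to the 5% two-sided critical value

t_values = {name: b / se for name, (b, se) in coefficients.items()}
significant = sorted(name for name, t in t_values.items() if abs(t) > CRITICAL_T)

for name, t in t_values.items():
    print(f"{name:<11} t = {t:8.3f}")
print("significant at about 5%:", significant)
```

Small differences from the printed t values (for example for hr) come from rounding of B and the standard error in the table.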
Excluded Variablesa
Model 1   Beta In   t      Sig.   Partial Correlation   Collinearity Statistics (Tolerance)
instant   2.950b    .708   .479   .011                  8.734E-006
a. Dependent Variable: cnt
b. Predictors in the Model: (Constant), windspeed, temp, holiday, weathersit, yr, hr, workingday, mnth, hum, season, dteday
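The equation of the regression line is
$$\text{cnt} = \beta_0 + \beta_1 \times \text{dteday} + \beta_2 \times (\ldots) + \beta_3 \times \text{season} + \beta_4 \times \text{hum} + \beta_5 \times \text{mnth} + \beta_6 \times \text{hr} + \beta_7 \times \text{holiday} + e,$$
As per the computed data from the SPSS output file, substituting the standardized coefficients (Beta) from the Coefficients table, we get
$$\text{cnt} = \beta_0 + \beta_1 \times 0.409 + \beta_2 \times 0.091 + \beta_3 \times (-0.143) + \beta_4 \times (-0.159) + \beta_5 \times (0.294) + \beta_6 \times (-0.021) + \beta_7 \times (0.010) + \beta_8 \times (0.0\ldots$$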
B) Develop a regression model with variable cnt as the dependent variable and use the other
variables as independent variables (you may select some of them). For this question, you must
use at least three modelling techniques (for example, decision tree, radial basis function, etc.)
and then report the model with the best performance. Provide the process of your model
development in detail.
Regression model-1
MVA (multivariate analysis)
Multivariate Testsa
Effect                            Value   F          Hypothesis df   Error df   Sig.
Intercept    Pillai's Trace       .107    240.521b   2.000           3995.000   .000
             Wilks' Lambda        .893    240.521b   2.000           3995.000   .000
             Hotelling's Trace    .120    240.521b   2.000           3995.000   .000
             Roy's Largest Root   .120    240.521b   2.000           3995.000   .000
season       Pillai's Trace       .009    18.162b    2.000           3995.000   .000
             Wilks' Lambda        .991    18.162b    2.000           3995.000   .000
             Hotelling's Trace    .009    18.162b    2.000           3995.000   .000
             Roy's Largest Root   .009    18.162b    2.000           3995.000   .000
dteday       Pillai's Trace       .108    240.970b   2.000           3995.000   .000
             Wilks' Lambda        .892    240.970b   2.000           3995.000   .000
             Hotelling's Trace    .121    240.970b   2.000           3995.000   .000
             Roy's Largest Root   .121    240.970b   2.000           3995.000   .000
weathersit   Pillai's Trace       .024    49.012b    2.000           3995.000   .000
             Wilks' Lambda        .976    49.012b    2.000           3995.000   .000
             Hotelling's Trace    .025    49.012b    2.000           3995.000   .000
             Roy's Largest Root   .025    49.012b    2.000           3995.000   .000
a. Design: Intercept + season + dteday + weathersit
b. Exact statistic
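All four multivariate statistics share one F value per effect because each effect here has a single non-zero eigenvalue (one degree of freedom per effect, two dependent variables). In that case the statistics are simple functions of that eigenvalue. The sketch below works backwards from the exact F values in the table; the function name is illustrative, and the identities assume exactly this single-eigenvalue situation.

```python
def statistics_from_f(f_value, hypothesis_df, error_df):
    """Recover the four multivariate statistics when there is one eigenvalue."""
    root = f_value * hypothesis_df / error_df   # the single eigenvalue
    return {
        "pillai": root / (1.0 + root),
        "wilks": 1.0 / (1.0 + root),
        "hotelling": root,
        "roy": root,
    }

# Exact F values copied from the Multivariate Tests table.
reported_f = {
    "Intercept": 240.521,
    "season": 18.162,
    "dteday": 240.970,
    "weathersit": 49.012,
}

results = {
    effect: statistics_from_f(f_value, 2.0, 3995.0)
    for effect, f_value in reported_f.items()
}

for effect, stats in results.items():
    print(effect, {name: round(value, 3) for name, value in stats.items()})
```

Note that Pillai's Trace and Wilks' Lambda sum to one for every effect, which is visible in the table as well (.107 + .893, .009 + .991, and so on).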
Tests of Between-Subjects Effects
Source            Dependent Variable   Type III Sum of Squares   df     Mean Square    F         Sig.
Corrected Model   cnt                  18527484.994a             3      6175828.331    217.098   .000
                  hr                   59.024b                   3      19.675         .422      .738
Intercept         cnt                  10772038.257              1      10772038.257   378.668   .000
                  hr                   31.483                    1      31.483         .675      .411
season            cnt                  802972.179                1      802972.179     28.227    .000
                  hr                   4.180                     1      4.180          .090      .765
dteday            cnt                  10855993.164              1      10855993.164   381.619   .000
                  hr                   22.662                    1      22.662         .486      .486
weathersit        cnt                  2426695.387               1      2426695.387    85.305    .000
                  hr                   24.638                    1      24.638         .528      .468
Error             cnt                  113674892.650             3996   28447.170
                  hr                   186482.615                3996   46.667
Total             cnt                  275414670.000             4000
                  hr                   714668.000                4000
Corrected Total   cnt                  132202377.644             3999
                  hr                   186541.639                3999
a. R Squared = .140 (Adjusted R Squared = .139)
b. R Squared = .000 (Adjusted R Squared = .000)
Model-2 Radial basis function
Factor Level Information
Factor       Level   N
workingday   0       1268
             1       2730
holiday      0       3897
             1       101
yr           0       2163
             1       1835
Dependent Variable: cnt
Variance Estimatesa
Component Estimate
Var(workingday) -57.623b
Var(workingday * holiday) .000c
Var(workingday * yr) 205.487
Var(workingday * holiday * yr) .000c
Var(Error) 14925.477
Dependent Variable: cnt
Method: Minimum Norm Quadratic Unbiased Estimation (Weight = 1 for Random Effects and Residual)
a. Weighted Analysis - Weighted by hum
b. For the ANOVA and MINQUE methods, negative variance component estimates may occur. Some possible reasons for
their occurrence are: (a) the specified model is not the correct model, or (b) the true value of the variance equals zero.
Model-3 Two-stage least squares method
Model Description
Type of Variable
Equation 1
cnt dependent
hr predictor
holiday predictor
windspeed instrumental
weathersit instrumental
workingday instrumental
weekday instrumental
MOD_1
Model Summary
Equation 1
Multiple R .112
R Square .013
Adjusted R Square .012
Std. Error of the Estimate 179.562
ANOVA
Sum of Squares df Mean Square F Sig.
Equation 1
Regression 1646862.024 2 823431.012 25.539
Residual 128872747.484 3997 32242.369
Total 130519609.508 3999
Coefficient Correlations
hr holiday
Equation 1 Correlations hr 1.000 -.085
holiday -.085 1.000
Coefficients
Equation 1     Unstandardized B   Std. Error   Beta    t        Sig.
(Constant)     -39.978            32.300               -1.238   .216
hr             20.071             2.809        .754    7.145    .000
holiday        -56.631            71.929       -.049   -.787    .431
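For readers unfamiliar with the technique, the mechanics of two-stage least squares can be sketched on a deliberately simplified case: one endogenous predictor and one instrument, with synthetic data. Everything in the block (the data, the true slope of 2.0, the helper name) is hypothetical and unrelated to the bike-sharing variables used above.

```python
import random

def ols(x, y):
    """Return (slope, intercept) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data (hypothetical): u is an unobserved confounder that biases OLS.
random.seed(0)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]           # instrument
u = [random.gauss(0, 1) for _ in range(n)]           # confounder
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress the predictor on the instrument and keep the fitted values.
gamma1, gamma0 = ols(z, x)
x_hat = [gamma0 + gamma1 * zi for zi in z]

# Stage 2: regress the outcome on the stage-1 fitted values.
beta_2sls, _ = ols(x_hat, y)
beta_ols, _ = ols(x, y)

# With one instrument, 2SLS equals the ratio cov(z, y) / cov(z, x).
slope_zy, _ = ols(z, y)
beta_ratio = slope_zy / gamma1

print(f"OLS slope={beta_ols:.3f}  2SLS slope={beta_2sls:.3f}")
```

In this synthetic setup the plain OLS slope is pulled away from 2.0 by the confounder, while the two-stage estimate stays close to it. The SPSS run above uses two predictors and four instruments, so it solves the matrix version of the same two steps.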
Test of homoscedasticity
Homoskedasticity-corrected, using observations 1-4000
Dependent variable: cnt
Coefficient Std. Error t-ratio p-value
const 33.4123 11.3718 2.9382 0.00332 ***
mnth 1.19504 0.627244 1.9052 0.05682 *
hr 7.69488 0.322797 23.8382 <0.00001 ***
holiday -8.99509 11.3982 -0.7892 0.43006
weekday -0.41897 0.931182 -0.4499 0.65278
weathersit -14.0063 3.23726 -4.3266 0.00002 ***
workingday 18.2381 4.14626 4.3987 0.00001 ***
temp 139.364 72.9057 1.9116 0.05600 *
atemp 149.644 81.6496 1.8328 0.06691 *
windspeed -8.78955 20.0793 -0.4377 0.66160
hum -119.911 12.8429 -9.3368 <0.00001 ***
Statistics based on the weighted data:
Sum squared resid 18415.12 S.E. of regression 2.148598
R-squared 0.327583 Adjusted R-squared 0.325897
F(10, 3989) 194.3328 P-value(F) 0.000000
Log-likelihood -8729.509 Akaike criterion 17481.02
Schwarz criterion 17550.25 Hannan-Quinn 17505.56
Statistics based on the original data:
Mean dependent var 189.2170 S.D. dependent var 181.8210
Sum squared resid 89737023 S.E. of regression 149.9871
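The information criteria in the weighted-data block can be reproduced from the log-likelihood, the number of estimated coefficients (11, counting the constant) and the sample size (4000). A short check using the standard definitions, with only the printed figures as input:

```python
import math

# Figures copied from the gretl output for the weighted data.
log_likelihood = -8729.509
k = 11            # estimated coefficients, including the constant
n = 4000          # observations
r_squared = 0.327583

aic = -2 * log_likelihood + 2 * k                          # Akaike
bic = -2 * log_likelihood + k * math.log(n)                # Schwarz
hqc = -2 * log_likelihood + 2 * k * math.log(math.log(n))  # Hannan-Quinn
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

print(f"AIC={aic:.2f} BIC={bic:.2f} HQC={hqc:.2f} adjR2={adj_r_squared:.6f}")
```

All three criteria penalise the same log-likelihood; they differ only in how strongly each extra coefficient is charged, which is why the Schwarz criterion is the largest of the three here.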
Homoscedasticity based on cnt (non-linear plot)
Source: WEKA gretl output file
Q.2
A) Develop a logistic regression model with variable Y as the dependent variable and
the other variables as independent variables (you may select some of them). Provide
the process of your model development in detail.
Logistic Regression model
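Assuming that the following model was adopted:
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times X_1 + \beta_2 \times X_2 + \beta_3 \times X_3 + \beta_4 \times X_4 + \beta_5 \times X_5 + \beta_6 \times X_6 + \beta_7 \times X_7 + e,$$
On uploading the data into SPSS, the following results were obtained.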
Data Information
                                N
Cases        Valid              5550
             Missing            0
             Weighted Valid     5550
Cells        Defined Cells      220
             Structural Zeros   0
             Sampling Zeros     148
Categories   Y                  2
             X8                 11
             X10                10
Convergence Informationa,b
Maximum Number of Iterations        20
Converge Tolerance                  .00100
Final Maximum Absolute Difference   7.39611E-009c
Final Maximum Relative Difference   7.53116E-010
Number of Iterations                18
a. Model: Multinomial Logit
b. Design: Constant + Y + Y * X10 + Y * X13 + Y * X14 + Y * X15
c. The iteration converged because the maximum absolute changes of parameter estimates is less than the specified convergence criterion.
Goodness-of-Fit Testsa,b
                     Value     df   Sig.
Likelihood Ratio     111.758   94   .102
Pearson Chi-Square   140.788   94   .001
a. Model: Multinomial Logit
b. Design: Constant + Y + Y * X10 + Y * X13 + Y * X14 + Y * X15
Analysis of Dispersiona,b
Entropy Concentration df
Model 200.601 163.233 15
Residual 2768.627 1781.380 5534
Total 2969.227 1944.613 5549
a. Model: Multinomial Logit
b. Design: Constant + Y + Y * X10 + Y * X13 + Y * X14 + Y * X15
Measure of Associationa,b
Entropy         .068
Concentration   .084
a. Model: Multinomial Logit
b. Design: Constant + Y + Y * X10 + Y * X13 + Y * X14 + Y * X15
B) Develop a classification model with variable Y as the dependent variable and the
other variables as independent variables (you may select some of them). For this
question, you must use at least three modelling techniques (for example, the k-
nearest neighbour algorithm, decision tree, etc) and then report the model with the
best performance. Provide the process of your model development in detail.
Modelling technique -1
K-means cluster analysis
Initial Cluster Centers
       Cluster
       1     2
X2     2     1
X3     5     1
X4     2     1
X9     -2    8
X6     -2    2
X8     -2    8
X11    -2    8
Iteration Historya
Iteration   Change in Cluster Centers
            1       2
1           4.818   7.087
2           .128    2.759
3           .195    1.257
4           .092    .505
5           .039    .226
6           .007    .048
7           .018    .114
8           .022    .127
9           .022    .119
10          .008    .039
a. Iterations stopped because the maximum number of iterations was performed. Iterations failed to converge. The maximum absolute coordinate change for any center is .027. The current iteration is 10. The minimum distance between initial centers is 18.276.
Final Cluster Centers
       Cluster
       1     2
X2     2     2
X3     2     2
X4     2     2
X9     -1    2
X6     0     1
X8     -1    2
X11    -1    1
Number of Cases in each Cluster
Cluster   1   4674.000
          2   876.000
Valid         5550.000
Missing       .000
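The iteration history above reflects the usual k-means loop: assign each case to its nearest center, recompute each center as the mean of its cases, and stop when the centers no longer move or an iteration cap is reached. A minimal sketch of that loop on a tiny hypothetical data set (the points, starting centers and function name are illustrative and are not the assignment data):

```python
import math

def k_means(points, centers, max_iter=10, tol=0.0):
    """Plain k-means; returns (centers, clusters, iterations, converged)."""
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(point, centers[j]))
            clusters[nearest].append(point)
        # Update step: each center moves to the mean of its members.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            return centers, clusters, iteration, True
    return centers, clusters, max_iter, False

# Hypothetical two-dimensional data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters, iterations, converged = k_means(points, [(0, 0), (9, 9)])
print(centers, iterations, converged)
```

In the SPSS run the cap of 10 iterations was hit while the centers were still moving (maximum change .027), so raising the iteration limit would be a natural next step before interpreting the 4674 / 876 split.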
Modelling technique -2
General log-linear model
Data Information
N
Cases
Valid 5550
Missing 0
Weighted Valid 4665
Cells
Defined Cells 8
Structural Zeros 0
Sampling Zeros 0
Categories X2 2
X4 4
Convergence Informationa,b
Maximum Number of Iterations        20
Converge Tolerance                  .00100
Final Maximum Absolute Difference   .00110
Final Maximum Relative Difference   .00068c
Number of Iterations                6
a. Model: Multinomial
b. Design: Constant + X1 + X7 + X9 + X11 + X12
c. The iteration converged because the maximum relative change of parameter estimates is less than the specified convergence criterion.
Goodness-of-Fit Testsa,b
Value df Sig.
Likelihood Ratio 1139.450 2 .000
Pearson Chi-Square 1185.370 2 .000
a. Model: Multinomial
b. Design: Constant + X1 + X7 + X9 + X11 + X12
Cell Counts and Residualsa,b
X2   X4   Observed Count   Observed %   Expected Count   Expected %   Residual   Standardized Residual   Adjusted Residual   Deviance
1    0    2                0.0%         158.567          3.4%         -156.567   -12.650                 -27.957             -4.182
1    1    788              16.9%        991.190          21.2%        -203.190   -7.273                  -27.766             -19.014
1    2    1056             22.6%        1280.279         27.4%        -224.279   -7.359                  -16.002             -20.168
1    3    20               0.4%         17.806           0.4%         2.194      .521                    2.214               2.156
2    0    4                0.1%         30.911           0.7%         -26.911    -4.856                  -19.709             -4.045
2    1    1288             27.6%        1403.085         30.1%        -115.085   -3.674                  -6.733              -14.848
2    2    1472             31.6%        691.337          14.8%        780.663    32.170                  34.363              47.169
2    3    35               0.8%         91.826           2.0%         -56.826    -5.989                  -8.181              -8.217
a. Model: Multinomial
b. Design: Constant + X1 + X7 + X9 + X11 + X12
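The standardized residuals in this table appear to follow the multinomial form (O - E) / sqrt(E (1 - E / N)), with N = 4665 the weighted valid count. That formula is inferred from the printed numbers rather than stated in the output, so the sketch below should be read as a consistency check under that assumption:

```python
import math

# (X2, X4, observed, expected, reported standardized residual) from the table.
cells = [
    (1, 0, 2, 158.567, -12.650),
    (1, 1, 788, 991.190, -7.273),
    (1, 2, 1056, 1280.279, -7.359),
    (1, 3, 20, 17.806, 0.521),
    (2, 0, 4, 30.911, -4.856),
    (2, 1, 1288, 1403.085, -3.674),
    (2, 2, 1472, 691.337, 32.170),
    (2, 3, 35, 91.826, -5.989),
]

total = sum(observed for _, _, observed, _, _ in cells)   # weighted valid N

def standardized_residual(observed, expected, n):
    """Multinomial standardized residual (assumed form, see lead-in)."""
    return (observed - expected) / math.sqrt(expected * (1 - expected / n))

recomputed = [standardized_residual(o, e, total) for _, _, o, e, _ in cells]
largest_gap = max(abs(r - cell[4]) for r, cell in zip(recomputed, cells))
print(total, [round(r, 3) for r in recomputed], round(largest_gap, 4))
```

The cell X2 = 2, X4 = 2 stands out: its observed count is more than double the expected count, which accounts for most of the lack of fit reported in the goodness-of-fit tests.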
Coefficientsb,c
X2 X4 Ya
1
0 0
1 0
2 0
3 0
2
0 0
1 0
2 0
3 0
a. Sum of the coefficients is
not zero. The generalized log-
odds ratio is not computed.
b. Model: Multinomial
c. Design: Constant + X1 + X7
+ X9 + X11 + X12
Modelling technique -3
Curve estimation-Univariate model
Variable Processing Summary
                                             Variables
                                             Dependent   Independent
                                             Y           X8
Number of Positive Values                    1257        767
Number of Zeros                              4293        2938
Number of Negative Values                    0           1845
Number of Missing Values    User-Missing     0           0
                            System-Missing   0           0
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.212   .045       .045                .409
The independent variable is X8.
ANOVA
Sum of Squares df Mean Square F Sig.
Regression 43.629 1 43.629 260.644 .000
Residual 928.677 5548 .167
Total 972.306 5549
The independent variable is X8.
Coefficients
             Unstandardized B   Std. Error   Standardized Beta   t        Sig.
X8           .075               .005         .212                16.144   .000
(Constant)   .239               .006                             43.083   .000
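Because Y is a 0/1 variable, the fitted line Y = 0.239 + 0.075 × X8 can be read as a linear probability model. The sketch below evaluates it over X8 = -2, ..., 8 and finds where the fitted value crosses 0.5. The X8 range is an assumption taken from the cluster-center tables (values -2 and 8) together with the 11 categories reported for X8, and the 0.5 cutoff is a conventional choice rather than something the output specifies.

```python
# Coefficients copied from the curve-estimation table.
intercept, slope = 0.239, 0.075

def predicted_y(x8):
    """Fitted value of Y, read as an estimated probability that Y = 1."""
    return intercept + slope * x8

# Assumed X8 range (-2..8); the 0.5 cutoff is a conventional choice.
x8_values = list(range(-2, 9))
fitted = {x8: predicted_y(x8) for x8 in x8_values}
threshold_x8 = (0.5 - intercept) / slope
classified_positive = [x8 for x8, p in fitted.items() if p >= 0.5]

# R Square recomputed from the ANOVA sums of squares.
r_squared = 43.629 / 972.306

for x8, p in fitted.items():
    print(f"X8={x8:>2}  fitted Y={p:.3f}")
print(f"cutoff at X8 = {threshold_x8:.2f}, R2 = {r_squared:.3f}")
```

With an R Square of about .045, X8 alone separates the two classes only weakly, even though its coefficient is statistically significant.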
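On Question 2.a), assume you have built a model such as the following:
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times X_1 + \beta_2 \times X_2 + \beta_3 \times X_3 + \beta_4 \times X_4 + \beta_5 \times X_5 + \beta_6 \times X_6 + \beta_7 \times X_7 + e,$$
for example. Then you need to submit an MS Excel version in the following format:
$$\frac{\exp(\beta_0 + \beta_1 A2 + \beta_2 B2 + \beta_3 C2 + \beta_4 D2 + \beta_5 E2 + \beta_6 F2 + \beta_7 G2)}{1 + \exp(\beta_0 + \beta_1 A2 + \beta_2 B2 + \beta_3 C2 + \ldots)}$$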
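The submission format is the inverse of the logit: it turns the linear predictor back into a probability. A small sketch of that conversion follows, with made-up coefficients and a made-up input row; none of these numbers come from the fitted model.

```python
import math

def logistic_probability(coefficients, values):
    """Return exp(eta) / (1 + exp(eta)) for eta = b0 + sum(b_i * x_i)."""
    intercept, *slopes = coefficients
    eta = intercept + sum(b * x for b, x in zip(slopes, values))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical beta0..beta7 and one hypothetical row of inputs (cells A2..G2).
betas = [-1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
row = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

p = logistic_probability(betas, row)   # eta = -1.0 + 0.5 * 2.0 = 0
log_odds = math.log(p / (1.0 - p))     # recovers eta
print(p, log_odds)
```

Applying the log-odds transform to the result returns the linear predictor, which confirms that the two formulas in the text describe the same model.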
The following table is a mapping between the real variable names and the submission variable names.
Table 2: a mapping table for question 2.a)
Original variable name          X1   X2   X3   X4   X5   X6   X7   X8   X9   X10  X11  X12  X13  X14  X15  X16  X17  X18  X19  X20  X21  X22  X23
New name in the MS Excel file   A2   B2   C2   D2   E2   F2   G2   H2   I2   J2   K2   L2   M2   N2   O2   P2   Q2   R2   S2   T2   U2   V2   W2