Statistics Exercises: Regression, ANOVA, and Data Analysis
Added on 2022/09/01

Running head: CLASS EXERCISES
Statistics Exercises
Student’s Name
Institutional Affiliation
Exercise 3.29
a. Fit a cubic regression model
$\hat{Y} = 0.6871 + 1.8962X - 0.4988X^2 - 0.4673X^3$, where $X$ is calcium and $Y$ is ProteinProp.
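As an illustration of what "fitting a cubic regression" does, the sketch below solves the least-squares normal equations for the four coefficients in plain Python. It is a minimal sketch, not the software used in the exercise (in R the same model is usually fitted with `lm(y ~ x + I(x^2) + I(x^3))`). The x values are hypothetical, not the Exercise 3.29 data, and y is generated without noise from the reported coefficients, so the fit should recover them.

```python
# Minimal least-squares cubic fit (normal equations), standard library only.
# Assumption: xs are hypothetical calcium values; ys are generated from the
# coefficients reported in part a, with no noise.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copies)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_cubic(xs, ys):
    """Return [b0, b1, b2, b3] minimizing the sum of squared residuals."""
    design = [[1.0, x, x**2, x**3] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(4)] for i in range(4)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(4)]
    return solve(xtx, xty)


reported = [0.6871, 1.8962, -0.4988, -0.4673]  # b0..b3 from part a
xs = [i / 10 for i in range(-20, 21)]           # hypothetical x values
ys = [sum(b * x**k for k, b in enumerate(reported)) for x in xs]

coefs = fit_cubic(xs, ys)
print([round(c, 4) for c in coefs])  # recovers the reported coefficients
```

With real, noisy data the recovered coefficients would only approximate the true ones; the noise-free setup here just confirms the normal-equation arithmetic.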
b. Add the cubic curve to the scatterplot
c. Are the conditions for the model reasonably satisfied?
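Yes. The R-squared value for the model is 94.5%, which means that calcium (the X variable) explains 94.5% of the variation in ProteinProp (King'oriah, 2012). Only 5.5% of the variation is attributable to factors outside the model. A high R-squared alone does not confirm the regression conditions, however; the usual checks are a plot of residuals against fitted values (linearity and constant variance) and a normal quantile plot of the residuals (normality).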
d. Is the cubic term significantly different from zero?
The cubic term has t(47) = -6.583, p < .001, so it is statistically significant at the 10%, 5%, and 1% levels. The cubic coefficient is therefore clearly different from zero.
e. Coefficient of determination
Multiple R-squared = 0.9449, so calcium (the X variable) explains about 94.5% of the variation in ProteinProp, and only about 5.5% is attributable to factors outside the model. Calcium is therefore a strong, though not perfect, predictor of ProteinProp.
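The coefficient of determination is defined as R-squared = 1 - SSE/SST, the share of the total variation in the response that the fitted values account for. The short sketch below applies that formula to hypothetical toy numbers (not the exercise data) and then shows the unexplained share implied by the reported 0.9449.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in observed)            # total sum of squares
    return 1 - sse / sst


# Hypothetical toy values, only to show the arithmetic.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y, y_hat), 4))  # 0.98

# Unexplained share for the reported Multiple R-squared.
print(round(1 - 0.9449, 4))  # 0.0551
```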
Exercise 4.2
a. Scatterplot of Time against Elevation
In the scatterplot, the data points cluster toward the left-hand side of the graph, with no clear upward or downward trend. The computed correlation between time and elevation is r = -0.01628: the relationship is negative in sign but negligible in strength.
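For a simple regression, R-squared equals the square of the correlation r, which shows how little of the variation in time elevation explains by itself. The sketch below computes Pearson's r from its definition (checked on hypothetical, perfectly linear toy data) and then squares the reported r = -0.01628.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r([1, 2, 3], [2, 4, 6]), 6))  # hypothetical toy data: 1.0

r = -0.01628            # reported correlation between time and elevation
r_sq = r ** 2           # R-squared of the simple regression of time on elevation
print(f"{r_sq:.6f}")        # 0.000265
print(f"{100 * r_sq:.4f}%")  # 0.0265%
```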
This weak negative association should not be relied upon: one would expect more time to be taken as elevation increases, not less. On its own, therefore, elevation is not a good predictor of time in these data.
b. Multiple Regression
$\hat{Y} = 8.075 - 0.001448X_1 - 0.7123X_2$
$\widehat{\text{Time}} = 8.075 - 0.001448\,\text{Elevation} - 0.7123\,\text{Length}$
In the two-predictor model, Elevation is statistically significant at the 1% level of significance. Length, however, is the better predictor of time, since it remains significant even at the .001 level.
Together, Elevation and Length explain 77.03% of the variation in time (R-squared = 0.7703). Elevation alone explains only about 0.03% (R-squared of about 0.00026), while Length alone explains 73.7%. The two-predictor model is therefore better than either single-predictor model, although most of its explanatory power comes from Length.
c. Added variable plot
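On its own, Elevation contributes little to predicting time: in the simple regression its p-value is above .05, so it is not significant (Berenson, Timothy & David, 2005). The added-variable plot, however, shows Elevation's contribution after adjusting for Length, and in the two-predictor model Elevation is significant at the 1% level. Removing it would lower R-squared from 77.03% to 73.7%, so it is worth keeping alongside Length.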
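An added-variable plot for Elevation plots the residuals of Time regressed on Length (vertical axis) against the residuals of Elevation regressed on Length (horizontal axis); the least-squares slope of that plot equals Elevation's coefficient in the two-predictor model. The sketch below checks this on hypothetical, noise-free data (the values and the coefficients -0.0015 and 0.7 are invented for illustration, not taken from the exercise). In R, `car::avPlots()` draws these plots directly.

```python
def simple_fit(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def residuals(xs, ys):
    """Residuals from regressing ys on xs."""
    a, b = simple_fit(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical, noise-free data (not the exercise data set).
elevation = [4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400]
length = [10, 14, 12, 17, 11, 15, 18, 13]
true_b1, true_b2 = -0.0015, 0.7
time = [8 + true_b1 * e + true_b2 * l for e, l in zip(elevation, length)]

# Added-variable plot for Elevation, adjusting both axes for Length.
e_time = residuals(length, time)
e_elev = residuals(length, elevation)
_, av_slope = simple_fit(e_elev, e_time)
print(round(av_slope, 6))  # -0.0015, the Elevation coefficient
```

Because the data contain no noise, the slope matches the generating coefficient exactly (up to rounding); with real data it matches the fitted multiple-regression coefficient instead.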
Exercise 4.12
$\widehat{\text{BirthWeightOz}} = 11.87 - 7.31\,\text{Black} + 0.65\,\text{Hispanic} - 0.73\,\text{Other}$
According to the model, predicted birth weight depends on the mother's race, coded with three indicator variables (Black, Hispanic, and Other); mothers in none of these three categories form the reference group. Each coefficient is the difference in mean birth weight between that group and the reference group: babies of Black mothers are predicted to weigh 7.31 oz less, babies of Hispanic mothers 0.65 oz more, and babies of mothers in the Other category 0.73 oz less. The Black coefficient has the largest magnitude, followed by Other and then Hispanic. The constant, 11.87, is the predicted mean birth weight for a mother who belongs to none of the three indicator categories.
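The indicator-variable interpretation can be checked mechanically: dummy-code a race label, plug it into the fitted equation, and compare with the reference group. The sketch below uses the reported coefficients; the label "reference" for mothers in none of the three categories is a hypothetical placeholder.

```python
INTERCEPT = 11.87
COEFS = {"Black": -7.31, "Hispanic": 0.65, "Other": -0.73}  # reported coefficients


def indicators(race):
    """Dummy-code a race label; any label not in COEFS gets all zeros (reference group)."""
    return {name: 1 if race == name else 0 for name in COEFS}


def predicted_mean(race):
    """Predicted mean birth weight (oz) from the fitted indicator model."""
    d = indicators(race)
    return INTERCEPT + sum(COEFS[name] * d[name] for name in COEFS)


ref = predicted_mean("reference")  # hypothetical label for the reference group
for race in COEFS:
    # Difference from the reference group equals that group's coefficient.
    print(race, round(predicted_mean(race) - ref, 2))
```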
Exercise 5.6
Why ANOVA? The purpose of ANOVA in the setting of this chapter is:
c. To compare the means of several populations.
Exercise 5.12
a.
Explanatory variable: the font used
Response (dependent) variable: students' performance
b.
This is a randomized experiment: the treatments (fonts) are randomly assigned to students, so each of the 40 students has an equal chance of being assigned any of the 4 fonts.
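A completely randomized assignment like this can be produced by shuffling the students and splitting the shuffled list into equal groups, which gives every student the same 1-in-4 chance of each font. The sketch below assumes equal groups of 10; the source does not state the group sizes, and the student IDs and font labels are hypothetical.

```python
import random


def assign_fonts(students, fonts, seed=None):
    """Randomly split students into equal-sized groups, one per font."""
    if len(students) % len(fonts):
        raise ValueError("students must divide evenly among fonts")
    rng = random.Random(seed)
    shuffled = students[:]  # leave the caller's list untouched
    rng.shuffle(shuffled)
    size = len(students) // len(fonts)
    return {font: shuffled[i * size:(i + 1) * size] for i, font in enumerate(fonts)}


students = [f"student{i:02d}" for i in range(1, 41)]  # hypothetical IDs
fonts = ["Font A", "Font B", "Font C", "Font D"]      # hypothetical labels
groups = assign_fonts(students, fonts, seed=2022)
for font, members in groups.items():
    print(font, len(members))
```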
c.
ANOVA compares the means of several populations. Here a sample of students is selected from the population to take part in the experiment, and the four font groups must be compared to determine whether mean performance differs significantly from one font to another. One-way ANOVA is therefore an appropriate test.
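One-way ANOVA compares the variation between group means with the variation within groups, F = (SSB / (k - 1)) / (SSW / (n - k)). With four fonts and 40 students, the F-test would have 3 and 36 degrees of freedom. The sketch below computes F from raw samples; the three small groups are hypothetical toy data chosen so the result is easy to verify by hand.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))      # between groups
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))   # within groups
    df_b, df_w = k - 1, n - k
    return (ssb / df_b) / (ssw / df_w), df_b, df_w


toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical toy data
print(one_way_anova(toy))  # (3.0, 2, 6)
```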
Exercise 5.32
a. Are the conditions for ANOVA and the F-test satisfied?
Births, and the data recorded on them, are independent from one mother to the next. The births in North Carolina were randomly sampled, the data are drawn from a normal population, and the groups have a common variance (Ott & Longnecker, 2015), so the ANOVA procedure and F-test can be carried out. The histogram of the birth weight variable shows bars that form a roughly bell-shaped pattern, which suggests the data are approximately normally distributed. Therefore, ANOVA and the F-test can be carried out on the data.
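The common-variance condition is often checked with a rule of thumb found in introductory texts: the largest group standard deviation should be less than about twice the smallest. The sketch below computes that ratio; the two groups are hypothetical toy values, not the North Carolina birth data.

```python
import statistics


def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) / min(sds)


toy_groups = [[1, 2, 3], [1, 2.5, 4]]  # hypothetical toy data (SDs 1.0 and 1.5)
ratio = sd_ratio(toy_groups)
print(ratio, "OK" if ratio < 2 else "check variances")
```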
b. Analysis of Variance
The analysis of variance of birth weight across the mothers' race gives F(3, 1446) = 9.528, p < .001 (p = 3.118e-06). Mean birth weight therefore differs significantly across the mothers' race groups at the 5%, 1%, and even 0.1% levels of significance (Mendenhall & Sincich, 2016).
Analysis of Variance Table

Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)    
x1           3  14002  4667.5  9.5282 3.118e-06 ***
Residuals 1446 708332   489.9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
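The F value in the table is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below reproduces that arithmetic from the printed sums of squares and adds eta-squared, SS(race) / SS(total), as a measure of how much of the variation in birth weight race accounts for. The small gap between 4667.33 and the printed 4667.5 comes from R rounding the sum of squares for display.

```python
df_between, ss_between = 3, 14002   # race (x1) row of the table
df_within, ss_within = 1446, 708332  # residual row of the table

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
eta_squared = ss_between / (ss_between + ss_within)

print(round(ms_within, 1))    # 489.9
print(round(f_stat, 3))       # 9.528
print(round(eta_squared, 4))  # 0.0194
```

So although the race effect is highly significant, it accounts for only about 1.9% of the variation in birth weight; with 1,450 births, even a small difference in means is detectable.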
Exercise 6.25 (Parts a, b, d, e)
a.
Row   Mean Ht4    Acid     Mean Ht4    All cups:  n    Mean   Std Dev
a     1.16        1.5HCl   1.47                   15   1.74   1.11
b     1.57        3.0HCl   1.08
c     1.25        water    2.67
d     2.26
e     2.46
b.
> model <- aov(Ht4 ~ Acid + Row, data = data)
> summary(model)
            Df Sum Sq Mean Sq F value Pr(>F)  
Acid         2  6.852   3.426   4.513 0.0487 *
Row          4  4.183   1.046   1.378 0.3235  
Residuals    8  6.072   0.759                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> model1 <- aov(Ht4 ~ Acid + Row + Acid*Row, data = data)
> summary(model1)
            Df Sum Sq Mean Sq
Acid         2  6.852   3.426
Row          4  4.183   1.046
Acid:Row     8  6.072   0.759
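Because every acid treatment appears once in every row (3 x 5 = 15 cups), the design is balanced and the table can be cross-checked from the marginal means in part a: SS(Acid) = 5 x sum of squared deviations of the acid means from the grand mean, and SS(Row) = 3 x the same quantity for the row means. The sketch below does that arithmetic, forms the F ratios from the printed mean squares, and counts degrees of freedom for the interaction model. Small differences from 6.852 come from the means being rounded to two decimals.

```python
grand = 1.74
row_means = {"a": 1.16, "b": 1.57, "c": 1.25, "d": 2.26, "e": 2.46}
acid_means = {"1.5HCl": 1.47, "3.0HCl": 1.08, "water": 2.67}

# Balanced layout: each acid mean averages 5 cups, each row mean averages 3 cups.
ss_acid = len(row_means) * sum((m - grand) ** 2 for m in acid_means.values())
ss_row = len(acid_means) * sum((m - grand) ** 2 for m in row_means.values())
print(round(ss_acid, 2), round(ss_row, 2))  # 6.87 4.18

# F ratios from the printed mean squares of the additive model.
ms_resid = 0.759
print(round(3.426 / ms_resid, 2), round(1.046 / ms_resid, 2))  # 4.51 1.38

# Degrees of freedom: 15 cups, one per acid-row cell.
n = len(row_means) * len(acid_means)
df_acid, df_row = len(acid_means) - 1, len(row_means) - 1
df_resid = n - 1 - df_acid - df_row  # residual df of the additive model
df_interaction = df_acid * df_row    # df used by the Acid:Row term
print(df_resid, df_interaction, df_resid - df_interaction)  # 8 8 0
```

The last line shows why the second summary has no F column: the Acid:Row term uses all 8 residual degrees of freedom, leaving none to estimate error.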
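d.
The additive two-way ANOVA shows a statistically significant effect of the acid treatment (water, 1.5HCl, and 3.0HCl) on the growth of the alfalfa plants, F(2, 8) = 4.513, p = .0487, at alpha = .05. The row variable, however, has no statistically significant effect, F(4, 8) = 1.378, p = .3235. The interaction between the two variables cannot be tested with these data: with one cup per acid-row combination, the Acid:Row term uses all 8 remaining degrees of freedom, leaving no residual error for an F-test, which is why the second summary reports no F value or p-value (Johnson & Bhattacharyya, 2019).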
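e. The data do not show a significant effect of distance from the window (the row variable) on the growth of the alfalfa plants: Row has p = .3235, well above .05. The p-value of .0487, which is slightly less than .05 and hence significant at the 5% and 10% significance levels, belongs to the acid treatment, not to the row variable.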
References
Berenson, M. L., Timothy, C. K., & David, M. L. (2005). Basic business statistics: Concepts and applications (10th ed.). New York: Prentice Hall.
Fox, J. (2005). Getting started with the R commander: A basic-statistics graphical user interface to R. Journal of Statistical Software, 14(9), 1-42.
Johnson, R. A., & Bhattacharyya, G. K. (2019). Statistics: Principles and methods. John Wiley & Sons.
King'oriah, G. K. (2012). Fundamentals of applied statistics. Nairobi: The Jomo Kenyatta Foundation.
Mendenhall, W. M., & Sincich, T. L. (2016). Statistics for engineering and the sciences. CRC Press.
Ott, R. L., & Longnecker, M. (2015). An introduction to statistical methods and data analysis (7th ed.). Pacific Grove, CA: Brooks Cole.
Verzani, J. (2018). Using R for introductory statistics. CRC Press.