MA5820: Statistical Analysis - Homework Assignment 3
Added on 2022/09/12
This assignment solution demonstrates statistical analysis techniques using R. The solution addresses three questions, including hypothesis testing using z-tests and paired t-tests, descriptive statistics, and ANOVA. Question 1 involves calculating a z-value and interpreting the p-value to test a hypothesis about the mean lifetime of TVs. Question 2 explores descriptive statistics, outlier detection, and a paired t-test to assess the significance of pre and post test scores, including normalization. Question 3 analyzes the PlantGrowth dataset using ANOVA to determine if there are statistically significant differences among three groups. The solution includes R code, output, and interpretations, along with discussions of assumptions and references.

Question 1
a)
> z <- c(4500)
> s <- c(4800)
> t <- c(400)
> z_value <- (z-s)/t
> summary(z_value)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.75 -0.75 -0.75 -0.75 -0.75 -0.75
From the results, the z-value is -0.75 and the p-value is 0.2266.
b) Since the p-value (0.2266) is greater than 0.05, there is insufficient evidence to conclude that the mean
lifetime of a random sample of 16 TVs is less than 4,500 hours.
c) The findings lead to the same conclusion.
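The transcript in part a) does not show how the p-value of 0.2266 was obtained. A minimal sketch, assuming it is the lower-tail standard normal probability of the z-value; the descriptive variable names and the roles given to the three numbers are assumptions, since the question text is not reproduced here:

```r
# Numbers taken from the transcript in part a); their roles are assumed.
value  <- 4500  # `z` in the transcript (assumed: the value being tested)
center <- 4800  # `s` in the transcript (assumed: the reference mean)
scale  <- 400   # `t` in the transcript (assumed: the standard error)

z_value <- (value - center) / scale

# Lower-tail probability P(Z <= z_value) under the standard normal
p_value <- pnorm(z_value)

round(z_value, 2)  # -0.75
round(p_value, 4)  # 0.2266
```

Using names other than `t` also avoids masking the base R function `t()`.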
Question 2
Descriptive statistics
> sapply(beta, mean, na.rm=TRUE)
subject     pre    post     dif
   5.50    7.15   29.68   22.53
The results show that the mean of pre is lower than the mean of post, with a mean difference of 22.53.
Specifically, post has a mean of 29.68 while pre has a mean of 7.15.
> summary(beta)
    subject           pre              post            dif
 Min.   : 1.00   Min.   : 4.200   Min.   :14.20   Min.   : 4.10
 1st Qu.: 3.25   1st Qu.: 5.025   1st Qu.:20.12   1st Qu.:12.25
 Median : 5.50   Median : 6.650   Median :23.00   Median :16.55
 Mean   : 5.50   Mean   : 7.150   Mean   :29.68   Mean   :22.53
 3rd Qu.: 7.75   3rd Qu.: 8.725   3rd Qu.:28.48   3rd Qu.:23.30
 Max.   :10.00   Max.   :12.000   Max.   :95.00   Max.   :83.00
The results above show the summary measures of the dataset.
Detailed summaries are given below:
> library(pastecs)
> stat.desc(beta)
                subject        pre       post         dif
nbr.val      10.0000000 10.0000000  10.000000  10.0000000
nbr.null      0.0000000  0.0000000   0.000000   0.0000000
nbr.na        0.0000000  0.0000000   0.000000   0.0000000
min           1.0000000  4.2000000  14.200000   4.1000000
max          10.0000000 12.0000000  95.000000  83.0000000
range         9.0000000  7.8000000  80.800000  78.9000000
sum          55.0000000 71.5000000 296.800000 225.3000000
median        5.5000000  6.6500000  23.000000  16.5500000
mean          5.5000000  7.1500000  29.680000  22.5300000
SE.mean       0.9574271  0.8242505   7.424745   7.0296365
CI.mean.0.95  2.1658506  1.8645842  16.795941  15.9021425
var           9.1666667  6.7938889 551.268444 494.1578889
std.dev       3.0276504  2.6065089  23.479107  22.2296624
coef.var      0.5504819  0.3645467   0.791075   0.9866694
>
Detecting outliers:

The post variable has one outlier, so the data are normalized before testing.
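The outlier check and the normalization step can be sketched as follows. This uses a hypothetical vector, not the assignment's data; `boxplot.stats()` and the `min_max_norm()` helper are the same ones listed in the Appendix:

```r
# Hypothetical scores for illustration only; not the assignment's data.
scores <- c(14, 20, 21, 22, 23, 24, 26, 28, 30, 95)

# boxplot.stats() flags points lying more than 1.5 * IQR beyond the hinges.
outliers <- boxplot.stats(scores)$out
outliers  # 95

# Min-max normalization rescales the values to the [0, 1] range.
min_max_norm <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- min_max_norm(scores)
range(scaled)  # 0 1
```

Note that min-max scaling changes the units of the variable but not the relative position of the extreme value.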
In addition, a paired t-test is conducted to test for a significant difference between pre and post, as shown in
the output below:
> t.test(pre,post,paired=TRUE)
Paired t-test
data: pre and post
t = 1.9176, df = 9, p-value = 0.08739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.03352959 0.40677153
sample estimates:
mean of the differences
0.186621
From the results, there is no statistically significant difference between the pre and post scores,
since the p-value = 0.087 > 0.05.
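As a brief illustration of what the paired test computes, the sketch below uses hypothetical scores (not the assignment's data) to show that a paired t-test is the same as a one-sample t-test on the pairwise differences:

```r
# Hypothetical paired scores for illustration only.
pre  <- c(5.1, 6.0, 4.8, 7.2, 6.5, 5.9)
post <- c(6.3, 6.8, 5.0, 8.1, 7.4, 6.1)

paired   <- t.test(pre, post, paired = TRUE)
one_samp <- t.test(pre - post, mu = 0)

# Both calls give the same t statistic and the same p-value.
all.equal(unname(paired$statistic), unname(one_samp$statistic))
all.equal(paired$p.value, one_samp$p.value)
```

Because the test depends only on the differences, rescaling pre and post separately changes those differences and therefore the test result.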
The assumptions of the paired t-test include, but are not limited to, the following: the dependent
variable is continuous and the observations are independent of one another (Little and
Rubin, 2019). There should be no outliers in the dependent variable, and the dependent variable
(here, the pre/post differences) should be approximately normally distributed.
To reduce confounding in this kind of study, randomization should be used.
This gives all participants an equal chance of being selected, thereby
reducing confounding factors in the study (Østergaard et al., 2015).
Question 3
> View(PlantGrowth)
> attach(PlantGrowth)
> with(PlantGrowth, table(group))
group
ctrl trt1 trt2
10 10 10
The three groups in the PlantGrowth dataset are equal in size (10 observations each), as shown above.
Summary statistics for the variable weight are shown below.
> library(pastecs)
> stat.desc(PlantGrowth)
                  weight group
nbr.val       30.0000000    NA
nbr.null       0.0000000    NA
nbr.na         0.0000000    NA
min            3.5900000    NA
max            6.3100000    NA
range          2.7200000    NA
sum          152.1900000    NA
median         5.1550000    NA
mean           5.0730000    NA
SE.mean        0.1280195    NA
CI.mean.0.95   0.2618293    NA
var            0.4916700    NA
std.dev        0.7011918    NA
coef.var       0.1382204    NA
>
There are no outliers in weight. The ANOVA model is fitted below:
> aov.cho= aov(weight~ group)
> ls(aov.cho)
[1] "assign" "call" "coefficients" "contrasts" "df.residual"
[6] "effects" "fitted.values" "model" "qr" "rank"
[11] "residuals" "terms" "xlevels"
> summary(aov.cho)
            Df Sum Sq Mean Sq F value Pr(>F)
group        2  3.766  1.8832   4.846 0.0159 *
Residuals   27 10.492  0.3886
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
A one-way ANOVA was performed to compare the mean weights of the three groups. The findings show
a statistically significant difference among the group means: F(2, 27) = 4.846,
p-value = 0.0159 < 0.05.
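The F statistic and p-value reported above can also be extracted programmatically rather than read off the printed table. A minimal sketch using the built-in PlantGrowth dataset; the object names `fit` and `tab` are illustrative:

```r
# One-way ANOVA of weight by group, passing the data frame explicitly
# instead of relying on attach().
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list whose first element is the ANOVA table.
tab <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

round(f_value, 3)  # 4.846
round(p_value, 4)  # 0.0159
```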
The assumptions of ANOVA include that the variables are normally distributed within each group and that
the groups have equal variance.
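These assumptions were not tested in the solution. One possible way to check them, offered as a sketch rather than the method used in this assignment, is a Shapiro-Wilk test on the model residuals and Bartlett's test for equal variances, both from base R's stats package:

```r
# Refit the model so the example is self-contained.
fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of residuals: a small p-value suggests departure from normality.
normality <- shapiro.test(residuals(fit))

# Equality of variances across groups: a small p-value suggests unequal variances.
homogeneity <- bartlett.test(weight ~ group, data = PlantGrowth)

normality$p.value
homogeneity$p.value
```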
References
Little, R.J. and Rubin, D.B., 2019. Statistical analysis with missing data (Vol. 793). John Wiley & Sons.
Østergaard, S.D., Mukherjee, S., Sharp, S.J., Proitsi, P., Lotta, L.A., Day, F., Perry, J.R., Boehme, K.L.,
Walter, S., Kauwe, J.S. and Gibbons, L.E., 2015. Associations between potentially modifiable risk factors
and Alzheimer disease: a Mendelian randomization study. PLoS medicine, 12(6), p.e1001841.
Appendix
beta <- `3828094_1869469204_beta`
sapply(beta, mean, na.rm=TRUE)
summary(beta)
library(pastecs)
stat.desc(beta)
head(beta)
outlier_values <- boxplot.stats(beta$pre)$out # outlier values.
boxplot(beta$pre, main="pre", boxwex=0.1)
mtext(paste("Outliers: ", paste(outlier_values, collapse=", ")), cex=0.6)
outlier_values <- boxplot.stats(beta$post)$out # outlier values.
boxplot(beta$post, main="post", boxwex=0.1)
mtext(paste("Outliers: ", paste(outlier_values, collapse=", ")), cex=0.6)
# Normalize the records
head(beta)
# Define the min-max normalization function
min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
# Normalize the variables (pre, post, dif) to the [0, 1] range
beta_norm <- as.data.frame(lapply(beta[2:4], min_max_norm))
head(beta_norm)
# Paired t-test on the normalized scores
with(beta_norm, t.test(pre, post, paired = TRUE))
# Question 3: PlantGrowth
View(PlantGrowth)
with(PlantGrowth, table(group))
# The grouping column is named `group` (there is no `growth` column)
PlantGrowth$group <- as.factor(PlantGrowth$group)
library(pastecs)
stat.desc(PlantGrowth)
outlier_values <- boxplot.stats(PlantGrowth$weight)$out # outlier values.
boxplot(PlantGrowth$weight, main="weight", boxwex=0.1)
mtext(paste("Outliers: ", paste(outlier_values, collapse=", ")), cex=0.6)
dim(PlantGrowth)
str(PlantGrowth)
head(PlantGrowth)
# One-way ANOVA; passing data= avoids repeated attach() calls
aov.cho <- aov(weight ~ group, data = PlantGrowth)
ls(aov.cho)
summary(aov.cho)
