University of Guelph - STAT*2040 DE Data Analysis Assignment 2019

Added on 2023/04/20

AI Summary

This document presents a solved data analysis assignment for STAT*2040 DE, likely completed in Winter 2019 at the University of Guelph. The assignment is divided into three parts, each involving data analysis and conclusion writing based on published studies. Part I focuses on a one-sample problem, employing a t-test to analyze data, interpret confidence intervals, and conduct hypothesis testing. Part II addresses a two-sample problem, utilizing box plots and Q-Q plots to assess data distribution and applying the Welch test to compare means between European and North American fish populations. Part III involves another two-sample problem, likely comparing data from children and adults, using t-tests and confidence intervals to determine significant differences between the groups. The solution includes R output snippets and references to relevant research articles.

Running head: STAT*2040 DE DATA ANALYSIS ASSIGNMENT 1
STAT 2040 DE DATA ANALYSIS ASSIGNMENT
By (Name of Student)
(Institutional Affiliation)
(Date of Submission)

PART I
Figure 1: Box plot for the ratios of completion times between ENCC and
Grab&Drop id ratio

Figure 2: QQ plot (Normal quantile-quantile plot of the ratios)
From the box plot above, we can see that the distribution is skewed to the right. There is
also a break in the normal Q-Q plot, which suggests that the distribution in this case is bimodal.
One of the assumptions of the t-test is that the data are normally distributed. Although the
distribution is not perfectly normal, the t procedure can still be used because the distribution
does not have heavy tails.
t-procedure to analyze the data
The following is the R output for the one-sample t-test:
##R-Output

One Sample t-test
data: x
t = 13.679, df = 35, p-value = 1.313e-15
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.8405423 1.1335175
sample estimates:
mean of x
0.9870299
The 95% confidence interval for the mean is [0.8405423, 1.1335175]. Since the confidence
interval does not contain 0, we first infer that the population mean percentage is not equal to
zero and thus that the test is significant. Second, since the confidence interval contains only
positive numbers, we can say that the population mean percentage of the ENCC and Grab&Drop id
ratio is also positive. These inferences assume that the sample has little bias.
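The reported interval can be cross-checked against the other values in the R output. The Python sketch below is only an illustration. The standard error is back-calculated from the reported mean and t statistic, and the critical value 2.0301 (two-sided 95%, 35 degrees of freedom) is taken from a t-table. Both are assumptions of this sketch and do not appear in the R output.

```python
# Cross-check of the one-sample 95% confidence interval from reported summary values.
mean_x = 0.9870299   # sample mean reported by R
t_stat = 13.679      # t statistic reported by R (null value 0)
t_crit = 2.0301      # ASSUMED t-table value: two-sided 95%, df = 35

# Standard error implied by t = mean / SE (back-calculated, not reported by R).
std_err = mean_x / t_stat

# A t interval is mean +/- critical value * standard error.
half_width = t_crit * std_err
ci_lower = mean_x - half_width
ci_upper = mean_x + half_width

print(f"implied SE = {std_err:.5f}")
print(f"95% CI = [{ci_lower:.4f}, {ci_upper:.4f}]")
```

The printed interval should agree with the one reported by R up to rounding of the inputs.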
Hypothesis testing
We hypothesize that the population mean of the ratios is not equal to zero.
H0 : μ=0
H1 : μ ≠ 0

From the findings, the p-value is 1.313e-15. Since this p-value is less than 0.05, our level of
significance (corresponding to the 95% confidence level), we reject the null hypothesis that the
population mean percentage was zero. Thus we conclude that the population mean percentage was
not equal to zero.
PART II
Figure 3: Side-by-side box plots of the European and North American fish.
Figure 4: Normal Q-Q plot North American Fish

In this case, the data were not transformed and were used as given.
Figure 5: Normal Q-Q plot; European Fish

Based on the plots above, the two distributions have relatively short right tails. The American
group has some outliers, whereas the European group has none. The data points fall close to the
straight line, although most of them do not fall exactly on it. In this case we should use the
Welch test. The pooled-variance t-test assumes that the two populations have equal variances, or
at least that the variances of the two groups are not significantly different. Levene's test is
often used to check whether the variances differ, but another rule of thumb is to make sure that
no group has a variance more than twice that of the other. The variance of the control group is
697 and that of the air group is 7725. We therefore use the Welch test, because one group's
variance is clearly very large relative to the other's.
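The variance rule of thumb and the Welch statistic can be sketched in a few lines of Python. This is an illustrative sketch only. The function names `variance_ratio` and `welch_t` are made up for this example, and the two small samples are hypothetical, not the fish data. The only values taken from the text are the two quoted variances.

```python
import math
from statistics import mean, variance

def variance_ratio(var_a, var_b):
    """Ratio of the larger variance to the smaller one."""
    return max(var_a, var_b) / min(var_a, var_b)

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Variances quoted in the text.
ratio = variance_ratio(697, 7725)
print(f"variance ratio = {ratio:.2f}")  # far above the rule-of-thumb limit of 2

# HYPOTHETICAL samples, for illustration only (not the assignment data).
t, df = welch_t([10.0, 12.0, 14.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(f"t = {t:.4f}, df = {df:.4f}")
```

Unlike the pooled test, the Welch statistic keeps the two variances separate, which is why its degrees of freedom are usually not a whole number.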
The R-Output
###Welch Two Sample t-test

data: EUR and AMER
t = 4.4843, df = 66.96, p-value = 2.94e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7719.64 20104.46
sample estimates:
mean of x mean of y
44963.33 31051.28
The hypotheses of this test are
H0 : μEuropeanFish = μAmericanFish (the means are not significantly different)
H1 : μEuropeanFish ≠ μAmericanFish (the means are different)
Interpretations
Since the p-value = 2.94e-05 is less than 0.05, our level of significance, we reject the null
hypothesis that the true mean concentration level of total PCB is the same for both groups.
The confidence interval is entirely positive, which indicates that the European group has a
significantly higher mean than the American group.
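The pieces of the Welch output can be checked against each other with simple arithmetic. The Python sketch below uses only the values reported in the R output. The variable names, and the idea of back-calculating the standard error from the t statistic, are assumptions of this sketch.

```python
# Consistency check of the Welch output using only the reported values.
mean_eur, mean_amer = 44963.33, 31051.28   # sample means (mean of x, mean of y)
ci_lower, ci_upper = 7719.64, 20104.46     # 95% confidence interval
t_stat = 4.4843                            # reported t statistic

diff = mean_eur - mean_amer                # observed difference in sample means
ci_center = (ci_lower + ci_upper) / 2      # a t interval is centred on the estimate
half_width = (ci_upper - ci_lower) / 2

std_err = diff / t_stat                    # implied by t = diff / SE (back-calculated)
implied_t_crit = half_width / std_err      # critical value implied by the interval

print(f"difference in means = {diff:.2f}, CI centre = {ci_center:.2f}")
print(f"implied SE = {std_err:.2f}, implied critical value = {implied_t_crit:.3f}")
```

The interval centre should match the difference in sample means, and the implied critical value should be close to 2, as expected for a two-sided 95% interval with roughly 67 degrees of freedom.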

PART III
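$$(\mu_{\text{Children}} - \mu_{\text{Adults}}) \pm Z_{\alpha} \sqrt{\frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}}$$

$$Z_{\alpha} \sqrt{\frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}} = 1.96 \sqrt{\frac{(19-1)\,6.418^2 + (28-1)\,7.612^2}{19+28-2}} = 400.2794$$

$$(\mu_{\text{Children}} - \mu_{\text{Adults}}) = 461 - 471 = -10$$

$$-10 \pm 400.2794$$

$$[-410.2794,\ 390.2794]$$

$$s_p^2 = \sqrt{\frac{(19-1)\,6.418^2 + (28-1)\,7.612^2}{19+28-2}} = 204.2242$$

$$t = \frac{(\mu_{\text{Children}} - \mu_{\text{Adults}})}{s_p^2}$$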

$$t = \frac{-7.12}{20.2242} = 0.0391$$
The critical t value from the table is 1.880.
Since the critical value is greater than the calculated t of 0.0391, we fail to reject the null
hypothesis that there is no difference between the two groups. Thus we conclude that there is no
significant difference between the two groups.
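For reference, the pooled-variance computation can be sketched in Python. This is a minimal sketch under stated assumptions. It uses the usual textbook form of the pooled two-sample t statistic, which divides the difference in means by s_p * sqrt(1/m + 1/n), and that is an assumption about the intended calculation. The function names and all input values are hypothetical and are not the assignment data.

```python
import math

def pooled_variance(m, s1, n, s2):
    """Pooled variance from two sample sizes and sample standard deviations."""
    return ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)

def pooled_t(mean1, mean2, m, s1, n, s2):
    """Textbook pooled two-sample t statistic (assumes equal population variances)."""
    sp = math.sqrt(pooled_variance(m, s1, n, s2))   # pooled standard deviation
    std_err = sp * math.sqrt(1 / m + 1 / n)         # standard error of the difference
    return (mean1 - mean2) / std_err

# HYPOTHETICAL summary statistics, for illustration only.
sp2 = pooled_variance(10, 2.0, 15, 3.0)
t = pooled_t(50.0, 52.0, 10, 2.0, 15, 3.0)
print(f"pooled variance = {sp2:.4f}, t = {t:.4f}")
```

The resulting statistic would be compared with a t critical value on m + n - 2 degrees of freedom.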