STA2300 Data Analysis S1, 19 Assignment 3: Statistical Analysis

Added on  2023/03/31

Homework Assignment
AI Summary
This assignment analyzes data related to cardiac arrest patients. It begins by calculating a 98% confidence interval for the population mean heart rate of patients who survived cardiac arrest using a t-distribution, considering the sample size, standard deviation, and degrees of freedom, and includes a discussion of the assumptions for the validity of the confidence interval. The assignment then performs hypothesis testing to determine if the average heart rate is significantly greater than 90 beats per minute, calculating the t-statistic and p-value and interpreting the results, and also examines the proportion of patients surviving cardiac arrest, formulating hypotheses and conducting a z-test for proportions. The analysis continues with a comparison of tar yields from side stream and mainstream smokers using both t-tests and Mann-Whitney U tests, and concludes with a side-by-side boxplot representing the distribution of circulation time depending on patients’ survival status.
Answer 1
(a) Average heart rate for N = 88 patients who survived cardiac arrest after being admitted
to hospital for critical illness was calculated as $\bar{x} = 104.97$ (SD = 29.44), which varied
between minimum heart rate of 25 and maximum heart rate of 217.
The confidence interval for population mean with sample SD is
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Here significance level is $\alpha = 2\%$ and degrees of freedom N – 1 = 87.
Using calculator and t-table, $t_{\alpha/2}(87) = 2.37$ for $\alpha = 0.02$.
Hence,
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = \left[ 104.97 - 2.37 \times \frac{29.44}{\sqrt{88}},\ 104.97 + 2.37 \times \frac{29.44}{\sqrt{88}} \right] = [97.53,\ 112.41]$$
Therefore, with 98% confidence it can be said that population average heart rate will be
somewhere between 97.53 beats/min and 112.41 beats/min for all critically ill patients
who survived cardiac arrest.
Figure 1: Confidence interval for t-distribution with 87 degrees of freedom
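The interval in part (a) can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution: the function name `t_confidence_interval` is illustrative, and the critical value 2.37 is simply taken from the t-table lookup quoted in part (a) rather than computed.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Summary statistics reported in part (a); t_crit is the tabulated value
# for 87 degrees of freedom at alpha = 0.02 (assumed, not computed here).
lower, upper = t_confidence_interval(mean=104.97, sd=29.44, n=88, t_crit=2.37)
print(f"98% CI: [{lower:.2f}, {upper:.2f}]")
```

Passing a different critical value (for example the one for a 95% interval) reuses the same function without other changes.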
(b) The assumptions for validity of confidence interval or hypothesis test for the population
mean are randomization, independence condition (10% rule), and sample size condition.
Randomization: Data for all critically ill patients was sampled randomly with a good
sampling methodology.
Independence: The population of critically ill patients surviving cardiac arrest after
being admitted to hospital is large enough for the sample of N = 88 to be less than
10% of it.
Sample Size: Sample size of N = 88 was large enough (n > 30) to apply Central Limit
Theorem. The distribution of critically ill patients surviving cardiac arrest was plotted in
a histogram and a Q-Q plot to identify the possible outliers and shape of the distribution.
Histogram in Figure 2 indicates an almost normally shaped distribution with a few outliers.
This pattern was confirmed by the Q-Q plot in Figure 3, which shows at least three
outliers. The Shapiro-Wilk test, however, indicated a significant departure from
normality (W = 0.96, p < 0.05), so the validity of the interval rests on the large sample size.
Figure 2: Histogram of Heart rate for critically ill patients surviving cardiac arrest
Figure 3: Q-Q plot of Heart rate for critically ill patients surviving cardiac arrest
(c) Null hypothesis: H0: μ ≤ 90: Average heart rate of adult critically ill patients who
survived cardiac arrest is not more than 90 beats per min.
Alternate hypothesis: HA: μ > 90 (Right Tail): Average heart rate of adult critically ill
patients who survived cardiac arrest is significantly greater than 90 beats per min.
(d) The t-statistic at n – 1 = 87 degrees of freedom is calculated using the formula
$$t(87) = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where $\bar{x} = 104.97$, s = 29.44, n = 88, μ = 90.
Therefore,
$$t(87) = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{104.97 - 90}{29.44 / \sqrt{88}} = 4.77$$
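As a quick check of the arithmetic in part (d), the statistic can be recomputed from the reported summary values. The sketch below is illustrative only; `one_sample_t` is a hypothetical helper name, and the inputs are the rounded figures from the text, so the result agrees with the hand calculation to two decimals.

```python
import math

def one_sample_t(mean, mu0, sd, n):
    """One-sample t statistic: (mean - mu0) / (sd / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Values from part (d): sample mean, hypothesised mean, sample SD, sample size.
t_stat = one_sample_t(mean=104.97, mu0=90, sd=29.44, n=88)
print(f"t(87) = {t_stat:.2f}")
```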
(e) The p-value is calculated as the probability $P(T > t_{cal} = 4.77) \approx 0.000 < 0.01$ at n – 1 = 87
degrees of freedom (using t-table).
At 1% level of significance, the null hypothesis was rejected to signify that average heart
rate of adults was significantly greater than 90 beats per min for critically ill patients who
were brought to hospital and survived cardiac arrest.
Figure 4: Critical region for t-test with p-value
(f) Significance level: 1%
Test Statistic: The t-stat = t(87) = 4.769 with p < 0.01
SPSS output for the one-sample t-test at 1% level of significance was produced for a two-tailed
alternate hypothesis. For the one-tailed test the p-value is half of the two-tailed value,
0.000 / 2 = 0.000. Hence, the conclusion from the hand-evaluated test statistic remains
unaltered: the null hypothesis has to be rejected at 1% level.
Figure 5: T-test SPSS output for heartbeat of critically ill patients who survived cardiac arrest.
Answer 2
(a) Variable of interest was the nominal variable called “Survival” which has two categories,
namely survived and died.
(b) Null hypothesis: $H_0: p = 0.55$: Proportion of critically ill patients surviving cardiac arrest
was equal to 0.55 or 55%
Alternate hypothesis: $H_A: p \neq 0.55$ (two tailed): Proportion of critically ill patients surviving
cardiac arrest was significantly different from 0.55 or 55%
(c) The observations of survival were taken randomly from population of critically ill
patients.
Population of critically ill patients surviving cardiac arrest would be more than 10 times
the number of favourable cases here (N = 88).
Here, $np = 88 \times 0.55 = 48.4 > 10$ and $n(1 - p) = 88 \times 0.45 = 39.6 > 10$.
Hence, assumptions for the z-test for proportions are satisfied.
(d) The test statistic is calculated as
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$
Here $\hat{p}$ = Proportion of critically ill patients surviving cardiac arrest in the sample =
88/146 = 0.60.
$p_0$ = Proportion of critically ill patients surviving cardiac arrest according to the
researcher = 0.55
Therefore, substituting these values into the test statistic gives $z_{cal} = 0.96$.
(e) The p-value is calculated as the probability
$P(Z > z_{cal} = 0.96) = 1 - P(Z \le 0.96) = 1 - 0.8315 = 0.1685$.
The p-value is greater than 0.01, so the null hypothesis is not rejected (the same decision
holds for the two-tailed p-value, 2 × 0.1685 = 0.337). This indicates that the proportion of
critically ill patients surviving cardiac arrest was not significantly different from 55%,
and the researcher’s postulation was not supported.
Figure 6: Region for p-value in the normal curve
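The table lookup in part (e) can be reproduced with the standard normal distribution from Python's standard library. This is a small sketch that takes the reported statistic $z_{cal} = 0.96$ as given; it does not recompute the statistic itself.

```python
from statistics import NormalDist

z_cal = 0.96  # test statistic as reported in part (e)

# Upper-tail probability under the standard normal curve.
upper_tail = 1 - NormalDist().cdf(z_cal)
print(f"P(Z > {z_cal}) = {upper_tail:.4f}")
```

Doubling `upper_tail` gives the corresponding two-tailed p-value if the alternative is treated as two sided.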
(f) The true proportion for critically ill patients surviving cardiac arrest was estimated as $\hat{p} = 0.6$.
The margin of error is
$$Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $\alpha = 0.01$ and n = 88
Now, at $\alpha = 0.01$, Z = 2.575 is the critical value.
Now, according to the researcher, margin of error for critically ill patients surviving
cardiac arrest after being admitted to hospital will be within 0.02. In conservative method
$\hat{p} = 0.5$.
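Therefore,
$$Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < 0.02 \;\Rightarrow\; \frac{0.5 \times 0.5}{n} < \left( \frac{0.02}{2.575} \right)^2 \;\Rightarrow\; \frac{n}{0.25} > \left( \frac{2.575}{0.02} \right)^2$$
$$\Rightarrow\; n > 0.25 \times 16576.56 \;\Rightarrow\; n > 4144.14$$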
Hence, minimum sample size required is 4145 critically ill patients to obtain margin of
error within 2%.
(g) The margin of error is
$$Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $\alpha = 0.01$ and n = 88
Now, at $\alpha = 0.01$, Z = 2.575 is the critical value.
Now, according to the researcher, margin of error for critically ill patients surviving
cardiac arrest after being admitted to hospital will be within 0.02.
Therefore,
$$Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < 0.02 \;\Rightarrow\; \frac{0.6 \times 0.4}{n} < \left( \frac{0.02}{2.575} \right)^2 \;\Rightarrow\; \frac{n}{0.24} > \left( \frac{2.575}{0.02} \right)^2$$
$$\Rightarrow\; n > 0.24 \times 16576.56 \;\Rightarrow\; n > 3978.37$$
Hence, minimum sample size required is 3979 critically ill patients to obtain margin of
error within 2%.
Calculating the sample size with the proportion observed in the sample, rather than the
conservative value of 0.5, reduces the required sample size from N = 4145 to
N = 3979.
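The two sample-size calculations in parts (f) and (g) follow the same rearrangement, $n > \hat{p}(1 - \hat{p}) (Z / E)^2$, so one small function covers both. The sketch below is illustrative; `required_sample_size` is a hypothetical name, and the critical value 2.575 and margin 0.02 are the ones quoted in the text.

```python
import math

def required_sample_size(p, margin, z_crit):
    """Smallest integer n with z_crit * sqrt(p * (1 - p) / n) <= margin."""
    return math.ceil(p * (1 - p) * (z_crit / margin) ** 2)

# Conservative method (p = 0.5) versus the sample-based proportion (p = 0.6).
n_conservative = required_sample_size(p=0.5, margin=0.02, z_crit=2.575)
n_sample_based = required_sample_size(p=0.6, margin=0.02, z_crit=2.575)
print(n_conservative, n_sample_based)
```

Because $\hat{p}(1 - \hat{p})$ is largest at 0.5, the conservative method always gives the larger of the two sample sizes.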
Answer 3
(a) Null hypothesis: $H_0: \mu_s = \mu_m$: Average tar yield of side stream smokers was equal to
that of the mainstream smokers.
Alternate hypothesis: $H_A: \mu_s > \mu_m$: Average tar yield of side stream smokers was
significantly greater than that of the mainstream smokers.
(b) Three assumptions of independent sample t-test are: independent observations, normality
of the dependent variable, and homogeneity of variance between the two samples.
In terms of the present study, tar yields of smokers for the eight brands of cigarettes are
independent of each other. Tar yield of side stream and main stream smokers is
assumed to be normally distributed. This assumption is especially required when sample
size is less than 30. Standard deviations of tar yield of both types of smokers should be
equal.
(c) Average tar yield for side stream smokers
$$\bar{x}_s = \frac{15.8 + 16.9 + 21.6 + 18.8 + 29.3 + 20.7 + 18.9 + 25}{8} = 20.875$$
Average tar yield for main stream smokers
$$\bar{x}_m = \frac{18.5 + 17 + 17.2 + 19.4 + 15.6 + 16.4 + 13.3 + 10.2}{8} = 15.95$$
Standard deviation for tar yield for side stream smokers
$$s_s = \sqrt{\frac{(15.8 - 20.875)^2 + (16.9 - 20.875)^2 + \dots + (25 - 20.875)^2}{8 - 1}} = 4.44$$
Standard deviation for tar yield for main stream smokers
$$s_m = \sqrt{\frac{(18.5 - 15.95)^2 + (17 - 15.95)^2 + \dots + (10.2 - 15.95)^2}{8 - 1}} = 2.96$$
For unequal variances, the test statistic is
$$t = \frac{\bar{x}_s - \bar{x}_m}{\sqrt{\frac{s_s^2}{n} + \frac{s_m^2}{n}}} = \frac{20.875 - 15.95}{\sqrt{\frac{4.44^2}{8} + \frac{2.96^2}{8}}} = 2.61$$
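The means, standard deviations, and test statistic in part (c) can be recomputed directly from the sixteen tar-yield values listed there. This is a minimal sketch using only the standard library; the list names and the helper `unequal_variance_t` are illustrative, and small differences in the last decimal against the hand calculation come from rounding.

```python
import math
from statistics import mean, stdev

# Tar yields as listed in part (c).
sidestream = [15.8, 16.9, 21.6, 18.8, 29.3, 20.7, 18.9, 25.0]
mainstream = [18.5, 17.0, 17.2, 19.4, 15.6, 16.4, 13.3, 10.2]

def unequal_variance_t(a, b):
    """t statistic using separate sample variances for the two groups."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

t_stat = unequal_variance_t(sidestream, mainstream)
print(f"means: {mean(sidestream):.3f}, {mean(mainstream):.2f}")
print(f"SDs:   {stdev(sidestream):.2f}, {stdev(mainstream):.2f}")
print(f"t = {t_stat:.2f}")
```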
(d) The p-value is calculated for right tailed test as $P(T > t = 2.61) = 0.0102 > 0.01$ at 14
degrees of freedom.
(e) Therefore, the null hypothesis is not rejected at 1% level of significance, but it is
rejected at 5% level of significance. It can be concluded that average tar yield of side
stream smokers was not significantly greater than that of main stream smokers at 1% level
of significance, but at 5% level average tar yield of side stream smokers was
significantly greater than that of the main stream smokers.
(f) Mann-Whitney U Test (non-parametric) has been performed instead of t-test.
Null hypothesis: Tar yield distributions of the side stream and main stream populations are equal versus
Alternate hypothesis: Tar yield distributions of the side stream and main stream populations are not
equal
Significance level = 5%
Test Statistic Calculation: Mann-Whitney U test statistic is the smaller of the following two
values (MacFarland and Yates, 2016, pp. 103-132).
$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \quad \text{and} \quad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$
Population 1 Sample
Sample Size = 8
Sum of Ranks = 89
Population 2 Sample
Sample Size = 8
Sum of Ranks = 47
Table 1: Rank of two streams of smokers
Stream Value Rank
Mainstream 10.2 1
Mainstream 13.3 2
Mainstream 15.6 3
Sidestream 15.8 4
Mainstream 16.4 5
Sidestream 16.9 6
Mainstream 17 7
Mainstream 17.2 8
Mainstream 18.5 9
Sidestream 18.8 10
Sidestream 18.9 11
Mainstream 19.4 12
Sidestream 20.7 13
Sidestream 21.6 14
Sidestream 25 15
Sidestream 29.3 16
Intermediate Calculations
Total Sample Size n = 16
T1 Test Statistic = 89
T1 Mean = 68
$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 = 8 \times 8 + \frac{8 \times 9}{2} - 89 = 11$$
$$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 = 8 \times 8 + \frac{8 \times 9}{2} - 47 = 53$$
Hence, test statistic U = 11
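The ranking and the two U values can be reproduced from the raw tar yields. The sketch below is illustrative and assumes there are no tied values (true for these sixteen observations); with ties, average ranks would be needed instead. The helper names `rank_sums` and `mann_whitney_u` are hypothetical.

```python
# Tar yields as listed in Table 1.
sidestream = [15.8, 16.9, 21.6, 18.8, 29.3, 20.7, 18.9, 25.0]
mainstream = [18.5, 17.0, 17.2, 19.4, 15.6, 16.4, 13.3, 10.2]

def rank_sums(a, b):
    """Rank the pooled values (1 = smallest); assumes no tied values."""
    pooled = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in a), sum(rank[v] for v in b)

def mann_whitney_u(a, b):
    """Return (U1, U2) using U_i = n1*n2 + n_i*(n_i + 1)/2 - R_i."""
    n1, n2 = len(a), len(b)
    r1, r2 = rank_sums(a, b)
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - r2
    return u1, u2

r1, r2 = rank_sums(sidestream, mainstream)
u1, u2 = mann_whitney_u(sidestream, mainstream)
u = min(u1, u2)
print(r1, r2, u1, u2, u)
```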
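Z Test Statistic (in absolute value) =
$$|Z| = \frac{\left| U - n_1 n_2 / 2 \right|}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{|11 - 32|}{\sqrt{\frac{8 \times 8 \times 17}{12}}} = 2.205$$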
Two-Tail Test
Lower Critical Value = -1.96
Upper Critical Value = 1.96
The two-tailed p-Value = 2 × P (Z > 2.205) = 0.0274
Decision: The null hypothesis is rejected at 5% level of significance, signifying that tar
yields in the two populations are not the same.
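The normal approximation used for the decision can be sketched as follows. This is an illustrative check, not the software output itself: it takes U = 11 and $n_1 = n_2 = 8$ from the calculations above, applies no continuity or tie correction, and `mann_whitney_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def mann_whitney_z(u, n1, n2):
    """Normal approximation: (U - n1*n2/2) / sqrt(n1*n2*(n1+n2+1)/12)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

z = mann_whitney_z(u=11, n1=8, n2=8)

# Two-tailed p-value from the standard normal distribution.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"|Z| = {abs(z):.3f}, two-tailed p = {p_two_tailed:.4f}")
```

With the smaller U the statistic is negative, which is why the comparison against the critical values ±1.96 uses its absolute value.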
(g) At 5% level of significance, there is no difference between the decision taken in t-test
and Mann-Whitney U test. The sample size was considerably less than 30, so the CLT
cannot be relied on to justify normality, and therefore the non-parametric test result
should be preferred.