
Added on 2019/11/19


AI Summary

The study investigates whether there is a significant difference in the proportion of young adult males and females with myopia. The results show that the proportion of female participants (0.5806) is slightly higher than that of male participants (0.5767), but the difference is not statistically significant (p-value = 0.9404). Additionally, the study examines whether there is an association between myopia status and highest education level achieved in young adult Australians. The results show a significant association between the two variables (Chi-squared test p-value < 0.05).


Running Head: INTRODUCTION TO BIOSTATISTICS 1

Introduction to Biostatistics

Course code: 401077

Name

Institution

Instructor

Spring 2017

Date


Question 1 (10 marks)

Research question: Does the average hours of moderate to vigorous physical activity (MVPA) differ by gender in young adult Australians?

Use the assignment data set assigned to you: Variables to analyse: ‘MVPA’ and ‘sex’.

Note: Each student will get different answers as the data sets differ.

a) Using R Commander draw histograms of MVPA by sex. Add reasonable axis labels.

(1 mark)

Solution

Note that MVPA has a strong positive skew. Possible responses include:

i/ Use a parametric approach (as the sample size is large enough for the Central Limit Theorem to apply), or

ii/ Use a non-parametric approach.
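The histograms requested in part a) are not reproduced in this document. A minimal sketch of how they could be drawn in base R is given here; the data frame `shortsight_sim` is simulated (hypothetical values standing in for the assignment data set), and the helper name `plot_mvpa_by_sex` is illustrative only.

```r
# Simulated stand-in for the assignment data (hypothetical values, not the real data set).
set.seed(1)
shortsight_sim <- data.frame(
  sex  = factor(rep(c("male", "female"), times = c(163, 186)),
                levels = c("male", "female")),
  MVPA = round(rgamma(349, shape = 1.2, rate = 0.3), 1)  # right-skewed hours
)

# Draw one histogram per sex with shared breaks so the two panels are comparable.
plot_mvpa_by_sex <- function(data) {
  breaks <- pretty(range(data$MVPA), n = 20)
  old_par <- par(mfrow = c(2, 1))
  on.exit(par(old_par))
  lapply(levels(data$sex), function(s) {
    hist(data$MVPA[data$sex == s], breaks = breaks,
         main = paste("MVPA,", s),
         xlab = "Moderate to vigorous physical activity (hours)",
         ylab = "Number of participants")
  })
}

h <- plot_mvpa_by_sex(shortsight_sim)
```

Using the same breaks in both panels makes the positive skew in each group directly comparable.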

b) Address the research question applying option i/ above. Please use R Commander to do all calculations but format your answer following the 5 step method. (6 marks)

Solution


STATE: We will test the claim that the average hours of moderate to vigorous physical activity (MVPA) differs by gender in young adult Australians.

FORMULATE: We will test the following hypotheses at the 5% significance level. An independent two-sample (Welch) t-test will be used.

Difference in population means: μd = μ(male) - μ(female)

H0 : μd = 0

H1 : μd ≠ 0

α = 0.05

SOLVE: We first check the requirements. Assume that the participants were randomly selected. The sample is large (n = 349 > 30), so by the Central Limit Theorem the sample means are approximately normally distributed despite the skew in MVPA, and a parametric test was performed.

We performed an independent t-test. The following R output was obtained:

Table 1: Independent T-Test and CI: Male, Female

> t.test(MVPA~sex, alternative='two.sided', conf.level=.95, var.equal=FALSE,
+   data=shortsight)

        Welch Two Sample t-test

data:  MVPA by sex
t = 1.9738, df = 273.492, p-value = 0.04941
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.00227584 1.74477212
sample estimates:
  mean in group male mean in group female
            4.585890             3.712366

DECISION:

From the output in Table 1, the t-test statistic is 1.9738; under H0 it follows a t distribution with approximately 273.49 degrees of freedom (the Welch approximation, which applies because equal variances were not assumed, rather than n - 2 = 349 - 2 = 347). The corresponding P-value is 0.04941.

Since the P-value = 0.04941 < 0.05, H0 is rejected.


CONCLUDE: At the 5% significance level, there is enough statistical evidence to conclude that the average hours of moderate to vigorous physical activity (MVPA) differs by gender in young adult Australians. The mean for males (M = 4.59) is higher than that for females (M = 3.71); the 95% confidence interval for the difference is 0.002 to 1.745.
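The Welch output in part b) reports df = 273.492 rather than n - 2, because the degrees of freedom are estimated from the two group variances. A minimal sketch of that calculation follows, run on simulated right-skewed data (hypothetical; the assignment data set is not reproduced here). The helper `welch_df` is an illustrative name, not part of R.

```r
# Simulated, right-skewed stand-in data (hypothetical; not the assignment data set).
set.seed(42)
sim <- data.frame(
  sex  = factor(rep(c("male", "female"), times = c(163, 186)),
                levels = c("male", "female")),
  MVPA = c(rgamma(163, shape = 1.0, rate = 0.22),
           rgamma(186, shape = 1.5, rate = 0.40))
)

# Welch-Satterthwaite degrees of freedom from the group variances and sizes.
welch_df <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

fit <- t.test(MVPA ~ sex, data = sim, var.equal = FALSE)
df_by_hand <- with(sim, welch_df(MVPA[sex == "male"], MVPA[sex == "female"]))

print(c(reported = unname(fit$parameter),
        by_hand = df_by_hand,
        n_minus_2 = nrow(sim) - 2))
```

The hand calculation matches the value `t.test` reports, and it never exceeds n - 2, which is why the decision step should quote the Welch figure.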

c) Address the research question applying option ii/ above. Please use R Commander to do all calculations but format your answer following the 5 step method. (3 marks)

Solution

STATE: We will test the claim that the average hours of moderate to vigorous physical activity (MVPA) differs by gender in young adult Australians.

FORMULATE: We will test the following hypotheses at the 5% significance level. A two-sample Wilcoxon (rank sum) test will be used.

H0 : The distribution of MVPA is the same for males and females (location shift = 0)

H1 : The distribution of MVPA for males is shifted relative to that for females (location shift ≠ 0)

α = 0.05

SOLVE: We first check the requirements. Assume that the participants were randomly selected. The MVPA data are strongly positively skewed and do not meet the normality assumption, so a non-parametric test was performed.

We performed a two-sample Wilcoxon test. The following R output was obtained:

Table 2: Wilcoxon test: Male, Female

> with(shortsight, tapply(MVPA, sex, median, na.rm=TRUE))
  male female
   3.1    3.0
> wilcox.test(MVPA ~ sex, alternative="two.sided",
+   data=shortsight)

        Wilcoxon rank sum test with continuity correction

data:  MVPA by sex
W = 16155, p-value = 0.2896
alternative hypothesis: true location shift is not equal to 0


DECISION:

From the output in Table 2, the Wilcoxon statistic is W = 16155 and the corresponding P-value is 0.2896. Since the P-value = 0.2896 > 0.05, H0 cannot be rejected.

CONCLUDE: At the 5% significance level, there is not enough statistical evidence to conclude that moderate to vigorous physical activity (MVPA) differs by gender in young adult Australians. The median for males (3.1) is not significantly different from the median for females (3.0).
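The W statistic in Table 2 is the rank sum of the first group in the pooled sample, minus the smallest value that rank sum could take. A minimal sketch on simulated data (hypothetical values; `rank_sum_w` is an illustrative helper name) shows that this reproduces what `wilcox.test` reports.

```r
# Simulated stand-in data (hypothetical MVPA hours, not the assignment data set).
set.seed(7)
male   <- rgamma(163, shape = 1.2, rate = 0.3)
female <- rgamma(186, shape = 1.2, rate = 0.3)

# W for the first group: its rank sum in the pooled sample
# minus the minimum possible rank sum n1 * (n1 + 1) / 2.
rank_sum_w <- function(x, y) {
  r <- rank(c(x, y))
  sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
}

w_by_hand <- rank_sum_w(male, female)
fit <- wilcox.test(male, female, alternative = "two.sided")

print(c(by_hand = w_by_hand, reported = unname(fit$statistic)))
```

Because the test works on ranks, a few very large MVPA values influence it far less than they influence the group means used by the t-test.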

Question 2 (3 marks)

Research question: On average, how much heavier is the first born twin than the second born twin among twins born at full term through vaginal delivery in Australia?

The following table shows the birthweight (in grams) of a random sample of 10 Australian sets of twins born at full term through vaginal delivery.

ID of mother         Birthweight of first   Birthweight of second   How much heavier the first
                     born twin (grams)      born twin (grams)       born is than the second (grams)
1                    2018                   2843                    -825
2                    3217                   2476                    741
3                    2204                   2861                    -657
4                    1166                   2300                    -1134
5                    2715                   2582                    133
6                    2530                   1886                    644
7                    1802                   2004                    -202
8                    2913                   2416                    497
9                    1917                   2399                    -482
10                   1202                   1996                    -794
Sample size          10                     10                      10
Mean                 2168.4                 2376.3                  -207.9
Standard deviation   686.7                  339.4                   674.8

The following table shows critical values of the t-distribution to be used when calculating a 95% confidence interval.

df   6      7      8      9      10     11     12     13     14
t    2.447  2.365  2.306  2.262  2.228  2.201  2.179  2.160  2.145

df   15     16     17     18     19     20     21     22     23
t    2.131  2.120  2.110  2.101  2.093  2.086  2.080  2.074  2.069

Use the sample size, mean, standard deviation and t-value provided to calculate a 95% confidence interval for the mean difference in birthweights in Australian twins. Please assume that the data are normally distributed or that the conditions of the Central Limit Theorem have been met. Write your answer to the research question in a sentence. This is a manual calculation (do not use R Commander), and you need to show your working to get any marks.

When writing equations you are welcome to use the following simplifications:

x̄ can be written xbar

± can be written +-

A₁ can be written A_1

a over b (a fraction) can be written a/b

√a can be written sqrt(a)

Solution

First, we compute Sp, the pooled estimate of the common standard deviation:

$$S_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

Substituting:

$$S_p = \sqrt{\frac{(10-1)\,686.7^2 + (10-1)\,339.4^2}{10+10-2}} = \sqrt{293374.6} = 541.6407$$

The degrees of freedom (df) = n1 + n2 - 2 = 10 + 10 - 2 = 18. From the t-table, t = 2.101. The 95% confidence interval for the difference in mean birthweights (first born minus second born) is:

$$(\bar{x}_1 - \bar{x}_2) \pm t \, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Substituting:

$$(2168.4 - 2376.3) \pm 2.101\,(541.6407)\sqrt{\frac{1}{10}+\frac{1}{10}}$$

$$-207.9 \pm 1137.987\,(0.4472)$$

Then simplifying further:

$$-207.9 \pm 508.9233$$

So, the 95% confidence interval for the difference is (-716.8233, 301.0233).

Therefore, on average, the first born twin in this sample is 207.9 grams lighter than the second born twin among twins born at full term through vaginal delivery in Australia. We are 95% confident that the true mean difference (first born minus second born) lies between -716.8 and 301.0 grams; because this interval includes zero, the data do not show that either twin is heavier on average.
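Because each row of the table pairs two twins born to the same mother, the interval can also be built from the difference column alone (n = 10, df = 9, t = 2.262), which is the usual treatment for matched data. A minimal sketch in R follows as a cross-check on the manual working; it uses only the ten differences listed in the table, and the variable names are illustrative.

```r
# Differences (first born minus second born, in grams) copied from the table.
d <- c(-825, 741, -657, -1134, 133, 644, -202, 497, -482, -794)

n      <- length(d)
d_bar  <- mean(d)                  # mean of the differences
s_d    <- sd(d)                    # standard deviation of the differences
t_crit <- qt(0.975, df = n - 1)    # about 2.262 for df = 9
ci     <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n)

print(round(c(mean = d_bar, sd = s_d, lower = ci[1], upper = ci[2]), 1))
```

With the tabulated values this is -207.9 +- 2.262 * 674.8 / sqrt(10), roughly -690.6 to 274.8 grams. The interval is narrower than the pooled one but still includes zero.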

Question 3 (4 marks)

Research question: How different is the proportion of people with myopia between young

adult females and young adult males in Australia?

Use the assignment data set assigned to you: Variables to analyse: ‘myopia’ and ‘sex’

Note: Each student will get different answers as the data sets differ.

a) Using R Commander, calculate the 95% confidence interval for the difference in

proportion of young adult males and young adult females with myopia. (1 mark)

Solution

The 95% confidence interval for the difference in the proportion of young adult males and young adult females with myopia (male minus female) is [-0.1078, 0.0999].

b) Carefully write, in words, the answer to the research

question. Be sure to identify which group has the

higher rate of myopia. (2 marks).

Solution

The proportion of young adult females with myopia (0.5806) is slightly higher than the proportion of young adult males with myopia (0.5767), a difference of about 0.4 percentage points. However, the difference between the two groups is not statistically significant (p = .9404).
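The interval and P-value quoted in parts a) and b) can be cross-checked from group counts alone. The sketch below assumes 94 of 163 males and 108 of 186 females had myopia; these counts are inferred from the reported proportions and sample sizes, not read from the raw data, so treat them as an assumption.

```r
# Counts inferred from the reported proportions (assumption, not read from the raw data):
# 94 of 163 males and 108 of 186 females with myopia.
myopia_yes <- c(male = 94, female = 108)
group_n    <- c(male = 163, female = 186)

fit <- prop.test(x = myopia_yes, n = group_n,
                 alternative = "two.sided", conf.level = 0.95, correct = FALSE)

print(round(fit$estimate, 4))   # sample proportions, male then female
print(round(fit$conf.int, 4))   # interval for male minus female
print(round(fit$p.value, 4))
```

The interval is for the first group minus the second, so a negative lower limit and positive upper limit mean the data are compatible with either sex having the higher rate.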

c) Have the assumptions of this confidence interval

been met? Explain why or why not. (1 mark)

Solution

> library(abind, pos=15)
> local({ .Table <- xtabs(~sex+myopia, data=shortsight)
+   cat("\nPercentage table:\n")
+   print(rowPercents(.Table))
+   prop.test(.Table, alternative='two.sided', conf.level=.95, correct=FALSE)
+ })

Percentage table:
        myopia
sex      myopia normal Total Count
  male     57.7   42.3   100   163


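Yes, the assumptions of this confidence interval have been met. The two groups are independent samples, and each group contains at least 10 people with myopia and at least 10 without. For males (n = 163), np ≈ 163 * 0.577 ≈ 94 and n(1-p) ≈ 163 * 0.423 ≈ 69. For females (n = 349 - 163 = 186), np ≈ 186 * 0.5806 ≈ 108 and n(1-p) ≈ 78. All four values are greater than 10.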
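The large-sample condition can be checked for all four cells at once. This short sketch again assumes the counts inferred from the reported proportions (94 of 163 males, 108 of 186 females), which are not taken directly from the raw data.

```r
# Group sizes and myopia counts; the counts are inferred from the reported
# proportions (an assumption), e.g. 0.5767 * 163 is about 94.
n_group <- c(male = 163, female = 186)
myopia  <- c(male = 94,  female = 108)

# Large-sample condition: every group needs at least 10 with and 10 without myopia.
cells <- rbind(with_myopia = myopia, without_myopia = n_group - myopia)
condition_met <- all(cells >= 10)

print(cells)
print(condition_met)
```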

Question 4 (8 marks)

Research question: Does the proportion of young adults with myopia differ by highest

education level achieved in young adult Australians?

Use the assignment data set assigned to you: Variables to analyse: ‘myopia’ and ‘educ’

Note: Each student will get different answers as the data sets differ.

a) Show the relationship between myopia status and highest education level achieved

using a two way contingency table. Include either row or column percentages. Obtain

the results using R Commander but then type and label the table yourself with

appropriate description and headings. An R Commander screenshot will not be

accepted. (1 mark)

Solution

Myopia status by highest education level achieved (counts and column percentages)

Education level        Myopia               Normal
                       Count    Percent     Count    Percent
Less                   38       18.8%       43       29.3%
Completed Secondary    70       34.7%       57       38.8%
Completed Tertiary     94       46.5%       47       32.0%
Total                  202      100.0%      147      100.0%

b) Present the expected frequencies for the above table if the null hypothesis were true.

Obtain the results using R Commander but then type and label the table yourself with

appropriate description and headings. An R Commander screenshot will not be

accepted. (1 mark)

Solution

Expected counts if the null hypothesis were true

Education level        Myopia      Normal
Less                   46.8825     34.1175
Completed Secondary    73.5072     53.4928
Completed Tertiary     81.6103     59.3897
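The expected counts follow from the margins of the observed table: each one is the row total times the column total divided by the grand total. A minimal sketch in R, typing in the observed counts from part a) (the object names are illustrative):

```r
# Observed counts typed in from the contingency table in part a).
observed <- matrix(c(38, 43,
                     70, 57,
                     94, 47),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(
                     educ   = c("Less", "Completed Secondary", "Completed Tertiary"),
                     myopia = c("Myopia", "Normal")))

# Expected count for each cell = row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(round(expected, 4))
```

For example, the first cell is 81 * 202 / 349 = 46.8825, matching the table.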

c) Are the requirements for a Chi-square (χ²) test of independence met? Explain why or why not. (1 mark)

Solution

Yes, the requirements of the Chi-square test are met. The observations are independent of each other (each participant contributes to exactly one cell of the table), and no expected count is less than 5 (the smallest is 34.1175).

d) Irrespective of your answer in part c) address the research question using a Chi-square

test on the provided data. Please use R Commander for all calculations but format

your answer following the 5 step method. (5 marks)

Solution

STATE: We will test the claim that there is an association between myopia status and highest education level achieved.

FORMULATE: We will test the following hypotheses at the 5% significance level. A Chi-square test of independence will be used.

H0 : There is no association between myopia and education level

H1 : There is an association between myopia and education level

α = 0.05

SOLVE: We first check the requirements. All expected counts are greater than 5 and the observations are independent of each other.

We performed a Chi-square test of independence. The following R output was obtained:

        Pearson's Chi-squared test

data:  .Table
X-squared = 8.8584, df = 2, p-value = 0.01192


DECISION:

From the above output, the Chi-square statistic is 8.8584; under H0 it follows a Chi-square distribution with 2 degrees of freedom. The corresponding P-value is 0.01192. Since the P-value = 0.01192 < 0.05, H0 is rejected.

CONCLUDE: At the 5% significance level, there is enough statistical evidence to conclude that there is an association between myopia status and highest education level achieved in young adult Australians. Myopia is most common among those who completed tertiary education (94 of 141) and least common in the "Less" education group (38 of 81).
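The test statistic and P-value can be reproduced from the typed contingency table in part a). The sketch below re-enters those counts (object names are illustrative) and also prints row percentages, which show the direction of the association.

```r
# Observed counts typed in from the contingency table in part a).
observed <- matrix(c(38, 43,
                     70, 57,
                     94, 47),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(
                     educ   = c("Less", "Completed Secondary", "Completed Tertiary"),
                     myopia = c("Myopia", "Normal")))

# Pearson chi-square test of independence (no continuity correction).
fit <- chisq.test(observed, correct = FALSE)
print(fit)

# Row percentages: share of each education group with and without myopia.
print(round(100 * prop.table(observed, margin = 1), 1))
```

The row percentages rise from about 46.9% with myopia in the "Less" group to 55.1% for completed secondary and 66.7% for completed tertiary.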

Question 5 (5 marks)

Suppose the natural sleep cycle for Australians is normally distributed with a mean length of 8 hours and a standard deviation of 0.5 hours. Suppose a researcher wishes to test the hypothesis that the natural sleep cycle for Australian men is longer than for Australian women. The minimum difference in mean sleep cycle length which they are interested in detecting is 8.1 hours for males against 7.9 hours for females.

a) What is the minimum sample size required to detect a difference of this size with α = 0.05 and power = 0.90 (β = 0.10)? Present your answer as a sentence which summarises the required sample size to achieve what power subject to what conditions. (3 marks)

Solution

The minimum sample size required is 263. This is based on a power of 0.90, a significance level of 0.05, a standard deviation of 0.5 hours, and a minimum difference in mean sleep cycle length between males and females of 0.2 hours (8.1 - 7.9).
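The source does not state how the figure of 263 was obtained. One way to reproduce it, offered only as a cross-check, is the normal-approximation formula for comparing two means with a two-sided test; under that assumption 263 is the total across both groups (about 132 per group). The helper `n_per_group` is an illustrative name.

```r
sigma <- 0.5          # SD of sleep cycle length (hours), from the question
delta <- 8.1 - 7.9    # minimum difference of interest (hours)
alpha <- 0.05
power <- 0.90

# Normal-approximation sample size per group for comparing two means:
# n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2
n_per_group <- function(delta, sigma, alpha, power, two_sided = TRUE) {
  z_alpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2
}

n_two_sided <- n_per_group(delta, sigma, alpha, power)
n_one_sided <- n_per_group(delta, sigma, alpha, power, two_sided = FALSE)

print(c(total_two_sided     = 2 * n_two_sided,
        per_group_two_sided = n_two_sided,
        per_group_one_sided = n_one_sided))
```

Since the stated hypothesis is directional (men longer than women), the one-sided version is also shown; it needs fewer participants per group than the two-sided version.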

b) Suppose the researcher could not afford such a large sample size. Suggest two

changes which they could make to their research question or study design which

would reduce the required sample size. (2 marks)

Solution

The researcher could make either of the following changes:

Reduce the power from 0.9 to, say, 0.8

Increase the minimum difference in mean sleep cycle length to be detected

Either change will reduce the required sample size.
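The effect of each change can be illustrated with the same normal-approximation formula, assumed here to be a two-sided comparison of two means with total sample size 4 * sigma^2 * (z_alpha + z_beta)^2 / delta^2. The alternative difference of 0.3 hours is a hypothetical value chosen for illustration, and `total_n` is an illustrative helper name.

```r
# Total sample size (both groups) under the two-sided normal approximation.
total_n <- function(delta, sigma = 0.5, alpha = 0.05, power = 0.90) {
  ceiling(4 * sigma^2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2)
}

scenarios <- c(
  original         = total_n(delta = 0.2),                # power 0.90, difference 0.2 h
  power_0.80       = total_n(delta = 0.2, power = 0.80),  # lower power
  difference_0.3_h = total_n(delta = 0.3)                 # larger (hypothetical) difference
)

print(scenarios)
```

Both alternatives give a smaller total than the original scenario, at the cost of a higher chance of missing a real difference or of only being able to detect a larger one.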
