Introduction to Biostatistics Assignment 1


Added on  2023/01/10

401077 Introduction to Biostatistics, Autumn 2019
Assignment 1 (Due Sunday March 31, 2019)
Biostatistics
Student Name:
Instructor Name:
Course Number:
30 March 2019

Question 1 (2 marks)
Consider the example data set on workers in Sydney that you have been assigned for your
assignment. Identify all of the categorical variables in the data set.
Answer
The categorical variables are:
Sex
Married
Education level
Question 2 (4 marks)
a) Using the data set on workers in Sydney assigned to you and R Commander, graph
the distribution of the variable age. (1 mark)
Answer
b) Using appropriate statistics from R Commander, write one or two sentences
describing the distribution of age. (Hint: consider measures of centre, spread and
shape. R commander output alone is insufficient – write the answer in your own
words.) (3 marks)
Answer
The mean age is 40.98 years and the median age is 40 years; because the mean and
median are close, the centre of the distribution is around 40 years. The skewness is
0.11 (less than 0.5 in absolute value), which indicates that the distribution of age is
approximately symmetric. The kurtosis is -1.12; a value below zero indicates lighter
tails than a Normal distribution, i.e. a platykurtic distribution.
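The shape measures above can be illustrated with the usual moment-based formulas for sample skewness (g1) and excess kurtosis (g2). This is a minimal R sketch: the helper functions `skewness_g1()` and `excess_kurtosis_g2()` and the vector `ages_demo` are hypothetical and are not the assignment data. `psych::describe()` applies a small-sample adjustment by default, so its values can differ slightly from these formulas.

```r
# Moment-based sample skewness (g1) and excess kurtosis (g2).
# ages_demo is a hypothetical vector, NOT the assignment data set.
skewness_g1 <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^1.5
}

excess_kurtosis_g2 <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)  # fourth central moment
  m4 / m2^2 - 3          # subtract 3 so that a Normal distribution scores 0
}

ages_demo <- c(20, 25, 30, 35, 40, 45, 50, 55, 60)  # evenly spread, symmetric
sk <- skewness_g1(ages_demo)
ku <- excess_kurtosis_g2(ages_demo)
cat("skewness:", round(sk, 3), " excess kurtosis:", round(ku, 3), "\n")
```

An evenly spread, symmetric set of ages gives a skewness of zero and a negative excess kurtosis, the same light-tailed (platykurtic) pattern described above.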
Question 3 (3 marks)
Using the data set on workers in Sydney assigned to you and R Commander, graph the
distribution of the highest educational qualification obtained (‘educ’). Write a sentence or
two summarising the main characteristics of this distribution as shown in the graph. (3 marks)
Answer
The chart shows that the most common highest qualification was a certificate, i.e. the
largest group of respondents had completed a TAFE/technical college certificate. A
postgraduate degree was the least common highest qualification, held by only a small
proportion of the respondents.
Question 4 (4 marks)
Using the data set on workers in Sydney assigned to you and R Commander, graph
respondents self-reported average weekly income (‘income’) against the logarithm of the
income variable (‘log_income’). Write a sentence or two describing the form, direction and
strength of this relationship.
Answer

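The figure shows a strong, positive, non-linear relationship between respondents'
self-reported average weekly income (‘income’) and the logarithm of the income variable
(‘log_income’). Income increases as log_income increases, and at an increasing rate, so
the points follow an upward-curving (exponential-shaped) curve rather than a straight
line. The relationship is very strong because log_income is calculated directly from
income, so the points lie along a single curve with essentially no scatter.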
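One way to make the strength of this relationship concrete is to compare a rank-based (Spearman) correlation with a linear (Pearson) correlation. A minimal sketch, assuming `log_income` is the natural logarithm of `income` (the base is not stated in the data set) and using hypothetical incomes rather than the assignment data:

```r
# Hypothetical weekly incomes (NOT the assignment data) and their natural logs.
income_demo <- c(300, 600, 900, 1500, 2500, 4000, 7000)
log_income_demo <- log(income_demo)

# Spearman correlation uses ranks only; a strictly increasing transformation
# preserves the ranks, so it equals 1.
rho_s <- cor(log_income_demo, income_demo, method = "spearman")

# Pearson correlation measures linear association; because the curve bends
# upward, it is below 1 even though every point lies on one curve.
r_p <- cor(log_income_demo, income_demo, method = "pearson")

cat("Spearman:", rho_s, " Pearson:", round(r_p, 3), "\n")
```

A Spearman value of 1 reflects a perfectly consistent increasing relationship, while a Pearson value below 1 reflects the curvature, which is why a straight line describes the form of this relationship poorly.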
Question 5 (5 marks)
a) Using the data set on workers in Sydney assigned to you and R Commander, tabulate
the relationship between gender (‘sex’) and highest education qualification achieved
(‘educ’). Include frequency counts and row or column percentages. (Hint: R
Commander output alone is insufficient; present your table(s) in Word with
informative headings.) (1 mark)
Answer
Highest education qualification (‘educ’) by sex (‘sex’): frequency (n) and column percentage (%)

Education level   Male n   Male %   Female n   Female %
Post Graduate         27    10.00         36      15.79
Bachelor              85    31.48         81      35.53
Certificate          127    47.04         49      21.49
Not Tertiary          31    11.48         62      27.19
Total                270   100.00        228     100.00
b) Using the results in part a) write a sentence or two describing the relationship between
gender (‘sex’) and highest education qualification achieved (‘educ’). (2 marks)
Answer
The results in part a) show that the largest share of female respondents (35.53%,
n = 81) have a bachelor degree as their highest level of education, while the largest
share of male respondents (47.04%, n = 127) completed a TAFE/technical college
certificate. A postgraduate degree was the least common highest qualification for both
males (10.00%, n = 27) and females (15.79%, n = 36). Females were also more likely than
males to have no tertiary qualification (27.19% vs 11.48%).
c) If you were to select one person at random from this data set, what is the probability
they would be a male with a post graduate qualification? (1 mark)
Answer
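There are 27 males with a postgraduate qualification out of 270 + 228 = 498 respondents,
so the probability is 27/498 ≈ 0.0542 (5.42%). The 10% in the table is the proportion of
males who hold a postgraduate qualification, which is a conditional probability rather
than the probability asked for here.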
d) If you were to choose one male at random from this data set what is the probability
they would have no tertiary qualification? (1 mark)
Answer
The probability that a randomly chosen male has no tertiary qualification is
31/270 = 0.1148 (11.48%).
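Parts c) and d) differ in their denominator: c) is a joint probability over all respondents, while d) conditions on being male. The R sketch below rebuilds the table from the counts in part a) (the object name `edu_sex` is illustrative) and computes both, along with the column percentages.

```r
# Counts from the table in part a): rows = education level, columns = sex.
# matrix() fills column by column: first the 4 male counts, then the 4 female counts.
edu_sex <- matrix(c(27, 85, 127, 31,   # Male
                    36, 81, 49, 62),   # Female
                  ncol = 2,
                  dimnames = list(educ = c("Post Graduate", "Bachelor",
                                           "Certificate", "Not Tertiary"),
                                  sex = c("Male", "Female")))

# Column percentages, as reported in the table.
col_pct <- round(100 * prop.table(edu_sex, margin = 2), 2)
print(col_pct)

# c) Joint probability: male AND postgraduate, out of ALL respondents.
p_male_postgrad <- edu_sex["Post Graduate", "Male"] / sum(edu_sex)

# d) Conditional probability: no tertiary qualification GIVEN male.
p_nottert_given_male <- edu_sex["Not Tertiary", "Male"] / sum(edu_sex[, "Male"])

cat("P(male and postgrad)   =", round(p_male_postgrad, 4), "\n")
cat("P(not tertiary | male) =", round(p_nottert_given_male, 4), "\n")
```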
Question 6 (5 marks)
People with blood type O- are sometimes referred to as universal donors because their blood can be
used for all recipients. About 9% of the Australian population have blood type O-. Suppose
blood donors have the same distribution of blood types as the Australian population.
a) In any 10 donors chosen at random, what is the probability that exactly 3 will have O-
blood? (1 mark)
Answer
Using the binomial formula with n = 10 and p = 0.09:
$$P(X = x) = \frac{n!}{(n-x)!\,x!}\,p^{x}(1-p)^{n-x}$$
$$P(X = 3) = \frac{10!}{(10-3)!\,3!}\,(0.09)^{3}(1-0.09)^{10-3} = 0.045206$$
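The same binomial probability can be checked in R with the built-in `dbinom()` function; the sketch also evaluates the formula directly with `choose()` so the two can be compared.

```r
n <- 10     # number of donors
p <- 0.09   # proportion of the population with O- blood

# dbinom() gives P(X = x) for a binomial(n, p) count
p3 <- dbinom(3, size = n, prob = p)

# The same value from the formula n! / ((n - x)! x!) * p^x * (1 - p)^(n - x)
p3_formula <- choose(n, 3) * p^3 * (1 - p)^(n - 3)

cat("P(X = 3) =", round(p3, 6), "\n")
```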
b) In any 10 donors chosen at random, what is the probability that between 1 and 4
inclusive will be of blood type O-? Show any working. (1 mark)
Answer
$$P(X = x) = \frac{n!}{(n-x)!\,x!}\,p^{x}(1-p)^{n-x}$$
$$P(X = 1) = \frac{10!}{(10-1)!\,1!}\,(0.09)^{1}(1-0.09)^{10-1} = 0.385137$$
$$P(X = 2) = \frac{10!}{(10-2)!\,2!}\,(0.09)^{2}(1-0.09)^{10-2} = 0.171407$$
$$P(X = 3) = \frac{10!}{(10-3)!\,3!}\,(0.09)^{3}(1-0.09)^{10-3} = 0.045206$$
$$P(X = 4) = \frac{10!}{(10-4)!\,4!}\,(0.09)^{4}(1-0.09)^{10-4} = 0.007824$$
Thus the required probability is:
$$P(1 \le X \le 4) = 0.385137 + 0.171407 + 0.045206 + 0.007824 = 0.609574$$
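A short R check of the sum above, using `dbinom()` for the individual terms and, equivalently, the cumulative function `pbinom()` as P(X <= 4) - P(X <= 0):

```r
n <- 10
p <- 0.09

# Sum of the individual probabilities P(X = 1) + ... + P(X = 4)
p_1_to_4 <- sum(dbinom(1:4, size = n, prob = p))

# Equivalent via the cumulative distribution: P(X <= 4) - P(X <= 0)
p_1_to_4_cdf <- pbinom(4, size = n, prob = p) - pbinom(0, size = n, prob = p)

cat("P(1 <= X <= 4) =", round(p_1_to_4, 6), "\n")
```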
c) In any 10 donors chosen at random, what is the most likely frequency of blood type O-?
Provide some evidence to support your answer. (2 marks)
Answer
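The most likely number of donors with blood type O- among 10 randomly chosen donors is
0, although 1 is almost as likely. Evaluating the binomial formula for each count gives
$$P(X = 0) = \frac{10!}{(10-0)!\,0!}\,(0.09)^{0}(1-0.09)^{10} = 0.389416$$
$$P(X = 1) = 0.385137, \quad P(X = 2) = 0.171407, \quad P(X = 3) = 0.045206$$
and the probabilities keep decreasing for larger counts, so X = 0 has the highest
probability. This agrees with the mode of a binomial distribution:
$$\text{mode} = \lfloor (n+1)p \rfloor = \lfloor 11 \times 0.09 \rfloor = \lfloor 0.99 \rfloor = 0$$
The expected count, np = 10 × 0.09 = 0.9, is close to 1, but the mean is not the same as
the most likely value.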
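The most likely count can be found by listing the binomial probabilities for every possible count and taking the largest. A minimal R sketch with `dbinom()` and `which.max()`, cross-checked against the standard closed form for the binomial mode, floor((n + 1)p):

```r
n <- 10
p <- 0.09
x <- 0:n
probs <- dbinom(x, size = n, prob = p)

# Show the probabilities of the first few counts
print(cbind(x = x[1:4], prob = round(probs[1:4], 6)))

# The most likely count (the mode) is the x with the largest probability
mode_x <- x[which.max(probs)]
cat("Most likely count:", mode_x, "\n")

# Cross-check with the closed form floor((n + 1) * p)
mode_formula <- floor((n + 1) * p)
```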
d) On average, how many of the next 10 donors would you expect to be O-? Show any
working. (Hint: average is another word for mean). (1 mark)
Answer
$$E(X) = np = 10 \times 0.09 = 0.9$$
On average, 0.9 of the next 10 donors would be expected to have blood type O-.
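The shortcut E(X) = np agrees with the general definition of the mean of a discrete variable, the sum of x × P(X = x) over all possible counts. A quick R check:

```r
n <- 10
p <- 0.09
x <- 0:n

# Mean of a discrete distribution: sum of x * P(X = x) over all x
mean_from_pmf <- sum(x * dbinom(x, size = n, prob = p))

# Shortcut for the binomial distribution: E(X) = n * p
mean_shortcut <- n * p

cat("E(X) from the pmf =", mean_from_pmf, " n * p =", mean_shortcut, "\n")
```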
Question 7 (7 marks)
Suppose ‘at rest’ body temperature for healthy adults is Normally distributed with mean
μ = 36.80 °C and standard deviation σ = 0.41 °C.

a) Estimate the percentage of adults whose ‘at rest’ body temperature falls below 36.80 °C.
(1 mark)
Answer
$$Z = \frac{x-\mu}{\sigma} = \frac{36.80 - 36.80}{0.41} = 0$$
$$P(X < 36.80) = P(Z < 0) = 0.5 = 50\%$$
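The same calculation in R, using `pnorm()` for the Normal cumulative probability (the variable names are illustrative):

```r
mu    <- 36.80  # mean at-rest body temperature (deg C)
sigma <- 0.41   # standard deviation (deg C)

z <- (36.80 - mu) / sigma                        # standardise the cut-off
p_below <- pnorm(36.80, mean = mu, sd = sigma)   # P(X < 36.80)

cat("z =", z, " P(X < 36.80) =", p_below, "\n")
```

Because the Normal distribution is symmetric about its mean, half of the distribution lies below μ whatever the value of σ.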
b) Olaf’s at rest body temperature is 33.00 °C and Timu’s at rest body temperature is 38.00 °C.
Both are healthy adults. Present Z-scores for Olaf’s and Timu’s body temperatures and
then write a sentence explaining whose body temperature is most unusual. (2 marks)
Answer
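Z-score for Olaf:
$$Z = \frac{x-\mu}{\sigma} = \frac{33.00 - 36.80}{0.41} = \frac{-3.80}{0.41} = -9.2683$$
Z-score for Timu:
$$Z = \frac{x-\mu}{\sigma} = \frac{38.00 - 36.80}{0.41} = \frac{1.20}{0.41} = 2.9268$$
Olaf's body temperature is the more unusual: it lies 9.27 standard deviations below the
mean, whereas Timu's lies 2.93 standard deviations above it, and unusualness is judged by
the size of |Z| (the distance from the mean), not its sign.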
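A small R sketch of the z-score comparison; it ranks the two temperatures by |z|, and also reports the two-sided Normal tail probability for each value (an addition for illustration, not part of the question):

```r
mu    <- 36.80
sigma <- 0.41
temps <- c(Olaf = 33.00, Timu = 38.00)

# Standardise each temperature
z <- (temps - mu) / sigma
print(round(z, 4))

# Unusualness is judged by distance from the mean, i.e. |z|
most_unusual <- names(which.max(abs(z)))
cat("Most unusual:", most_unusual, "\n")

# Two-sided tail probability of a value at least this far from the mean
tail_prob <- 2 * pnorm(-abs(z))
```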
c) Suppose we had 18 sports teams each consisting of 12 team members all of whom are
adults. Suppose that for each of the 18 teams, we measured the at rest body temperature
of all 12 team members and then calculated from these the mean at rest body temperature
for each team. That is, we calculate 18 team means. Estimate the mean and standard
deviation of the distribution of the team means. (2 marks)
Answer
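Each team mean is the mean of a sample of n = 12 temperatures drawn from a Normal
distribution with μ = 36.80 °C and σ = 0.41 °C. The sampling distribution of the sample
mean is therefore Normal, with
$$\mu_{\bar{x}} = \mu = 36.80\,^{\circ}\mathrm{C}$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.41}{\sqrt{12}} \approx 0.1184\,^{\circ}\mathrm{C}$$
So the 18 team means should be centred at about 36.80 °C with a standard deviation of
about 0.12 °C. The number of teams (18) does not change these values; it only affects how
closely the 18 observed team means follow this distribution.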
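The result for the distribution of team means can be checked by simulation. This sketch assumes individual temperatures are independent Normal(36.80, 0.41) draws; it simulates many hypothetical teams of 12 (far more than 18, so that the pattern is stable) and compares the spread of the simulated team means with σ/√n.

```r
mu        <- 36.80
sigma     <- 0.41
team_size <- 12

# Theory: sample means have mean mu and standard deviation sigma / sqrt(n)
se_theory <- sigma / sqrt(team_size)

# Simulation check with hypothetical teams (not real measurements):
# each row of the matrix is one simulated team of 12 adults
set.seed(1)
n_teams <- 100000
temps <- matrix(rnorm(n_teams * team_size, mean = mu, sd = sigma),
                ncol = team_size)
team_means <- rowMeans(temps)

cat("theory:     mean =", mu, " sd =", round(se_theory, 4), "\n")
cat("simulation: mean =", round(mean(team_means), 4),
    " sd =", round(sd(team_means), 4), "\n")
```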
d) Estimate the median and first quartile (25 percentile) of the distribution of team means
described in c). (2 marks)
Answer
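The team means follow a Normal distribution with mean 36.80 °C and standard deviation
0.41/√12 ≈ 0.1184 °C. Because a Normal distribution is symmetric, its median equals its
mean:
$$\text{Median} = 36.80\,^{\circ}\mathrm{C}$$
The first quartile is the value with 25% of team means below it. Using z_{0.25} = -0.6745:
$$Q_1 = 36.80 + (-0.6745)(0.1184) = 36.80 - 0.0799 \approx 36.72\,^{\circ}\mathrm{C}$$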
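The median and first quartile of a Normal distribution can be read off directly with `qnorm()`. A minimal sketch, assuming the team means are Normal with mean 36.80 °C and standard deviation 0.41/√12 °C:

```r
mu <- 36.80
se <- 0.41 / sqrt(12)   # SD of the team means, sigma / sqrt(n)

median_team <- qnorm(0.50, mean = mu, sd = se)  # 50th percentile
q1_team     <- qnorm(0.25, mean = mu, sd = se)  # 25th percentile

cat("median =", round(median_team, 2), " Q1 =", round(q1_team, 2), "\n")
```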
Appendix
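R code
# Install psych once; then load psych (for describe()) and ggplot2
install.packages("psych")
library(psych)
library(ggplot2)

# load() restores the saved object(s) (here 'workhours') and returns their names
data <- load("C:\\Users\\310187796\\Documents\\datafor19789170.Rdata")
str(workhours)
attach(workhours)  # lets columns such as age and educ be used by name

# Question 2: distribution of age
boxplot(age, main = "Boxplot of age")
describe(age)  # mean, median, sd, skew, kurtosis, ...

# Question 3: highest educational qualification
counts <- table(educ)
barplot(counts, xlab = "Education level", ylab = "Frequency",
        col = "blue", border = "red", main = "Highest education chart")

# Question 4: income against log income
plot(log_income, income, main = "Plot of Income against log Income",
     xlab = "Log Income", ylab = "Income", pch = 19)
abline(lm(income ~ log_income, data = workhours), col = "blue")

ggplot(workhours, aes(x = log_income, y = income)) +
  geom_point() +
  geom_smooth(method = "lm")

# Question 5: education by sex
mytable <- table(educ, sex)             # educ in rows, sex in columns
mytable                                 # frequency counts
round(100 * prop.table(mytable, 2), 2)  # column percentages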