Advanced Epidemiology and Biostatistics

Added on 2023/01/18

AI Summary

This document discusses topics related to advanced epidemiology and biostatistics, including population parameters and sample statistics, descriptive and inferential statistics, variables in a study, correlation coefficients, hypothesis testing, and statistical tests. It also provides frequency tables for age categories and deaths in hospitals.

Running head: ADVANCED EPIDEMIOLOGY AND BIOSTATISTICS 1
Advanced Epidemiology and Biostatistics
Name
Institution

NUR 627: Advanced Epidemiology and Biostatistics
SECTION 1
1. Differences between population parameters and sample statistics
A population, sometimes referred to as the universe, is the complete set of subjects that a
researcher is interested in studying. Since it is rarely feasible to study every member of a
population, the researcher resorts to studying a fraction of the population, known as a sample.
A population parameter, therefore, is a descriptive measure/index of the population. For instance,
the mean score of all the A-level students in the United States.
A sample statistic, on the other hand, is a descriptive index computed from the sample and used
to estimate the corresponding population parameter.
For instance, the mean score of the class 8 students in one school in the United States.
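The distinction can be illustrated with a minimal Python sketch. The population of exam scores below is simulated and purely hypothetical (it is not data from this assignment); the point is only that the parameter is computed from every member, while the statistic is computed from a sample and varies from sample to sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: exam scores for 10,000 students (simulated data).
population = [random.gauss(70, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a simple random sample of 100 students.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two values are close but not identical, which is why inferential statistics are needed to generalize from the sample to the population.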
2. Why descriptive statistics is considered before inferential statistics
Descriptive statistics describe and organize the data at hand. They help the researcher to
create summaries and show the patterns in the dataset. This exposes outliers, missing values,
and other mistakes that may have been made during data collection and data entry, and enables
the researcher to correct them before further analysis. Examining the data first also allows
the researcher to check the assumptions of the planned tests, so that inferential conclusions
are not drawn from faulty data.
3. Imagine you are to conduct a study on how weight and age group (18-35, 36-53, and
≥54 years) relate to systolic blood pressure.
a. What are the variables in this study?

A variable is a characteristic of the study subjects that can take different values and that
the researcher measures, whether in a population or a sample.
In this scenario, there are three variables: weight, age group, and systolic blood pressure.
b. What is the exposure variable? What are its type and measurement?
An exposure variable is also known as a predictor (independent) variable. It is a variable
thought to influence changes in the outcome variable.
In this case, weight and age group are the exposure variables. Weight is a quantitative,
continuous variable measured on a ratio scale. Age group, as categorized here (18-35, 36-53,
and ≥54 years), is a categorical variable measured on an ordinal scale.
c. What is the outcome variable? What are its type and measurement?
An outcome variable is also known as the dependent variable. It is influenced by changes
in the independent variables. In this case, it is systolic blood pressure. It is a quantitative,
continuous variable. Because blood pressure has a true zero, it is measured on a ratio scale.
4. A ratio variable:
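a. Income-Income has equal intervals and a true zero point, so ratios between values are
meaningful (an income of 40,000 is twice an income of 20,000). It therefore qualifies as a
ratio variable.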
5. True or false:
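a. An instrument can be reliable without being valid-TRUE
b. An instrument can be valid without being reliable-FALSE
Reliability refers to the consistency of a measurement, whereas validity refers to whether the
instrument measures what it is intended to measure. An instrument can give consistent results
that are consistently wrong (for example, a scale that always reads 2 kg too heavy), so it can
be reliable without being valid. An instrument that gives inconsistent results, however, cannot
measure the intended concept accurately, so it cannot be valid without being reliable.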

6. Which of these charts allows a researcher to examine a possible relationship
between two ratio variables?
Scatter plot-A scatter plot displays each observation as a point, with the independent variable
on one axis and the dependent variable on the other. The direction in which the points cluster
indicates the kind of relationship between the variables, and how tightly they cluster
indicates the strength of the relationship.
7. What is the mode of the following data:
The mode is the most common value in a data set.
a. 120 114 116 117 114 121 124
In part a, 114 appears twice while the rest of the values appear once. This, therefore, makes it
the mode.
b. 117 120 114 116 117 114 121 124
In part b, 114 and 117 each appear twice while the rest of the values appear once. The data set
is therefore bimodal, with 114 and 117 as the modes.
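The two answers above can be checked with a short Python sketch using the standard library's `statistics.multimode`, which returns every value tied for the highest frequency. The variable names are illustrative only.

```python
from statistics import multimode

data_a = [120, 114, 116, 117, 114, 121, 124]
data_b = [117, 120, 114, 116, 117, 114, 121, 124]

# multimode returns all most-frequent values; sort them for a stable display.
modes_a = sorted(multimode(data_a))
modes_b = sorted(multimode(data_b))

print("modes of a:", modes_a)
print("modes of b:", modes_b)
```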
8. What is the median of the following data:
a. 120 114 116 117 114 121 124
The median is the middle value of a dataset when it is arranged in ascending order.
Arranging the data in ascending order gives:
114, 114, 116, 117, 120, 121, 124
Counting 3 values in from the left and from the right leaves 117 in the middle. This is,
therefore, the median. It is easily seen since the data set has an odd number of values.
b. 117 120 114 116 117 114 121 124
Median is the value at the middle in a dataset when arranged in ascending order.

Arranging the data in ascending order gives:
114, 114, 116, 117, 117, 120, 121, 124
This data set has an even number of values and therefore has two values in the middle. Counting
3 values in from the left and from the right leaves 117 and 117 in the middle. To find the
median, we sum them and divide by 2: (117 + 117)/2 = 234/2 = 117, which is the median.
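The procedure described for the odd and even cases can be sketched in Python as follows. The function name `median_of` is illustrative; the standard library's `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


data_a = [120, 114, 116, 117, 114, 121, 124]
data_b = [117, 120, 114, 116, 117, 114, 121, 124]

median_a = median_of(data_a)
median_b = median_of(data_b)
print(median_a, median_b)
```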
9. The 95% confidence interval of sodium content level in 32 nursing home patients is
4,250 mg/day and 4,750 mg/day. What does this confidence interval tell us?
A confidence interval is a range of values, calculated from sample data, that is likely to
contain the true population parameter at a stated level of confidence
(Mehmetoglu and Jakobsen, 2016).
The end-limits of the interval are the point estimate minus and plus a critical number of
standard errors, where the critical number is set by the chosen confidence level.
For the example above, the confidence interval is [4,250 mg/day, 4,750 mg/day].
Based on the sample of 32 nursing home patients, we can be 95% confident that the true mean
sodium level in the population of nursing home patients lies between 4,250 mg and 4,750 mg
per day.
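As a rough illustration, the reported limits can be unpacked in Python. This sketch assumes the interval is symmetric, of the form estimate ± critical value × standard error, and uses the normal critical value (about 1.96); neither assumption is stated in the question. With n = 32 a t critical value would more typically be used, which is slightly larger and would imply a slightly smaller standard error.

```python
from statistics import NormalDist

lower, upper = 4250.0, 4750.0  # reported 95% limits, mg/day

# For a symmetric interval the point estimate is the midpoint
# and the margin of error is half the width.
point_estimate = (lower + upper) / 2
margin_of_error = (upper - lower) / 2

# Assumption: normal critical value for 95% confidence (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)
implied_standard_error = margin_of_error / z_critical

print(f"point estimate:  {point_estimate:.0f} mg/day")
print(f"margin of error: {margin_of_error:.0f} mg/day")
print(f"implied SE:      {implied_standard_error:.1f} mg/day")
```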
10. Which of the following is not a measure of central tendency:
Inter-quartile range-It is a measure of dispersion, not of central tendency. The measures of
central tendency are the mean, the median, and the mode.
11. The purpose of a study is to test the effect of pressure ulcer prevention in reducing the
incidence of pressure ulcer in critically ill patients in intensive care units.
a. What is the null hypothesis?

This is a hypothesis of no difference given by H0 (Napier & Maisel, 2015).
Null hypothesis (H0): Pressure ulcer prevention has no significant influence in reducing the
incidence of pressure ulcer in critically ill patients in intensive care units
b. What is the alternative hypothesis?
This is the researcher’s hypothesis that tests his or her question. It is usually the opposite of the
null hypothesis.
Alternative hypothesis (Ha): Pressure ulcer prevention has a significant influence in reducing
the incidence of pressure ulcer in critically ill patients in intensive care units
c. What are the exposure variable and its level of measurement?
Pressure ulcer prevention
This is a categorical variable
d. What is the outcome variable and its level of measurement?
The incidence of pressure ulcer
This is a categorical variable
e. The finding was as follows: the intervention group had less pressure ulcer than
the control group (p=0.005). What is the status of the null hypothesis based on this
result?
Since p = 0.005 is less than alpha = 0.05, there is enough statistical evidence to reject the
null hypothesis. The result therefore supports the alternative hypothesis
(Johnson and Bhattacharyya, 2018).
12. A type I error is made when:

The true null hypothesis is rejected-A Type I error occurs when we reject the null hypothesis
when it is actually true
13. A type II error is made when:
The false null hypothesis is not rejected-A Type II error occurs when we fail to reject the
null hypothesis when it is actually false (King'oriah, 2012).
14. True or false:
Normality is assumed with all parametric statistical tests; therefore, it is important to check if
the data is normally distributed or not-TRUE
15. Which of the following correlation coefficients represents the strongest relationship?
0.82 is near +1 which shows a strong positive correlation which implies that the variables are
strongly related.
16. True or false:
If a correlation coefficient is -1.0, it means that the two variables will move in opposite
directions-TRUE
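A minimal Python sketch of the Pearson coefficient shows the two extremes. The data points are hypothetical and chosen so that the relationship is perfectly linear; `pearson_r` is an illustrative helper, not a library function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_falling = [10, 8, 6, 4, 2]  # decreases as x increases
y_rising = [2, 4, 6, 8, 10]   # increases as x increases

r_falling = pearson_r(x, y_falling)
r_rising = pearson_r(x, y_rising)
print(r_falling, r_rising)
```

The sign gives the direction of the relationship and the absolute value gives its strength, so -1.0 and +1.0 are equally strong.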
17. What is the level of measurement for the following variables: age in years, income
group, and blood type?
Age in years: Ratio (it has equal intervals and a true zero)
Income group [low, medium, high]: Ordinal
Blood Type: Nominal
18. What type of statistics (mention all possible statistics) can be used to describe the
variables: age in years, income groups, and blood type?
Age in Years: Mean, Mode, Median, standard deviation

Income group [low, medium, high]: Median, range, IQR
Blood Type: Mode
19. Does a set of scores with most of its values below the mean have a negatively or
positively skewed distribution? Provide a rationale for your answer.
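Positive skewness-When most of the values fall below the mean, a few very high values have
pulled the mean upward. The bulk of the data points is on the left side of the distribution
and the longer tail is on the right, so the distribution is positively skewed.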
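The rationale can be illustrated with a small Python sketch. The scores below are hypothetical: nine modest values and one very large value that pulls the mean above the median. The skewness is computed as the third standardized moment, which is only one of several common definitions.

```python
from statistics import mean, median

# Hypothetical scores: one extreme high value creates a long right tail.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

m = mean(scores)
md = median(scores)
n_below_mean = sum(1 for s in scores if s < m)

# Moment-based skewness: average cubed deviation divided by sd cubed.
n = len(scores)
sd = (sum((s - m) ** 2 for s in scores) / n) ** 0.5
skewness = sum((s - m) ** 3 for s in scores) / (n * sd ** 3)

print(f"mean={m}, median={md}, values below mean={n_below_mean} of {n}")
print(f"skewness={skewness:.2f} (positive means a longer right tail)")
```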
20. T-statistics = -7.9 and p-value=0.005 describe the difference between women and men
for mental health score.
a. If alpha is set to 0.05, is the p-value of 0.005 statistically significant?
Yes, this shows a statistically significant result.
b. In a sentence, interpret the p-value of 0.005
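Since the p-value of 0.005 is less than the alpha value of 0.05, the null hypothesis of no
difference is rejected: if there were truly no difference between women and men in mental
health score, a difference at least this large would be expected in only about 0.5% of
samples. The results are therefore said to be statistically significant.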
21. A study found that the Pearson correlation coefficient “r” value for the relationship
between serum level of cholesterol and the age of the patients in years is 0.77 and p-value
was 0.002.
a. Interpret the “r” value of 0.77 [strength and direction] and provide a rationale for
your answer.
An r value of 0.77 shows that the data have a strong positive relationship (Levine et al.,
2014). The positive sign means that as age increases, serum cholesterol level also tends to
increase, and the value being close to +1 indicates that the relationship is strong.

b. If alpha is set to be 0.05, is this “r” value of 0.77 and p-value of 0.002 statistically
significant?
At alpha = 0.05, the probability value p = 0.002 < 0.05, hence the correlation is statistically
significant.
22. What is the statistical test (procedure) that is used to determine whether a significant
difference exists between three or more group means?
ANOVA (analysis of variance)-A t-test compares the means of only two groups; ANOVA is the
procedure used to test for a significant difference among three or more group means.
23. What type of hypothesis is represented by the statement “women who smoke are more
likely to have low-birth-weight babies relative to women who do not”?
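A) Alternative hypothesis-This is the researcher's hypothesis. It is directional, since it
states both that a difference is expected and which group is expected to be higher.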
24. The nurse researcher is calculating the standard deviation. What is the standard deviation?
C) The average amount of deviation of values from the mean and is calculated for every score
25. What is the name for the shape of distribution that occurs when the nurse
researcher has a bell-shaped curve distribution?
B) Normal
26. What parametric statistical method(s) a researcher can use to determine if the mean body
mass index of the population is the same for two groups of subjects (group1=diet restriction;
group2=none).
A. The name of the statistical test is the independent-samples t-test (a one-way ANOVA with
two groups gives an equivalent result)
B. The null hypothesis of the statistical test is:
The mean body mass index of the population is the same for the two groups of subjects
(diet restriction and none)
C. The alternative hypothesis of the statistical test is:
The mean body mass index of the population is different for the two groups of subjects
(diet restriction and none)
SECTION 2
1. Do frequency for the following variables and interpret the findings: Age (Age category),
dhosp (died in hospital).

Table 1: Age Category
Age category   Frequency   Percent   Valid Percent   Cumulative Percent
45-54                 47      24.2            24.2                 24.2
55-64                 73      37.6            37.6                 61.9
65-74                 53      27.3            27.3                 89.2
75+                   21      10.8            10.8                100.0
Total                194     100.0           100.0
According to Table 1, the largest group of subjects was aged 55-64 years (37.6%), followed
by 65 to 74 years (27.3%) and 45 to 54 years (24.2%), with those aged 75 years and above
being the smallest group (10.8%).
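The Percent and Cumulative Percent columns of Table 1 follow directly from the frequencies, as this short Python sketch shows. The counts are taken from Table 1; the variable names are illustrative.

```python
# Frequencies from Table 1.
counts = {"45-54": 47, "55-64": 73, "65-74": 53, "75+": 21}
total = sum(counts.values())

percents = {}
cumulative_percents = {}
running = 0
for category, count in counts.items():
    running += count
    # Percent: share of the total sample in this category.
    percents[category] = round(100 * count / total, 1)
    # Cumulative percent: share in this category and all earlier ones.
    cumulative_percents[category] = round(100 * running / total, 1)

for category in counts:
    print(category, counts[category], percents[category],
          cumulative_percents[category])
```

Note that the frequency (for example 53) and the percentage (27.3%) are different quantities and should not be interchanged when interpreting the table.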
Table 2: Died in hospital
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     No              90      46.4            51.7                 51.7
          Yes             84      43.3            48.3                100.0
          Total          174      89.7           100.0
Missing   D.O.A.          20      10.3
Total                    194     100.0
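Out of the total 194 subjects, 90 (46.4%) did not die in hospital and 84 (43.3%) died in
hospital. The remaining 20 subjects (10.3%) were dead on arrival (D.O.A.) and are treated as
missing. Among the 174 valid cases, 51.7% did not die in hospital and 48.3% did.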
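The difference between the Percent and Valid Percent columns of Table 2 is only the denominator: all 194 subjects versus the 174 with a valid value. A minimal Python sketch using the Table 2 frequencies (variable names are illustrative):

```python
# Frequencies from Table 2.
valid_counts = {"No": 90, "Yes": 84}
missing_doa = 20

n_valid = sum(valid_counts.values())
n_total = n_valid + missing_doa

# Percent uses all subjects; valid percent excludes the missing (D.O.A.) cases.
percent = {k: round(100 * v / n_total, 1) for k, v in valid_counts.items()}
valid_percent = {k: round(100 * v / n_valid, 1) for k, v in valid_counts.items()}
missing_percent = round(100 * missing_doa / n_total, 1)

print("percent:", percent)
print("valid percent:", valid_percent)
print("missing percent:", missing_percent)
```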
2. Do descriptive statistics and histogram with normal distribution and interpret the results
for the following variable: fasting_glucose_level (fasting glucose level)
Variable                  N   Minimum   Maximum     Mean   Std. Deviation   Skewness   Std. Error of Skewness
fasting glucose level   194        75       160   110.39           16.105       .410                     .175
Valid N (listwise)      194
Across the 194 subjects involved in the study, the mean fasting glucose level was found to be
110.39 units with a standard deviation of 16.11. The maximum recorded glucose level was 160
units and the minimum was 75. The skewness statistic of 0.410 (standard error 0.175) indicates
a slight positive skew.
Figure 1: Fasting Glucose level Histogram
The figure above shows a fairly bell-shaped distribution. This suggests that the data are
approximately normally distributed.
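The visual impression can be supplemented with the skewness figures from the descriptive table. The sketch below assumes the widely used formula for the standard error of skewness, sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), and the common rule of thumb that a skewness-to-standard-error ratio beyond about ±1.96 indicates skew greater than expected by chance. Both are conventions brought in for illustration, not part of the original analysis.

```python
import math

n = 194
skewness = 0.410     # from the descriptive statistics table
reported_se = 0.175  # from the descriptive statistics table

# Standard error of skewness for a sample of size n (assumed formula).
se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Ratio of the skewness statistic to its standard error.
z_skewness = skewness / reported_se

print(f"computed SE of skewness: {se_skewness:.3f}")
print(f"skewness / SE:           {z_skewness:.2f}")
```

Under that rule of thumb the ratio suggests a detectable positive skew, although a skewness of 0.41 is small in absolute terms, which is consistent with describing the histogram as only approximately normal.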
3. Is there a correlation between age (age at admission in years) and fasting_glucose_level
(fasting glucose level)? Report the correlation coefficient (r) [direction and strength] and
interpret the results.
Table 3: Correlation Analysis
                                              Age in years   fasting glucose level
Age in years            Pearson Correlation              1                  .444**
                        Sig. (2-tailed)                                       .000
                        N                              194                     194
fasting glucose level   Pearson Correlation         .444**                       1
                        Sig. (2-tailed)               .000
                        N                              194                     194
**. Correlation is significant at the 0.01 level (2-tailed).

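Table 3 shows that age and fasting glucose level had a correlation of r = 0.444. This indicates
a moderate positive relationship between the two variables (Holcomb, 2016): fasting glucose
level tends to increase with age. The relationship is statistically significant (p < .001,
displayed by SPSS as .000), and therefore also significant at alpha = 0.05
(Baak, Koopman, & Klous, 2018).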
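For readers who want to see where the significance comes from, the following Python sketch converts r into the usual t statistic, t = r·sqrt(n-2)/sqrt(1-r²), and also reports r² as the share of variance the two variables have in common. Only r = 0.444 and n = 194 come from Table 3; the rest is standard arithmetic added for illustration.

```python
import math

r = 0.444  # Pearson correlation from Table 3
n = 194    # number of subjects from Table 3

# Test statistic for H0: no linear correlation, with n - 2 degrees of freedom.
df = n - 2
t_statistic = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Coefficient of determination: proportion of shared variance.
r_squared = r ** 2

print(f"t({df}) = {t_statistic:.2f}")
print(f"r squared = {r_squared:.3f}")
```

A t value this far from zero on 192 degrees of freedom is consistent with the very small p-value in the table, while r² of roughly 0.20 indicates that age accounts for about a fifth of the variation in fasting glucose level.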
4. Is there a difference between those who died in the hospital and those who did not die in
the hospital [variable name= dhosp (died in hospital)] in the fasting glucose level
a. What statistical test will you use?-One-way ANOVA (with only two groups, this is equivalent
to an independent-samples t-test)
b. Is the difference statistically significant? Explain and interpret the findings.
Table 4: Analysis of Variance
                 Sum of Squares    df   Mean Square        F   Sig.
Between Groups        15572.340     1     15572.340   90.550   .000
Within Groups         29579.775   172       171.975
Total                 45152.115   173
Table 4 shows F(1, 172) = 90.55, p < .05. Since the probability value is less than the alpha
level (Gupta and Kapoor, 2019), the null hypothesis of equal means is rejected. This shows
that the mean fasting glucose level differed significantly between individuals who died in
hospital and those who did not.
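The F value in Table 4 can be reconstructed from the sums of squares and degrees of freedom, as in this Python sketch. All inputs are taken from Table 4; the square root of F is included only to illustrate the equivalence with a two-group t-test (t² = F when there is 1 between-groups degree of freedom).

```python
import math

# Values from Table 4.
ss_between, df_between = 15572.340, 1
ss_within, df_within = 29579.775, 172

# Mean square = sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-groups to within-groups variance.
f_ratio = ms_between / ms_within

# With two groups, the absolute t statistic equals the square root of F.
equivalent_t = math.sqrt(f_ratio)

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
print(f"equivalent |t| = {equivalent_t:.2f}")
```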
5. Is there a difference between the Age groups [variable name= Agecat] in the following
variable: fasting glucose level (fasting_glucose_level)
a. What statistical test will you use? One-way ANOVA
b. Is the difference statistically significant? Explain and interpret the findings.
Table 5: Analysis of Variance 2
                 Sum of Squares    df   Mean Square        F   Sig.
Between Groups        15466.188     3      5155.396   28.315   .000
Within Groups         34594.039   190       182.074
Total                 50060.227   193
Table 5 shows F(3, 190) = 28.32, p < .05. Since the probability value is less than the alpha
level, the null hypothesis of equal means is rejected. This shows that the mean fasting glucose
level differed significantly across the age groups. The ANOVA alone does not show which age
groups differ from one another; a post hoc comparison would be needed for that.
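A significant F says that the group means differ, but not how large the effect is. As a hedged supplement, the sketch below computes eta squared (between-groups sum of squares divided by the total sum of squares) from the Table 5 values. Eta squared was not reported in the original analysis and is shown here only as one common effect-size measure.

```python
# Values from Table 5.
ss_between, df_between = 15466.188, 3
ss_within, df_within = 34594.039, 190
ss_total = 50060.227

# Mean squares, as they appear in the table.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# Eta squared: proportion of total variance associated with age group.
eta_squared = ss_between / ss_total

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"eta squared = {eta_squared:.3f}")
```

On these figures, age group is associated with roughly 31% of the variation in fasting glucose level.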
Summary Report
The analysis of the age category of the respondents revealed that a total of 194 individuals
were included in the study. Descriptive statistics were run on all the variables to check that
the data were correct and free of outliers. The largest age group was 55-64 years (37.6%),
followed by 65 to 74 years (27.3%) and 45 to 54 years (24.2%), with those aged 75 years and
above being the smallest group (10.8%). This shows that the sample consisted of older
individuals, aged 45 years and above.
The details on patients who died in hospital revealed that, out of the total 194 subjects,
46.4% did not die in hospital and 43.3% died in hospital. The remaining 10.3% were dead on
arrival and were treated as missing.
Across the 194 subjects involved in the study, the mean fasting glucose level was found to be
110.39 units with a standard deviation of 16.11. The maximum recorded glucose level was 160
units and the minimum was 75. Also, the variable was found to be slightly positively skewed
(skewness = 0.410), implying that the distribution has a longer tail toward the right
(Cox, 2018).
The histogram of fasting glucose level revealed that the data have a fairly bell-shaped
distribution (Coolican, 2017). This suggests that the data are approximately normally
distributed.

The correlation analysis between age and fasting glucose level gave r = 0.444. This shows a
moderate positive relationship between the two variables. The relationship was found to be
significant at alpha = 0.05 (p < .05).
An analysis of variance was done to determine whether fasting glucose level was significantly
different between individuals who died in hospital and those who did not. This resulted in
F(1, 172) = 90.55, p < .05 (Rosner, 2015). Since the probability value is less than the alpha
level, the null hypothesis is rejected, which shows that the mean fasting glucose level
differed significantly by in-hospital death status.
Similarly, an analysis was done to find out whether fasting glucose level was significantly
different between the age groups. The results show F(3, 190) = 28.32, p < .05 (Field, 2009).
This shows that the mean fasting glucose level differed significantly across the age groups
of patients.

References
Baak, M., Koopman, R., & Klous, S. (2018). A new correlation coefficient between categorical,
ordinal and interval variables with Pearson characteristics. arXiv preprint
arXiv:1811.11440.
Coolican, H. (2017). Research methods and statistics in psychology. Psychology Press.
Cox, D. R. (2018). Applied statistics-principles and examples. Routledge.
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: Sage Publications Ltd.
George, D. (2011). SPSS for windows step by step: A simple study guide and reference, 17.0
update, 10/e. Pearson Education India.
Gupta, S. C., & Kapoor, V. K. (2019). Fundamentals of applied statistics. Sultan Chand & Sons.
Holcomb, Z. C. (2016). Fundamentals of descriptive statistics. Routledge.
Johnson, R. A., & Bhattacharyya, G. K. (2018). Statistics: principles and methods. Wiley.
King'oriah, G. K. (2013). Fundamentals of applied statistics. Nairobi: The Jomo Kenyatta
Foundation.
Levine, D. M., Stephan, D., et al. (2014). Statistics for managers. Prentice Hall of India,
New Delhi.
Mehmetoglu, M., & Jakobsen, T. G. (2016). Applied statistics using Stata: a guide for the social
sciences. Sage.
Napier, C., & Maisel, J. W. (2015). Principles and procedures of statistics: a biometric
approach. McGraw Hill Book Company, New York.
Rosner, B. (2015). Fundamentals of biostatistics. Nelson Education.
Snedecor, G. W., & Cochran, W. G. (2013). Statistical methods. Iowa University Press, Ames,
Iowa (U.S.A.).