Autumn 2019 Biostatistics Assignment 3: STROBE Analysis Report

Added on  2023/03/30

Homework Assignment
AI Summary
This assignment is a critical appraisal of the statistical methods used in the Weston et al. (2019) paper, focusing on how well the authors documented their statistical approaches against the STROBE checklist. The analysis begins by specifying the study's design, data sources, and sampling methods, including stratified random sampling and the use of a case-control study design. The study employed biostatistical methods, such as imputation and regression, to examine the relationship between long work hours, weekend work, and depressive symptoms in men and women. The assignment highlights the interpretation of a multiple linear regression model, including the model's coefficients, confidence intervals, and the results of an F-test. The report also discusses the model's R-squared value, the correlation between self-reported work hours and age, and predictions for self-reported work hours based on gender and age. The student provides a detailed review of the statistical reporting, identifying strengths and weaknesses in the application of the STROBE guidelines and offers suggestions for improvements. The assignment concludes with a discussion of the study's findings and their implications.
Document Page
Question 1
The size of the population to be studied should be specified before the study begins. The study
was concise and in line with the requirements of Strengthening the Reporting of Observational
Studies in Epidemiology (STROBE). The researchers specified the sources of their data, which were
secondary data since they had been recorded by another organization. The researchers then used
stratified random sampling, stratifying the study population by gender, that is, the strata were
male and female. From each stratum, they used simple random sampling to select the samples. The
study design used in the research was a case-control study. The samples were followed during
the study period so that the objective of the study could be achieved.
Every research study entails the use of various statistical methods. The researchers suggest that
biostatistical methods should be applied in biological fields, including epidemiology. The
research conducted by Weston and colleagues used imputation and regression methods. These methods
aid in drawing conclusions that enable those in charge to make decisions that may have a positive
impact. Using those methods, the researchers found that depression was more strongly associated
with weekend work among males than among females who worked the same period.
One major challenge associated with regression methods is variability that is not accounted for by
the explanatory variables used to fit the model. Further, the ordinary least squares method that
is used to estimate the coefficients of the explanatory variables is not appropriate when the
number of predictors is greater than the number of observations. In such cases, other methods
such as shrinkage methods should be adopted. Despite that, for heuristic
purposes and for communicating the results to the audience, linear regression methods were
appropriate for the study, as suggested in STROBE.
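The point about ordinary least squares failing when predictors outnumber observations, and shrinkage offering a way out, can be illustrated with a minimal Python sketch. It is not part of the Weston et al. analysis: the function name `ridge_two_predictors`, the single made-up observation, and the penalty value are assumptions chosen for illustration, and ridge regression is used here only as one example of a shrinkage method.

```python
def ridge_two_predictors(rows, y, lam):
    """Solve (X'X + lam*I) b = X'y for two predictors and no intercept.

    rows: list of (x1, x2) tuples; y: list of responses; lam: ridge penalty.
    With lam = 0 this is ordinary least squares via the normal equations.
    """
    a = sum(x1 * x1 for x1, _ in rows) + lam       # top-left entry
    b = sum(x1 * x2 for x1, x2 in rows)            # off-diagonal entry
    d = sum(x2 * x2 for _, x2 in rows) + lam       # bottom-right entry
    g1 = sum(x1 * yi for (x1, _), yi in zip(rows, y))
    g2 = sum(x2 * yi for (_, x2), yi in zip(rows, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("normal equations are singular: no unique solution")
    # Closed-form inverse of the 2x2 matrix applied to (g1, g2)
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


# Hypothetical data: one observation, two predictors (more predictors than rows)
rows, y = [(1.0, 2.0)], [3.0]

try:
    ridge_two_predictors(rows, y, lam=0.0)
except ValueError as err:
    print("OLS:", err)

print("ridge:", ridge_two_predictors(rows, y, lam=1.0))
```

With no penalty the normal equations have no unique solution, while any positive penalty makes the system solvable at the cost of shrinking the coefficients toward zero.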
The conclusions drawn from the results were based on the sample that was used. This implied that,
to reach firmer conclusions about the population, inferential statistics had to be performed. The
sampling method that was used helps to reduce bias in the conclusions that are made. In the
future, the researchers should consider using methods based on hypothesis testing to calculate
the sample size. The objective of using statistical methods in selecting the sample is to ensure
that the sample obtained is optimal for the population of interest. The researchers may have
introduced bias when collecting the sample, since there were more females than males in the
study. This makes the audience wonder why it was not the converse, or what caused the samples to
be of unequal size.
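The suggestion to base the sample size on hypothesis testing can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{power})²σ²/δ². The function name `n_per_group` and the example inputs (a 1-hour difference and a 5-hour standard deviation) are hypothetical; they are not taken from Weston et al.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means with equal group sizes and a common SD.

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical planning values: detect a 1-hour difference when SD is 5 hours
print(n_per_group(delta=1.0, sigma=5.0))  # 393
```

A calculation of this kind would give each stratum a planned size in advance, so unequal group sizes would need an explicit justification.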
Question 2
The table below represents descriptive statistics for age and self-reported working hours based
on gender.
Gender  Variable  N    Mean   Minimum  Maximum  Median  Skewness  Standard deviation
Male    Work      283  41.87  29       57       42      0.06      5.14
Female  Work      215  36.57  26       49       37      0.06      4.61
The table indicated that more males than females gave responses. Males reported higher mean
working hours than females (41.87 versus 36.57). The standard deviations were 5.14 and 4.61 hours
respectively, which indicated broadly similar variability in the number of working hours
reported by males and females. The coefficient of skewness was 0.06 for both males and females;
being close to zero, it indicated that working hours were approximately symmetrically
distributed, consistent with a normal distribution (Huxley et al., 2011).
The histogram above also indicated that the self-reported working hours had a symmetric,
approximately normal distribution. This implied that parametric tests were appropriate for the data.
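As an illustration of one parametric test that the descriptive statistics would support, the sketch below computes a Welch two-sample t statistic directly from the summary values in the table. This test is not reported in the assignment itself; the function name `welch_t` is an assumption, and only the test statistic is computed, not a p-value.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic for the difference in two group means,
    computed from summary statistics (unequal variances allowed)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Summary statistics for self-reported work hours, taken from the table above
t_stat = welch_t(41.87, 5.14, 283, 36.57, 4.61, 215)
print(round(t_stat, 2))
```

A statistic of this size (roughly 12) points to a clear difference in mean working hours between the sexes before any adjustment for age.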
To investigate how the respondents' self-reported work hours differed between males and females,
a multiple linear regression model was fitted. The respondent's self-reported work hours was the
response variable, while the predictor variables were the age and sex of the respondent.
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  41.38689    0.73786  56.091   <2e-16 ***
age           0.01170    0.01643   0.712    0.477
sexfemale    -5.30050    0.44542 -11.900   <2e-16 ***
From the above table, the fitted model was the following,
work = 41.39 + 0.01(age) - 5.30(sexfemale)
where sexfemale equals 1 for a female respondent and 0 for a male respondent. The model had the
interpretation that, holding the age of the respondent constant, the self-reported working hours
for females were about 5.3 hours lower than the self-reported working hours for males
(Mugie et al., 2011). This implied that males had higher self-reported working hours than females.
Variable   Coefficient  2.5%     97.5%
Age         0.012       -0.0206   0.0440
Sexfemale  -5.30        -6.1756  -4.4254
The 95% confidence interval for the sex variable was [-6.1756, -4.4254].
This indicated that we can be 95% confident that the true coefficient of the predictor variable
lies within that range; because the interval excludes zero, the difference between the sexes was
statistically significant at the 5% level (Von Elm & Altman, 2014).
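The confidence limits above follow from estimate ± t × standard error. A minimal sketch, assuming a critical value of about 1.9648 for the 97.5th percentile of a t distribution with 495 degrees of freedom (a table value hard-coded here, not computed), reproduces them from the regression output. The names `T_CRIT_495` and `conf_int` are hypothetical.

```python
# Assumed critical value: approximate 97.5th percentile of t with 495 df
T_CRIT_495 = 1.9648


def conf_int(estimate, std_error, crit=T_CRIT_495):
    """Two-sided confidence interval: estimate +/- crit * std_error."""
    margin = crit * std_error
    return (estimate - margin, estimate + margin)


# Estimates and standard errors taken from the coefficient table
for name, est, se in [("age", 0.01170, 0.01643), ("sexfemale", -5.30050, 0.44542)]:
    low, high = conf_int(est, se)
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

Small differences in the last decimal place relative to the reported limits come from rounding of the critical value and of the printed coefficients.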
To investigate whether the predictor variables were jointly significant in explaining the response
variable, an overall F-test was performed at the 5% level of significance. The following
hypotheses were formulated for the test, where β1 and β2 are the coefficients of age and sex.
H0: the explanatory variables are not significant in explaining the response (β1 = β2 = 0)
Versus
Hα: at least one explanatory variable is significant in explaining the response (β1 ≠ 0 or β2 ≠ 0)
Residual standard error: 4.923 on 495 degrees of freedom
Multiple R-squared: 0.2232, Adjusted R-squared: 0.22
F-statistic: 71.1 on 2 and 495 DF, p-value: < 2.2e-16
From the above output, the test had an F-statistic of 71.1 and a p-value of less than 2.2e-16,
which was below the 0.05 level of significance; thus the null hypothesis was rejected. The
explanatory variables were therefore jointly statistically significant in explaining the response
(Yu, Lu, & Stander, 2003). Finally, the model had an R-squared of 0.2232, which indicated that
about 22% of the variability in self-reported working hours was explained by the explanatory
variables in the model.
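The overall F-statistic and the adjusted R-squared in the output can both be recovered from the multiple R-squared, which is a useful check when reading these figures. The sketch below assumes n = 498 observations (the 495 residual degrees of freedom plus three estimated coefficients) and k = 2 predictors; the function names are hypothetical.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for a linear model with k predictors and an
    intercept, fitted to n observations, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """Adjusted R-squared, penalising for the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values read from the regression output: R-squared 0.2232, 495 residual df
n, k, r2 = 498, 2, 0.2232
print(round(f_from_r2(r2, n, k), 1))    # 71.1
print(round(adjusted_r2(r2, n, k), 2))  # 0.22
```

The agreement with the printed 71.1 and 0.22 shows that 71.1 is the F-statistic and 0.2232 is the proportion of variance explained.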
The above plot indicated that there was a weak correlation between self-reported working hours
and age within each gender. That implied that an increase in age was not associated with large
changes in working hours. The plot also indicated that males had more self-reported working hours
than females. Further, the relationship between age and work appeared linear for males and
non-linear for females.
The predicted number of self-reported work hours for a 25-year-old male worker was obtained as
follows.
Work = 41.39 + 0.01(25) - 5.30(0) = 41.64. This indicated that the predicted self-reported work
hours were 41.64 for the 25-year-old male worker.
For the 25-year-old female worker, using the estimated sexfemale coefficient of -5.30 from the
regression output, the predicted self-reported working hours were
Work = 41.39 + 0.01(25) - 5.30(1) = 36.34.
The predicted self-reported working hours for the male were higher than for the female
of the same age (Williams et al., 2013). This implied that the sex of the worker was an important
determinant of the number of self-reported working hours. On the contrary, the age of
the employee was not significant in determining the worker's self-reported working hours.
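The two predictions can be reproduced with a short sketch that plugs age and sex into the fitted equation, using the coefficients from the regression output rounded to two decimals (41.39, 0.01 and -5.30). The function name `predict_work` and its argument names are hypothetical.

```python
def predict_work(age, female, intercept=41.39, b_age=0.01, b_female=-5.30):
    """Predicted self-reported weekly work hours from the fitted model.

    female: True for a female respondent, False for a male respondent.
    Default coefficients are the rounded estimates from the regression output.
    """
    sexfemale = 1 if female else 0  # dummy coding with male as the reference
    return intercept + b_age * age + b_female * sexfemale


male_25 = predict_work(25, female=False)
female_25 = predict_work(25, female=True)
print(round(male_25, 2), round(female_25, 2))
```

At any fixed age the gap between the two predictions equals the sexfemale coefficient, about 5.3 hours.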
References
Huxley, R. R., Filion, K. B., Konety, S., & Alonso, A. (2011). Meta-analysis of cohort and
case–control studies of type 2 diabetes mellitus and risk of atrial fibrillation. The American
Journal of Cardiology, 108(1), 56-62.
Mugie, S. M., Benninga, M. A., & Di Lorenzo, C. (2011). Epidemiology of constipation in
children and adults: a systematic review. Best Practice & Research Clinical
Gastroenterology, 25(1), 3-18.
Roy, T., & Lloyd, C. E. (2012). Epidemiology of depression and diabetes: a systematic
review. Journal of Affective Disorders, 142, S8-S21.
Von Elm, E., & Altman, D. G. (2014). The Strengthening the Reporting of Observational Studies
in Epidemiology (STROBE) Statement: guidelines for reporting observational
studies. International Journal of Surgery, 12(12), 1495-1499.
Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple
regression: Correcting two misconceptions. Practical Assessment, Research & Evaluation, 18(11).
Yu, K., Lu, Z., & Stander, J. (2003). Quantile regression: applications and current research
areas. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3), 331-350.