STAT6000 Assignment: Analysis of Statistical Research Articles

Added on 2022/11/18

AI Summary

This report analyzes two research papers, one focusing on the consequences of alcohol use and the other on the prevalence of diabetes mellitus. The first paper investigates the association between alcohol consumption during celebrations and its negative consequences, utilizing hypothesis testing, Wilcoxon signed-rank tests, correlation tests, and logistic regression models. The study employed clustering sampling and examined demographic characteristics like sex and age. The second paper examines the prevalence of diabetes in a Chinese population using data from household surveys. The study aimed to evaluate the prevalence of self-reported diabetes and identify associated factors using logistic regression. The analysis includes demographic data such as age, sex, and income, and the researchers employed statistical hypothesis testing to draw conclusions about the relationship between the variables. Both papers highlight the importance of statistical analysis in understanding health-related issues and the interpretation of statistical results to support research findings.

Paper 1
1.
Research is driven by objectives, and the objective guides how the entire study is conducted. To
judge whether a research objective has been met, statistical hypotheses are tested. Usually, a
null hypothesis is tested against an alternative hypothesis. If the sample provides sufficient
evidence, the null hypothesis is rejected in favor of the alternative; otherwise, the researcher
fails to reject the null hypothesis. To carry out the test, the researcher has to specify the
confidence level in advance; a 95% confidence level (a 5% significance level) is conventional in
scientific research. The significance level is the probability of rejecting the null hypothesis
when it is in fact true.
The first hypothesis used in this case was the following:
H0: there is no association between the consequences of alcohol use and the explanatory
variables
Versus
H1: there is an association between the consequences of alcohol use and the explanatory variables
In this test the dependent variable was the reported experience of negative consequences of
alcohol use, while the independent variable was the duration of time spent drinking alcohol
during the celebration.
Below is the second hypothesis that was formulated for the test:
H0: there is no difference in the average SD between the pre-celebration and the celebration
Versus
H1: there is a difference in the average SD between the pre-celebration and the celebration
The dependent variable was the SD during the school leavers’ celebration, while the independent
variable was the SD during the school leavers’ pre-celebration.
2.
Cluster sampling was used in the survey. Clusters were formed according to the location where
the celebration was being held (Lam et al., 2014). Participants were also divided into two age
groups, namely those who were 17 years old and below and those who were 18 years old and above.
The advantages of this sampling method include the following: it is cheap and easy to implement,
it allows each selected cluster to be represented by using simple random sampling to select the
units within it, and it avoids the cost of sampling the whole population. On the other hand, it
has disadvantages such as high sampling variability, since units within a cluster tend to be
similar while the clusters themselves may differ greatly from one another.
The survey was conducted in two phases: the first involved respondents who intended to go to the
school leavers’ celebration, while the second was conducted after the celebration.
3.
The demographic characteristics reported were sex and age, together with the completion time of
the survey. In the first survey, 56% of the participants were female; 91% were aged 17 years and
below, while 9% were aged 18 years and above. This indicated that most of those intending to
attend the school leavers’ celebration were aged 17 years and below, and that slightly more than
half of them were female. Further, the average completion time of the survey was 15.61 minutes.
For the second survey, 50% of the participants were female. This suggested that proportionally
fewer of the females who took part in the pre-celebration survey also took part in the
post-celebration survey, which left the proportions of males and females in the post-celebration
survey equal. Additionally, 94% of those who participated in the post-celebration survey were
aged 17 years and below and only 6% were aged 18 years and above. This meant that school leavers’
celebrations are attended mostly by those aged 17 years and below. In this survey, the average
completion time was 15 minutes, which was less than in the initial survey.
4.
In order to draw conclusions about the population based on the sample, inferential statistics
were used. The Wilcoxon signed-rank test was performed to determine whether the two sets of
paired observations came from the same population distribution. The hypotheses used in this test
are formulated below:
H0: the samples of the two groups involved in the survey came from the same population
Versus
H1: the samples of the two groups involved in the survey came from different populations

This test is non-parametric, so no assumption was made about the distribution of the samples.
The test gave a p-value of 0.0008, which was less than the 5% significance level, and thus the
null hypothesis was rejected. This led to the conclusion that the samples of the two groups had
different population distributions.
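As an illustration of how such a test works, the following is a minimal sketch of a two-sided
Wilcoxon signed-rank test using the large-sample normal approximation. The paired values are
hypothetical and are not taken from Lam et al. (2014); the function name wilcoxon_signed_rank is
likewise an assumption made for this example, and no tie correction is applied to the variance.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation, no tie correction)."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Under H0, W+ has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Hypothetical paired measurements for ten respondents (not data from the paper).
pre_celebration = [2, 3, 1, 4, 2, 5, 3, 2, 4, 1]
during_celebration = [6, 8, 3, 9, 7, 11, 5, 6, 10, 4]

w_plus, w_minus, p_value = wilcoxon_signed_rank(pre_celebration, during_celebration)
reject_h0 = p_value < 0.05  # decision at the 5% significance level
print(w_plus, w_minus, round(p_value, 4), reject_h0)
```

With a sample this small an exact test would normally be preferred; the normal approximation is
used here only to keep the sketch short.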
The other inference test was a correlation test using Spearman’s rho. This is also a
non-parametric test. It investigates whether two variables are monotonically correlated or
independent. From the results, the quantity of alcohol consumed was correlated with the average
number of drinking hours.
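Spearman’s rho can be computed as the Pearson correlation of the ranks of the two variables. The
sketch below shows this under stated assumptions: the data are invented for illustration, and
the helper names average_ranks and spearman_rho are not from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours spent drinking and quantity consumed for six respondents.
drinking_hours = [1, 2, 3, 4, 5, 6]
drinks_consumed = [2, 3, 5, 4, 8, 12]

rho = spearman_rho(drinking_hours, drinks_consumed)
print(round(rho, 3))
```

A value of rho close to 1 indicates that the quantity consumed tends to rise as drinking hours
rise, without assuming that the relationship is linear.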
5.
The logistic regression model that was fitted described the relationship between the binary
response variable and the predictor variables. The model was reported in terms of odds ratios.
The odds of engaging in unprotected casual sex were 10.92 times higher than for those who used
safe strategies, which increased the likelihood of negative consequences among those who took
part in school leavers’ celebrations. In this model the researchers controlled for potential
confounding factors that could increase the variability of the model, including the grade the
respondent was in when attending the leavers’ celebration. After controlling for the potential
confounding factors, the use of protective harm reduction strategies was associated with lowered
odds of experiencing some of the most common harms and risks, including hangover, vomiting,
blackouts, and unprotected sex.
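To make the meaning of an odds ratio concrete, the sketch below computes an unadjusted odds
ratio and an approximate 95% Wald confidence interval from a 2x2 table. The counts are
hypothetical, and the calculation is a simplification: unlike the logistic regression described
above, it does not control for confounding factors. The function name odds_ratio_with_ci is an
assumption made for this example.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); z=1.96 gives an approximate 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the paper): 30 of 100 exposed respondents and
# 10 of 100 unexposed respondents report the outcome.
odds_ratio, lower, upper = odds_ratio_with_ci(a=30, b=70, c=10, d=90)

# An interval that excludes 1 suggests an association at roughly the 5% level.
significant = lower > 1 or upper < 1
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2), significant)
```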
6.
The sample used in this study doesn’t evenly represent the national population of schoolies.
First, the method of determining the sample size was not indicated, for example Cochran’s sample size
formula. Secondly, cluster sampling is prone to bias; thus, the sample that was used in the
study may suffer from bias and hence not represent the whole population as required.
Paper 2
1.
Due to the alarming rise in diabetes globally, research on how it can be controlled and prevented
is continuously being carried out. This research aims at finding a solution to diabetes. Both
governmental and non-governmental organizations support it. Wong et al. carried out research to
investigate the prevalence of diabetes mellitus over the years. Their study was carried out in
Hong Kong and Mainland China. The first objective of the study was to evaluate the prevalence of
self-reported diabetes using territory-wide household surveys representative of the whole Hong
Kong population. The second objective was to examine factors independently associated with
diabetes.
The objectives can be expressed as null and alternative hypotheses, which are later tested
using various inferential tests. In terms of statistical hypotheses, the first objective can be
written as follows:
H0: the average prevalence of self-reported diabetes is the same across all territories
Versus
H1: the average prevalence of self-reported diabetes is different across territories
The above hypotheses are used to test whether the average prevalence of self-reported diabetes
differs across the various territories. The test can be done using the one-way analysis of
variance technique. Once the test has been carried out and a conclusion made, it helps in
determining whether the objective of the study was achieved.
The second objective of the study can also be written in terms of statistical hypotheses as
follows:
H0: diabetes is not associated with the independent variables
Versus
H1: diabetes is associated with the independent variables
This test aims at establishing whether predictor variables such as the age and sex of a person
are associated with that person having or not having diabetes mellitus. The test is done by
first fitting a logistic regression model, since the dependent variable is binary, and then
testing whether the predictor variables are statistically significant in the model. This test
helps in determining whether the objective of the study was achieved. It also helps in making
recommendations for future studies.
2.
The demographic characteristics of the people involved in the study were age, sex, and monthly
household income. These demographics were measured at quarterly intervals between 2001 and 2008.
A total of 33609, 29561, 29802, and 28923 interviews were successfully conducted in the years
2001, 2002, 2005, and 2008 respectively. The figures indicated that the number of people who
participated in the survey decreased from the first survey year to the last. Among those
interviewed, 103367 were adults aged 15 years old and above, which implied that the majority of
participants were adults. Their average age was 38.2 years. Males made up 49.8% of participants
in 2001 and 49.1% in 2008, so the percentage of males decreased. On the other hand, females made
up 50.2% of participants in 2001 and 50.9% in 2008, so the percentage of females who
participated in the survey in Hong Kong increased over the years. Of all participants over the
entire period, 10.4% had a monthly household income of 50000 or more, 27.4% were in the income
bracket 25000-49999, 42.4% were in the income bracket 10000-24999, and lastly, 19.7% earned 9999
or less in a month. The results indicated that the largest share of participants earned
10000-24999 in a month. Finally, the largest age group in the survey was 35-44 years, with 18.2%
of participants (Wong et al., 2013).
3.
A logistic regression model was fitted to the data with diabetes status as the response variable
and age, sex, and income as the independent variables. An inference test for association was
carried out based on the hypotheses formulated earlier. The test gave a p-value below 0.05
(P<0.05), which resulted in the rejection of the null hypothesis. This implied that the
association between the diabetes status of a person and the explanatory variables was
statistically significant at the 0.05 level of significance. The test also showed that the
strength of the association varied across predictors and that some predictor variables were not
significant, judging by their individual p-values.
4.

The researchers adjusted the prevalence rates of diabetes for age and sex in order to examine
how prevalence varied with the income of the respondent. After adjustment, the prevalence
progressively increased across the years in the two lowest income groups for respondents of both
sexes. Despite that, no definite trend was evident for males or females.
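One common way to adjust a prevalence rate for age is direct standardization, in which each
group’s stratum-specific rates are weighted by a shared standard population. The sketch below
illustrates the idea only; it is not necessarily the method Wong et al. used, and the rates,
the standard population, and the function name directly_standardized_rate are all hypothetical.

```python
def directly_standardized_rate(stratum_rates, standard_population):
    """Weight each stratum's rate by that stratum's share of a standard population."""
    total = sum(standard_population.values())
    return sum(
        rate * standard_population[stratum] / total
        for stratum, rate in stratum_rates.items()
    )


# Hypothetical standard population by age group (not from the paper).
standard_population = {"0-39": 500, "40-64": 350, "65+": 150}

# Hypothetical age-specific prevalence of self-reported diabetes in two income groups.
low_income_rates = {"0-39": 0.01, "40-64": 0.05, "65+": 0.15}
high_income_rates = {"0-39": 0.005, "40-64": 0.03, "65+": 0.10}

adjusted_low = directly_standardized_rate(low_income_rates, standard_population)
adjusted_high = directly_standardized_rate(high_income_rates, standard_population)
print(round(adjusted_low, 4), round(adjusted_high, 4))
```

Because both groups are weighted by the same age structure, any remaining difference between
the adjusted rates cannot be attributed to one group simply being older than the other.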
5.
The logistic regression model that was fitted for the association between the diabetes status of
the respondent and the various explanatory variables was interpreted using odds ratios. When a
predictor variable is continuous, the odds ratio indicates how the odds of the outcome change
for a one-unit increase in that predictor. When a predictor variable is categorical, the odds
ratio indicates how many times the odds of the outcome at one level of the variable exceed the
odds at the level that has been coded as the reference category. In this case the odds ratios
indicated that females aged more than 65 years had the highest risk of self-reported diabetes
diagnosis compared with those aged 0-39 years, followed by males aged more than 65 years. Based
on the sex predictor variable alone, males had the lowest risk of self-reported diabetes
diagnosis (Wong et al., 2013).
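In a logistic regression the fitted coefficients are on the log-odds scale, so an odds ratio is
obtained by exponentiating a coefficient. The sketch below shows this conversion for
categorical predictors (relative to a reference level) and for a continuous predictor. The
coefficient values are invented for illustration and are not the estimates reported by Wong et
al.; the function names are likewise assumptions.

```python
import math


def odds_ratios(log_odds_coefficients):
    """Convert log-odds coefficients to odds ratios relative to the reference level."""
    return {name: math.exp(beta) for name, beta in log_odds_coefficients.items()}


def odds_ratio_for_increase(beta, units=1):
    """Odds ratio for an increase of `units` in a continuous predictor."""
    return math.exp(beta * units)


# Hypothetical coefficients; the reference levels are age 0-39 and male.
coefficients = {
    "age 40-64 vs 0-39": 1.5,
    "age 65+ vs 0-39": 2.7,
    "female vs male": 0.2,
}

ratios = odds_ratios(coefficients)
for name, ratio in ratios.items():
    print(name, round(ratio, 2))

# Hypothetical continuous predictor: a coefficient of 0.05 per year of age,
# expressed as the odds ratio for a ten-year increase.
ten_year_ratio = odds_ratio_for_increase(0.05, units=10)
```

An odds ratio above 1 means the odds of the outcome are higher than at the reference level,
an odds ratio below 1 means they are lower, and a coefficient of 0 corresponds to an odds ratio
of exactly 1.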
6.
The limitations highlighted by the researchers indicated that the results from the study were
not sufficient to draw conclusions about the population, and that the intended objective could
not be fully achieved. The first and major limitation was the contradiction between what was
reported by other researchers in China and what was found in the study. This was due to reliance
on self-reported diagnosis, which was a poor method for collecting the data since three in every
four persons with diabetes aren’t diagnosed (Wong et al., 2013). This meant that the sample used
could poorly represent the population. Finally, few variables were used in explaining the
prevalence of diabetes, and other important variables were not used. This meant that the results
obtained from the sample were likely to suffer from variability.
References
Lam, T., Liang, W., Chikritzhs, T., & Allsop, S. (2014). Alcohol and other drug use at school
leavers' celebrations. Journal of Public Health, 36(3), 408-416.
Wong, M. C. S., Leung, M. C. M., Tsang, & Griffiths, S. M. (2013). The rising tide of diabetes
mellitus in a Chinese population: A population-based household survey on 121,895
persons. International Journal of Public Health, 58(2), 269-276.