Data Analysis Plan: Chicago Youth Health Study, Public Health Research

Added on 2023/01/03

AI Summary

This report outlines a comprehensive data analysis plan for a public health research project investigating the relationship between sexual orientation, body mass index (BMI), and methamphetamine use among Chicago youth. The plan begins with a presentation of the research questions and hypotheses, followed by a detailed description of the statistical methods to be employed, including linear and logistic regression, and Chi-square tests, using SPSS software. The report also addresses data cleaning and screening procedures, including handling missing data and outliers, and ensuring data normality. Furthermore, it meticulously examines threats to the validity of the research, encompassing internal, external, and construct validity, and describes how these threats will be mitigated. Finally, the report details the ethical procedures to be followed, including data acquisition agreements and the treatment of human participants, emphasizing the protection of privacy and adherence to ethical guidelines. The study aims to identify associations between BMI and methamphetamine use, and how sexual orientation moderates these relationships, providing valuable insights for public health interventions.

Data analysis plan
This paper proposes an analytical approach for research examining the relationship between
sexual orientation, body mass index, and methamphetamine use among Chicago youth. First, the
paper states the original research questions and hypotheses. Second, it sets out a plan for how
the collected data will be analyzed. The third and fourth sections examine the threats to the
validity of this research and the ethical procedures that will be observed when collecting data
and dealing with participants.
Research questions and Hypotheses
The following research questions and hypotheses will be addressed in this study:
1. Is there an association between body mass index (independent variable) and
methamphetamine use (dependent variable) among Chicago youth?
Ho1: There is no statistically significant association between body mass index
(independent variable) and methamphetamine use (dependent variable) among
Chicago youth.
Ha1: There is a statistically significant association between body mass index (independent
variable) and methamphetamine use (dependent variable) among Chicago youth.
2. Does the association between body mass index (independent variable) and
methamphetamine use (dependent variable) differ by sexual orientation (moderating
variable)?
Ho2: The association between body mass index (independent variable) and
methamphetamine use (dependent variable) does not differ by sexual orientation
(moderating variable).
Ha2: The association between body mass index (independent variable) and
methamphetamine use (dependent variable) differs by sexual orientation (moderating
variable).
3. Is there an association between methamphetamine use (independent variable) and body
mass index (dependent variable) among Chicago youth?
Ho3: There is no statistically significant association between methamphetamine use
(independent variable) and body mass index (dependent variable) among Chicago youth.
Ha3: There is a statistically significant association between methamphetamine use
(independent variable) and body mass index (dependent variable) among Chicago youth.
4. Does the association between methamphetamine use (independent variable) and body
mass index (dependent variable) differ by sexual orientation (moderating variable)?
Ho4: The association between methamphetamine use (independent variable) and body
mass index (dependent variable) does not differ by sexual orientation (moderating
variable).
Ha4: The association between methamphetamine use (independent variable) and body mass index
(dependent variable) differs by sexual orientation (moderating variable).
Software
For the analysis of the collected data, the study will use the Statistical Package for the Social
Sciences (SPSS) version 24.0, released by IBM in 2016. Because this is a quantitative research
project, all analyses will be quantitative. The study will first use descriptive and inferential
statistics to examine the distribution of the data, and then use multiple logistic and linear
regression analyses to examine the influence of exogenous variables such as methamphetamine use
and body mass index, with the aim of addressing the research objective.
Data cleaning and screening
Data screening is the process of inspecting data for potential errors, and data cleaning is the
subsequent process of correcting them before any analysis is done. Screening generally involves
examining the raw data for outliers and missing data; cleaning then deals with both, for example
by reshaping the data and deleting outliers. Cleaning mitigates problems identified during
screening that could otherwise distort the statistical results and, in turn, the inferences and
conclusions drawn about the research objective. The process of data cleaning in this project
will therefore consist of examining the following:
Missing data
After data collection, some observations may be missing, either because respondents did not
answer certain questions or because of errors made when recording the data into the data files.
Missing data, if any, will be handled through data imputation. In SPSS, one imputation option
replaces each missing observation with the series mean; that is, the mean of the available
observations is calculated and substituted for the missing observations, rather than excluding
the rows that contain them.
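The series-mean replacement described above can be sketched outside SPSS. The following is a minimal illustration in Python rather than the SPSS procedure itself; the function name `impute_series_mean`, the use of `None` to mark a missing response, and the BMI values are all assumptions made for the example.

```python
from statistics import mean

def impute_series_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    series_mean = mean(observed)
    return [series_mean if v is None else v for v in values]

# Hypothetical BMI readings with two missing responses (not study data).
bmi = [21.0, None, 24.0, 27.0, None]
bmi_imputed = impute_series_mean(bmi)
```

Because every imputed value equals the mean, this approach keeps all rows but reduces the variance of the variable, which is worth bearing in mind when interpreting later tests.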
Reshaping the data
When entering observations, the researcher might enter values that do not fit the expected shape
of the data, for example entering 705 for an item measured on a Likert scale. Examining the
shape of the data will make it possible to identify values that do not correspond to the
distribution of the dataset, that is, outliers. In SPSS, outliers will be examined using
box-and-whisker plots, which show the location of each outlier.
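A box-and-whisker plot conventionally flags points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule in Python as an illustration only. It assumes the conventional 1.5 multiplier and uses the default quartile method of Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS computes. The Likert responses are invented.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical 5-point Likert responses with one data-entry error (705).
likert = [1, 2, 3, 3, 4, 4, 5, 5, 705]
flagged = boxplot_outliers(likert)
```

Here only the mistyped value 705 falls outside the fences, so it is the single value returned for inspection.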
Normality
Normality tests will be conducted prior to statistical analysis to determine whether the data
are normally distributed. If they are not, transformations such as the log transformation will
be used to bring the dataset closer to normality (Loxton, 2008).
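The plan does not name a specific normality test, so the sketch below uses sample skewness as a rough stand-in diagnostic; that choice is an assumption, not a procedure taken from the plan. It illustrates how a log transformation can pull in a long right tail. The data values and the helper name `skewness` are hypothetical.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

# Hypothetical right-skewed, strictly positive measurements (not study data).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
logged = [math.log(v) for v in raw]  # the log transform needs positive values

skew_raw = skewness(raw)
skew_log = skewness(logged)
```

For this sample the skewness of the logged values is closer to zero than that of the raw values. A formal test of normality would still be needed before relying on the transformed variable.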
Statistical tests
Once the data have been cleaned, quantitative data analysis will be conducted. As mentioned in
the previous subsection, this will include multiple logistic regression, linear regression, and
associated tests, that is, tests of significance and t-tests for differences in means.
Linear regression
A linear regression examines the relationship between one exogenous variable and one response
variable, i.e. $Y = \alpha_0 + \alpha_1 X_1 + \varepsilon_i$, where $\varepsilon_i$ are the error
terms of the model, $\alpha_0$ is the intercept, $\alpha_1$ is the regression coefficient, $X_1$
is the independent variable, and $Y$ is the dependent variable being predicted.
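To make the roles of the intercept and regression coefficient concrete, the following sketch estimates both by ordinary least squares using the closed-form formulas. It is an illustration in Python, not the SPSS output the study will rely on. The function name `fit_simple_ols` and the data, which were generated exactly from Y = 2 + 0.5X, are assumptions.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Least-squares estimates (alpha0, alpha1) for Y = alpha0 + alpha1 * X."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    alpha1 = sxy / sxx               # slope: covariation of X and Y over variation of X
    alpha0 = y_bar - alpha1 * x_bar  # intercept: the line passes through the means
    return alpha0, alpha1

# Hypothetical predictor values and an error-free response (not study data).
x = [18.0, 20.0, 22.0, 24.0, 26.0]
y = [2 + 0.5 * xi for xi in x]
alpha0, alpha1 = fit_simple_ols(x, y)
```

Because the example response contains no error term, the estimates recover the generating intercept and slope; with real data the residuals would be nonzero.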
Multiple logistic regression
Since the research also includes nominal variables such as sexual orientation, a multiple
logistic regression will also be used. It is appropriate in cases involving one nominal variable
and two or more measurement variables, where the aim is to examine how the measurement variables
affect the nominal variable (McDonald, 2015).
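One common way to examine whether an association differs by a moderating variable is to add an interaction term to the regression. The plan does not state how moderation will be specified, so the sketch below is an assumption on that point. It fits a logistic model with a BMI by sexual orientation interaction using plain gradient ascent, a simple stand-in for the maximum likelihood routine a statistics package would use. All data, names, and tuning values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def design_row(bmi_z, orientation):
    # Columns: intercept, standardized BMI, orientation indicator, interaction.
    return [1.0, float(bmi_z), float(orientation), float(bmi_z * orientation)]

def log_likelihood(rows, y, beta):
    total = 0.0
    for x, yi in zip(rows, y):
        p = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logistic(rows, y, step=0.1, iterations=5000):
    """Gradient ascent on the mean log-likelihood, starting from zero."""
    beta = [0.0] * len(rows[0])
    for _ in range(iterations):
        grad = [0.0] * len(beta)
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
            for j, xj in enumerate(x):
                grad[j] += (yi - p) * xj
        beta = [b + step * g / len(rows) for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: standardized BMI, a 0/1 orientation indicator, 0/1 use.
bmi_z = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
orientation = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
meth_use = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

rows = [design_row(b, g) for b, g in zip(bmi_z, orientation)]
beta = fit_logistic(rows, meth_use)
ll_zero = log_likelihood(rows, meth_use, [0.0] * 4)  # all coefficients zero
ll_fit = log_likelihood(rows, meth_use, beta)
```

In this sketch `beta[3]` is the interaction coefficient: it measures how much the BMI slope differs between the two orientation groups. Judging moderation would additionally require a significance test on that coefficient, which the sketch does not compute.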
Chi-Square test
Further, the study will use the Chi-square test, which is relevant when the data sample has two
nominal variables, each with two or more possible values. It examines whether the proportions of
one variable differ across the values of the other variable. It will be used if the sample
collected is sufficiently large (Klein, Thompson, & Tangen, 2011).
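For two nominal variables with two values each, the test reduces to comparing observed and expected counts in a 2 x 2 table. The sketch below computes the Pearson Chi-square statistic without a continuity correction and obtains the p-value from the identity that holds for one degree of freedom. The counts and the function name `chi_square_2x2` are hypothetical, and the code is an illustration rather than the SPSS crosstabs procedure.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-square statistic and p-value (df = 1) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: rows are two groups, columns are use / no use.
table = [[10, 20], [20, 10]]
stat, p_value = chi_square_2x2(table)
```

The expected counts here are all 15, comfortably above the small-cell sizes at which the Chi-square approximation becomes unreliable, which is the reason the test is reserved for sufficiently large samples.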
Statistical tests in regression models
The p-values for the t-test and the F-test in the linear regression output will be used to test for the
significance of the independent variables in predicting the dependent variable and the
significance of the overall model respectively.
Interpretation of the statistical results
The alpha level for Type I error will be set at 0.05, corresponding to a confidence level of 95%,
for the t-test, the F-test, and the Chi-square test. The null hypothesis will be rejected when
the p-value of the relevant test statistic, for example when testing the significance of the
model or the relationship between the dependent and independent variables, is less than the
alpha level of 0.05.
Threats to Validity
Validity refers to the strength of the conclusions that will be drawn from the results. Put
differently, it concerns how accurate and meaningful the research results are: do the research
components really measure what the research was set up to measure? There are a number of
different types of validity. In this paper, however, only three are explored: internal, external,
and construct (McLeod, 2013).
Internal validity
Internal validity refers to the extent to which the evidence supports a given claim about cause
and effect within this study, and it is therefore crucial in social research. Moreover, internal
validity will form the basis for reasoning throughout this study and is determined by how well
the research dismisses alternative explanations for its findings (Brewer, 2002). The threats to
internal validity might include:
Confounding
Confounding is among the major threats, in that any change in the response variable might be
attributable to changes in a third variable that is related to the manipulated variable. When a
spurious relationship is not entirely dismissed, new and rival hypotheses to this research's
original causal supposition can arise.
Selection bias
This is a problem at the pre-test stage of research, in that there might be differences between
groups which could interact with the independent variable and hence lead to the observed outcome.
Selection bias in this research might occur due to variability in the characteristics of both
the researchers and the participants involved in the study. These characteristics might be
inherent, such as age, weight, and sex, or learned, such as attitudes, including willingness to
be involved. During participant selection, if an unequal number of participants share similar
subject-related attributes, this might threaten the internal validity of the research. The bias
might also weaken the study's power to interpret the dependent variable if it uses online data
collection mechanisms, where participation might be biased by demography.
Instrument change
Another threat to internal validity is a change of instruments while conducting an experiment.
This might include a change in data collection tools, which might affect the responses given by
respondents and thereby affect the main conclusion.

Regression towards the mean
If respondents are selected on the basis of extreme scores on a test, that is, scores far from
the mean, this might lead to outliers in the data and thus to the error of regression towards
the mean.
External Validity
External validity becomes relevant when the research conclusions are to be applied beyond the
context of this research. That is, it is the extent to which the results of this research can be
generalized to different situations, people, stimuli, and times (Steckler & McLeroy, 2008). It
generally concerns participant selection, the level and consistency with which the results are
implemented, and the impact of outcomes (Glasgow, Klesges, Dzewaltowski, Bull, & Estabrooks,
2008). The threats to external validity therefore include:
Non-representative sampling
This threat might arise if the sample does not fully represent the persons or participants it is
designed to represent. Results drawn from a non-representative sample might then lead to wrong
generalizations.
Non-representative research context
If the research is designed without a clearly representative context, that is, if the research
objective assumes a generalization from one context to another that the design does not support,
then the error of non-representative context might threaten the external validity of the research.
Construct validity
There are a number of research constructs that are applicable to this research. Construct
validity refers to the degree to which the research tests measure what they claim to measure.
Threats to this validity are:
Inexactness in defining the constructs
A construct is categorized as poorly defined if any of its essential components is not fully
planned, if the levels of the construct are undetermined (that is, the research does not provide
exact or operational definitions for the constructs it defines), or if insufficient argument is
given to explain the research's content, its boundaries, or both. In any of these cases, this
research will most likely face the error of inexactness in defining its constructs.
Mono-operation bias
Although there are numerous ways in which any construct can be measured, it is not uncommon in
quantitative research for a study to use just a single measure for the focus construct (Lund
Research, 2012). This might lead to basing the decisions or conclusions of the research on the
outcomes of a single measure, which might in turn lead to bias.
Mono-method bias
Additionally, construct validity can be threatened by the use of a single method to measure a
given construct, just as it can by the use of a single operation (Lund Research, 2012).
Ethical procedures
Agreement to gain data
The researcher applied to the IRB for an access letter approving the use of secondary data,
which is to be obtained from the Youth Risk Behavior Survey (YRBS), “…a national school-
based survey conducted by CDC as well as school-based state, territorial, tribal, and large urban
school district surveys conducted by education and health agencies,” (CDC, 2018). As such, the
research obtains the dataset on the agreement that it will be used specifically for this research
and that it will be referenced to the Centers for Disease Control and Prevention (CDC, 2018).

Treatment of human participants
In the event that other human participants are included in the research, the research will
ensure that there is no unintended leak of personal information, uphold respect for persons, and
protect humans from harm, that is, minimize any potential harm while maximizing the potential
benefits. In addition, any benefit and burden of the research will be fairly distributed.
Responsibility for other ethical concerns, such as the refusal of participants to take part in
the original data collection, lies with the primary data collectors, and the consent letters can
be obtained from the CDC (2018).
Treatment of data
The study uses secondary data, so the data are relatively anonymous. If there are any variables
that would point to the original participants, they will be removed from the dataset unless they
are part of the research's constructs. The cleaned data that will be used for statistical
analyses will be password-protected to restrict access and will be released to the supervisor
for review. The data will subsequently be kept for 2 years after completion of the research,
after which they will be destroyed.
Other ethical issues
This study was not in any way conducted in the researcher's workplace, nor were there any
conflicts of interest or power differentials. Since the research uses a secondary dataset, the
researcher will not use any incentives.

References
Brewer, M. (2002). Research Design and Issues of Validity: Handbook of Research Methods in
Social and Personality Psychology. (H. Reis, & C. Judd, Eds.) Cambridge: Cambridge
University Press.
CDC. (2018). Compare District and National Results. YRBSS Fact Sheets and Comparison of
State/District and National Results. YRBSS Results. Adolescent and School Health.
Centers for Disease Control and Prevention. Retrieved from
https://www.cdc.gov/healthyyouth/data/yrbs/results.htm
Glasgow, R. E., Klesges, L. M., Dzewaltowski, D. A., Bull, S. S., & Estabrooks, P. (2008). The
Importance of External Validity. Am J Public Health, 98(1), 9-10.
doi:10.2105/AJPH.2007.126847
Klein, E. A., Thompson, M. I., & Tangen, C. M. (2011). Vitamin E and the risk of prostate
cancer: the selenium and vitamin E cancer prevention trial (SELECT). Journal of the
American Medical Association, 306, 1549-1556.
Loxton, N. (2008). Data screening using SPSS. New York: IBM.
Lund Research. (2012, May 22). Threats to construct validity. Retrieved from Laerd Statistics:
http://dissertation.laerd.com/construct-validity-p2.php
McDonald, J. H. (2015). Multiple logistic regression. In J. H. McDonald, Handbook of
Biological Statistics (pp. 67-72). Baltimore: Sparky House.
McLeod, S. (2013, November 19). What is Validity? Retrieved from Simply Psychology:
https://www.simplypsychology.org/validity.html