Regression Analysis and Correlation Coefficient

Added on 2019/09/26

AI Summary

The assignment uses STATA to analyse children's energy intake (ENERGY) and vitamin C intake (VITC), and parents' knowledge of children's nutritional requirements and healthy eating before (KNOW1) and after (KNOW2) an intervention. It also asks whether parents' knowledge after the intervention can be predicted from their knowledge before it. The Pearson's correlation coefficient between KNOW1 and KNOW2 is 0.678 (p < 0.001), which indicates a strong positive linear relationship. The R-squared value (R2) of 0.4597 suggests that about 46% of the variation in parents' knowledge after the intervention can be explained by their knowledge before the intervention. The value r = -0.1332 (p = 0.2865) quoted in Question 22 refers instead to KNOW1 and children's daily energy intake.

QUESTION 1
The researcher wants to know whether energy intake (ENERGY) and vitamin C intake (VITC) follow a Normal distribution. Use the following table as a guide.
Measure | Criteria/cut-off points supporting normality
Histogram | Symmetrical, bell-shaped curve
Boxplot | Median in the centre of the box, whiskers of equal length at both ends of the box, and no outliers
Normal Q-Q plot | Most observations lie on the straight line
Skewness and kurtosis coefficients | STATA users: skewness and (kurtosis - 3) are between -1 and 1 (SPSS users: skewness and kurtosis are between -1 and 1)
Which of the following would be appropriate?
a. ENERGY and VITC both have a Normal distribution, and hence a natural logarithm transformation is not necessary for either ENERGY or VITC
b. Neither ENERGY nor VITC has a Normal distribution, and hence a natural logarithm transformation is necessary for both ENERGY and VITC
c. ENERGY has a Normal distribution and VITC has a right (positively) skewed distribution, and hence a natural logarithm transformation is necessary only for VITC
d. ENERGY has a Normal distribution and VITC has a left (negatively) skewed distribution, and hence a natural logarithm transformation is necessary only for ENERGY
e. None of the above
1 point
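The skewness and kurtosis criteria in the table can also be checked by hand. Below is a minimal Python sketch that uses moment-based estimators: skewness is m3/m2^1.5 and kurtosis is m4/m2^2, so a Normal distribution gives a kurtosis near 3. I believe this matches the "kurtosis - 3" convention behind the STATA cut-off, but treat that as an assumption. The function names and sample values are hypothetical and are not the study data.

```python
def moments(values):
    """Return (m2, m3, m4): the 2nd, 3rd and 4th central moments (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m2, m3, m4

def skewness(values):
    m2, m3, _ = moments(values)
    return m3 / m2 ** 1.5

def kurtosis(values):
    # Moment-based kurtosis; a Normal distribution gives about 3
    m2, _, m4 = moments(values)
    return m4 / m2 ** 2

def supports_normality(values):
    """Rule from the table (STATA users): skewness and (kurtosis - 3) both in [-1, 1]."""
    return -1 <= skewness(values) <= 1 and -1 <= kurtosis(values) - 3 <= 1

# Hypothetical values, not the study data
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]
right_skewed = [1, 1, 1, 1, 10]
print(supports_normality(symmetric))   # True
print(skewness(right_skewed) > 0)      # True: long right tail
```

In practice the graphical checks in the table (histogram, boxplot, Q-Q plot) should be read alongside these coefficients rather than replaced by them.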
QUESTION 2
Based on the previous question, you now know whether the variables ENERGY and VITC are normally distributed. What are the most appropriate measures of centrality and variability to report for ENERGY and VITC? (Hint: different measures of centrality and variability should be reported for data with a Normal distribution and for data with a skewed distribution.)
a. Mean and standard deviation for ENERGY, because ENERGY has a normal (symmetric) distribution
b. Median and interquartile range for VITC, because VITC does not have a normal distribution but a skewed distribution
c. Median and interquartile range for ENERGY, because ENERGY has a normal (symmetric) distribution
d. Mean and standard deviation for VITC, because VITC does not have a normal distribution but a skewed distribution
e. Answers (a) and (b) are correct
f. Answers (c) and (d) are correct
1 point
QUESTION 3
Obtain summary statistics for the variable ENERGY. Which of the following is NOT correct?
a. The sample mean energy intake of these children is 4764.879 kJ
b. 50% of the children in this sample had an energy intake higher than 4804.95 kJ
c. No child in this sample had an energy intake lower than 2816.8 kJ
d. The sample standard deviation of energy intake is 850.676 kJ, so we can conclude that 99% of the children in this sample had an energy intake between 3063.528 (i.e., mean - 2*SD) and 6466.232 (i.e., mean + 2*SD) kJ
1 point
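Option (d) relies on the rule of thumb that mean ± 2 SD covers a fixed share of a Normal distribution. The short Python sketch below uses the standard library's `statistics.NormalDist` to compute that share for k standard deviations. It also rebuilds the interval from the summary statistics quoted above, assuming ENERGY is approximately Normal as examined in Question 1. The helper names are hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a Normal distribution lying within mean ± k standard deviations."""
    z = NormalDist()  # standard Normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

def interval(mean, sd, k):
    """Bounds of mean ± k * sd."""
    return mean - k * sd, mean + k * sd

# Summary statistics quoted in Question 3
low, high = interval(4764.879, 850.676, 2)
print(round(low, 3), round(high, 3))   # 3063.527 6466.231
print(round(share_within(2), 4))       # 0.9545
print(round(share_within(2.5758), 3))  # 0.99
```

The bounds differ from those in option (d) only in the third decimal, presumably because STATA used unrounded values. Under normality, about 99% coverage needs roughly ±2.58 SD rather than ±2 SD.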
QUESTION 4
Obtain the 95% confidence interval for the variable ENERGY. Which of the following statements about the estimation of the mean energy intake in the population of 2 to 3 year old children in WA is correct?
a. According to the sample information, the average daily energy intake in the population of 2 to 3 year old children in WA was estimated to be between 4555.757 and 4974.001 kJ
b. Based on the sample information, we are 95% confident that the sample mean daily energy intake of 2 to 3 year old children was between 4555.757 and 4974.001 kJ
c. Based on the sample information, the mean daily energy intake in the population of 2 to 3 year old children in WA was estimated with 95% certainty to be between 4555.757 and 4974.001 kJ
d. Based on the sample information, 95% of the 2 to 3 year old children in the WA population have a daily energy intake between 4555.757 and 4974.001 kJ
e. None of the above is correct
1 point
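The interval in the options follows the formula mean ± t* × SD/√n. The Python sketch below reproduces it from the sample mean and SD in Question 3 and the sample size of 66 mentioned in Question 5. The critical value t* (the 97.5th percentile of t with 65 degrees of freedom, about 1.997138) is hard-coded from a t table as an assumption. If SciPy is available, `scipy.stats.t.ppf(0.975, 65)` would give it directly. `mean_ci` is a hypothetical helper.

```python
from math import sqrt

def mean_ci(mean, sd, n, t_crit):
    """Confidence interval for a population mean: mean ± t* × SD / sqrt(n)."""
    se = sd / sqrt(n)           # standard error of the sample mean
    half_width = t_crit * se    # margin of error
    return mean - half_width, mean + half_width

# Values from Questions 3 and 5; t_crit assumed for df = 65 at 95% confidence
low, high = mean_ci(4764.879, 850.676, 66, 1.997138)
print(round(low, 3), round(high, 3))  # close to the STATA interval (4555.757, 4974.001)
```

The interval describes where the population mean is plausibly located. It does not describe where 95% of individual children's intakes lie.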
QUESTION 5
Which of the following statements about confidence intervals is correct?
a. If the sample size of this study increased from 66 to 660, we would expect the 95% CI to become wider, as there is larger variation with a larger sample size
b. If the sample size of this study increased from 66 to 660, we would expect the 95% CI to become narrower and more precise than when the sample size was 66
c. The higher the confidence level (e.g., from 90% to 95% to 99%), the more confident we are about capturing the actual population parameter, and therefore the corresponding CIs tend to be shorter
d. The higher the confidence level (e.g., from 90% to 95% to 99%), the more confident we are about capturing the actual population parameter, and therefore the corresponding CIs tend to be wider
e. Answers (b) and (d) are both correct
1 point
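Both effects in this question can be seen directly from the width formula 2 × z × SD/√n. The sketch below uses the large-sample Normal critical value from `statistics.NormalDist` instead of the t value, which is a simplifying assumption. The helper `ci_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, confidence):
    """Width of a large-sample (z-based) confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return 2 * z * sd / sqrt(n)

sd = 850.676  # sample SD of ENERGY from Question 3
for n in (66, 660):
    print(n, round(ci_width(sd, n, 0.95), 1))
for level in (0.90, 0.95, 0.99):
    print(level, round(ci_width(sd, 66, level), 1))
```

Multiplying the sample size by 10 shrinks the width by a factor of √10 (about 3.16). Raising the confidence level increases z, so the interval widens.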
QUESTION 6
The dietician now wants to investigate whether energy intake is associated with whether the children live in the country or in the city. You first need to recode the variable ENERGY into a categorical variable ENERGYCat according to the following table. (Hint: give the recoded variable a new name and remember to assign value labels to it.)
Values of the original variable ENERGY | Value of the new recoded variable ENERGYCat
Less than or equal to 4500 kJ (<= 4500) | 1
Greater than 4500 kJ and less than or equal to 5000 kJ (> 4500 & <= 5000) | 2
Greater than 5000 kJ (> 5000) | 3
Which of the following would be appropriate to describe the frequency distribution of the categorical variable ENERGYCat?
a. Frequency and percentage; the percentage of children with an energy intake greater than 4500 kJ and less than or equal to 5000 kJ (> 4500 & <= 5000) is 30.30% (n=20)
b. Mean (=2.06) and standard deviation (=0.839)
c. Median (=2) and interquartile range (=2)
d. Skewness (=-0.114) and kurtosis (=1.451)
e. Minimum (=1) and maximum (=3)
1 point
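The recoding rule in the table, and the frequency-and-percentage summary that suits a categorical variable, can be sketched in Python as follows. The function names, value labels and example intakes are hypothetical. In STATA the same step would be done with the recode and label commands.

```python
from collections import Counter

# Hypothetical value labels; the labels assigned in STATA may differ
LABELS = {1: "<= 4500 kJ", 2: "> 4500 & <= 5000 kJ", 3: "> 5000 kJ"}

def recode_energy(kj):
    """Map a daily energy intake in kJ to ENERGYCat (1, 2 or 3) using the cut-offs above."""
    if kj <= 4500:
        return 1
    elif kj <= 5000:   # > 4500 and <= 5000
        return 2
    else:              # > 5000
        return 3

def frequency_table(intakes):
    """Frequency and percentage (2 decimals) for each ENERGYCat level."""
    counts = Counter(recode_energy(x) for x in intakes)
    n = len(intakes)
    return {LABELS[k]: (counts[k], round(100 * counts[k] / n, 2)) for k in sorted(LABELS)}

# Hypothetical intakes, not the study data
print(frequency_table([4200, 4500, 4800, 5000, 5600]))
```

As a check on option (a), 20 children out of 66 is 30.30%.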
QUESTION 7
Using the new variable you recoded in Question 6, obtain a cross-tabulation of ENERGYCat and LOCATION. Which of the following statements appropriately describes the levels of energy intake of children who live in the city and in the country?
a. More than half of the boys (53.85%) lived in the city, and slightly more of the girls (55.56%) also lived in the city
b. Of the children who lived in the country, a larger share had a daily energy intake equal to or less than 4500 kJ (40.00%) than of those who lived in the city (25.00%)
c. Half (50%) of the country children had a daily energy intake between 4500 and 5000 kJ, and the percentage is the same for city children
d. Children who lived in the city were more likely to have a total daily energy intake > 5000 kJ than children who lived in the country (city 47.22% vs country 26.67%)
e. Both (b) and (d) are correct
1 point
QUESTION 8
Assuming all relevant assumptions are met, how would you test whether there is any association between the children's level of energy intake (ENERGYCat) and the location where they live (LOCATION)?
a. The Pearson correlation coefficient can be used, as ENERGY and LOCATION are both continuous
b. A chi-square test can be used, as ENERGYCat and LOCATION are both categorical
c. An independent (two samples) t-test can be used, as ENERGY is continuous and LOCATION is categorical with two levels
d. One-way ANOVA is suitable for this research question, as LOCATION is continuous and ENERGYCat is categorical with 3 levels
e. A paired samples t-test is suitable for this research question, as LOCATION and ENERGYCat are repeated variables
1 point
QUESTION 9
Based on the test you chose in Question 8, what can you conclude about the association between level of energy intake and the location where the children lived (use α = 0.05)?
a. The Pearson correlation coefficient is 0.2126 with a p-value of 0.0865 > 0.05. Assuming the assumptions are met, it can be concluded that there is no significant linear relationship between level of energy intake and the location where the children lived
b. The chi-square statistic is 3.1491 with a p-value of 0.207 > 0.05. Assuming the assumptions are met, it can be concluded that there is no significant association between level of energy intake and the location where the children lived in the population
c. The p-value from the independent (two samples) t-test is < 0.001. Assuming the assumptions are met, it can be concluded that there is a significant difference in the population mean energy intake between children who lived in the city and in the country
d. The p-value from the one-way ANOVA is 0.2144 > 0.05. Assuming the assumptions are met, it can be concluded that there is no significant difference in the population mean location among the levels of energy intake
e. The p-value from the paired samples t-test is < 0.001. Assuming the assumptions are met, it can be concluded that there is a significant difference in the population mean energy intake between children who lived in the city and in the country
1 point
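A chi-square test of independence compares the observed counts in the cross-tabulation with the counts expected under no association (row total × column total / n). The Python sketch below computes the statistic and its degrees of freedom. It also computes the p-value for the 3 × 2 case, where df = 2 and the upper-tail probability has the closed form exp(-x/2). That closed form is valid only for df = 2. Other df values would need a chi-square CDF, for example from SciPy. The helper names and the 2 × 2 example table are hypothetical. The expected counts computed here are also what the "no more than 20% of expected frequencies below 5" assumption refers to.

```python
from math import exp

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail chi-square p-value; exp(-x/2) holds only for df = 2."""
    return exp(-stat / 2)

# 3 ENERGYCat levels x 2 locations gives df = (3 - 1) * (2 - 1) = 2
print(round(p_value_df2(3.1491), 3))  # 0.207, consistent with option (b)
```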
QUESTION 10
The dietician wants to test the research hypothesis that the mean energy intake μ in this population of 2-3 year old children is equal to 4500 kJ. The correct hypothesis statement(s) for this research objective would be _________. (Hint: this question uses the original continuous variable ENERGY.)
a. H0: μ = 4500 kJ; HA: μ ≠ 4500 kJ
b. H0: μ ≠ 4500 kJ; HA: μ = 4500 kJ
c. Null hypothesis: the sample mean energy intake of the 2-3 year old children is 4500 kJ; alternative hypothesis: the sample mean energy intake of the 2-3 year old children is not 4500 kJ
d. Null hypothesis: the mean energy intake in this population of 2-3 year old children is equal to 4500 kJ; alternative hypothesis: the mean energy intake in this population of 2-3 year old children is not equal to 4500 kJ
e. Statements (a) and (d) are both correct
1 point
QUESTION 11
Assume a 5% level of significance (α = 0.05). The appropriate statistical test for the hypothesis in Question 10 would be __________, and the results are found to be ________.
a. One sample t-test; t-value = 2.5296, p-value = 0.0139
b. Two samples (independent samples) t-test; t-value = 3.84, p-value < 0.001
c. Paired-sample t-test; t-value = 0.64, p-value = 0.527
d. One-way ANOVA; t-value = -5.57, p-value < 0.001
e. Pearson correlation coefficient; r = 0.08, p-value = 0.520
1 point
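The one-sample t statistic is (sample mean - hypothesized mean) / (SD/√n). The Python sketch below recomputes it from the summary statistics in Question 3, assuming n = 66 as stated in Question 5. It then compares the result with the assumed two-sided 5% critical value for 65 degrees of freedom (about 1.997138, taken from a t table). An exact p-value would need the t distribution, for example `scipy.stats.t.sf` if SciPy is installed. `one_sample_t` is a hypothetical helper.

```python
from math import sqrt

def one_sample_t(mean, sd, n, mu0):
    """t statistic for H0: mu = mu0, computed from summary statistics."""
    se = sd / sqrt(n)  # standard error of the sample mean
    return (mean - mu0) / se

t = one_sample_t(4764.879, 850.676, 66, 4500)
print(round(t, 4))        # close to the 2.5296 quoted in option (a)
print(abs(t) > 1.997138)  # True: beyond the assumed 5% two-sided critical value
```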

QUESTION 12
An appropriate conclusion about the research hypothesis (in Question 10) would therefore be ________.
a. The population mean energy intake of the 2-3 year old children is equal to the test value, 4500 kJ, as the p-value is close to zero, which means no difference
b. Because the sample mean energy intake is 4764.879 kJ, which is higher than the hypothesized 4500 kJ, the alternative hypothesis has to be rejected
c. The p-value of the one sample t-test is not much different from 0.05. In addition, the 95% confidence interval of the population mean does not include the hypothesized value '4500', therefore supporting the decision to accept the null hypothesis and conclude that the population mean energy intake of the 2-3 year old children is equal to 4500 kJ
d. The p-value of the one sample t-test (0.0139) is less than 0.05. In addition, the estimated 95% confidence interval of the population mean energy intake (4555.757, 4974.001) does not include the hypothesized value '4500', hence the null hypothesis is rejected at the 5% significance level. We conclude that the population mean energy intake of the 2-3 year old children is significantly different from 4500 kJ
1 point
QUESTION 13
The dietician now wishes to test the hypothesis that the population mean energy intake μ is the same for boys and girls. You will test the hypothesis by following the steps of hypothesis testing. The appropriate null and alternative hypotheses will be _____________. (Hint: this question uses the original continuous variable ENERGY.)
a. H0: the population mean energy intake is the same for boys and girls; HA: the population mean energy intake is different for boys and girls
b. H0: the energy intake is the same for boys and girls; HA: the energy intake is different for boys and girls
c. H0: μboys ≠ μgirls; HA: μboys = μgirls
d. H0: μboys - μgirls = 0; HA: μboys - μgirls ≠ 0
e. Both (a) and (d)
1 point
QUESTION 14
Assuming a 5% level of significance (α = 0.05), state which statistical test you plan to use to test the hypotheses stated in Question 13.
a. One sample t-test
b. Paired samples t-test
c. Independent samples (two-samples) t-test
d. One-way ANOVA
e. Chi-square test
1 point
QUESTION 15
Which of the following are assumptions of the statistical test you nominated in Question 14?
a. Random sampling and independent observations
b. The cells are mutually exclusive and exhaustive, and no more than 20% of the expected frequencies are less than 5
c. Normality of the variable of interest (energy intake: ENERGY), and equal variances between the 2 gender groups
d. Both (a) and (b)
e. Both (a) and (c)
1 point
QUESTION 16
You have now evaluated the assumptions you identified in Question 15 for the test chosen in Question 14. Which of the following statements is correct? (Hint: you assessed the normality of ENERGY in Question 1.)
a. Normality can be assumed for ENERGY and the assumption of equal variances is also met, hence I will use the original ENERGY to do the t-test with equal variances
b. Normality can be assumed for ENERGY. However, the assumption of equal variances is not met, so I will use the original ENERGY to do the t-test with unequal variances
c. Normality cannot be assumed for ENERGY and a natural logarithm transformation is applied to ENERGY. The assumption of equal variances is not met, and I will use the original ENERGY directly to do the t-test with unequal variances
d. Normality cannot be assumed for ENERGY and a natural logarithm transformation is applied to ENERGY. The assumption of equal variances is met after transformation, and I will use the transformed ENERGY to do the t-test with equal variances
e. None of the above, as the test does not require assessing normality and equal variances
1 point
QUESTION 17
After you run the statistical test for the research hypotheses stated in Question 13, what can you conclude?
a. The test statistic is 2.2638, the p-value is 0.0270, and the 95% CI of the difference is (54.939, 880.082) and does not include '0', suggesting that we reject the null hypothesis and conclude that the population mean energy intake is significantly different between boys and girls
b. The test statistic is 45.4916, the p-value is < 0.001, and the 95% CI of the difference is (4556.312, 4970.628) and does not include '0', suggesting that we reject the null hypothesis and conclude that the population mean energy intake is significantly different between boys and girls
c. The test statistic is 2.1832, the p-value is 0.0339, and the 95% CI of the difference is (37.092, 897.929) and does not include '0', suggesting that we reject the null hypothesis and conclude that the population mean energy intake is significantly different between boys and girls
d. The test statistic is 2.3804, the p-value is 0.0203, and the 95% CI of the difference is (0.018, 0.201) and does not include '0', suggesting that we accept the null hypothesis and conclude that the population mean energy intake is the same for boys and girls
e. None of the above is correct
1 point
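The choice in Question 16 between the equal-variance and unequal-variance t-test comes down to how the standard error and degrees of freedom are computed. The Python sketch below shows both versions: the pooled-variance t statistic, and Welch's statistic with Welch-Satterthwaite degrees of freedom. It uses hypothetical data, and `independent_t` is a hypothetical helper. If a log transformation were needed, the same function would simply be applied to `math.log` of each value.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b, equal_var=True):
    """Two-sample t statistic and degrees of freedom (pooled or Welch)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (divisor n - 1)
    diff = mean(a) - mean(b)
    if equal_var:
        # Pooled variance, df = na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        # Welch's version for unequal variances, Welch-Satterthwaite df
        se = sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return diff / se, df

# Hypothetical energy intakes in kJ, not the study data
boys = [4900, 5200, 4700, 5400, 5100]
girls = [4600, 4800, 4400, 4900, 4700]
print(independent_t(boys, girls))
print(independent_t(boys, girls, equal_var=False))
```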
QUESTION 18
The dietician then wants to know whether the population mean parents' knowledge of children's nutritional requirements and healthy eating after the intervention is the same as before the intervention. Assuming μ is the population mean parents' knowledge, the appropriate null and alternative hypotheses will be:
a. H0: μafter = μbefore; HA: μafter ≠ μbefore
b. H0: μbefore = μafter; HA: μbefore ≠ μafter
c. H0: μbefore - μafter = 0; HA: μbefore - μafter ≠ 0
d. H0: μafter - μbefore = 0; HA: μafter - μbefore ≠ 0
e. All of the above are correct
1 point
QUESTION 19
To test the hypothesis stated in Question 18 (α = 0.05), the appropriate hypothesis test would be ____________.
a. One sample t-test
b. Paired samples t-test
c. Independent samples (two-samples) t-test
d. One-way ANOVA
e. Chi-square test
1 point
QUESTION 20
Assuming the assumptions of the statistical test you chose above are met, what can you conclude about the research hypothesis after you run the test (assume a 5% level of significance: α = 0.05)?
a. The Pearson's r value is 0.678 and the p-value is < 0.0001; we therefore reject the null hypothesis and conclude that the parents' knowledge of children's nutritional requirements and healthy eating after the intervention is 0.678 units higher than the parents' knowledge before the intervention
b. The mean difference in parents' knowledge of children's nutritional requirements and healthy eating between before and after the intervention (before minus after) is -1.806 units. The t-value is -2.8323, the p-value is 0.0061, and the 95% CI of the difference is (-3.079, -0.532) units and does not include '0'. This suggests that there is a significant difference in the population mean parents' knowledge of children's nutritional requirements and healthy eating between before and after the intervention, with the parents' knowledge before the intervention lower by 1.806 units on average in the population
c. The mean difference in parents' knowledge of children's nutritional requirements and healthy eating between after and before the intervention (after minus before) is 1.806 units. The t-value is 2.8323, the p-value is 0.0061, and the 95% CI of the difference is (0.532, 3.079) units and does not include '0'. This suggests that there is a significant difference in the population mean parents' knowledge of children's nutritional requirements and healthy eating between after and before the intervention, with the parents' knowledge after the intervention higher by 1.806 units on average in the population
d. Only statement (a) is incorrect
1 point
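A paired t-test is a one-sample t-test on the within-parent differences. This is why options (b) and (c) carry the same numbers with opposite signs: one uses before minus after, the other after minus before. The Python sketch below shows this with hypothetical scores. `paired_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic on the differences (after - before), with n - 1 df."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical knowledge scores for four parents, not the study data
know1 = [1, 2, 3, 4]
know2 = [2, 4, 4, 6]
t, df = paired_t(know1, know2)
print(round(t, 4), df)  # 5.1962 3
```

Swapping the arguments flips the sign of t (and of the CI bounds) but not the p-value. Both directions therefore support the same conclusion.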
QUESTION 21
Furthermore, the dietician wishes to test whether the parents' knowledge of children's nutritional requirements and healthy eating before the intervention (KNOW1) varies significantly, at the population level, across the 3 levels of energy their children consumed. Using the categorical variable ENERGYCat (Hint: you created this variable in Question 6) and assuming a 5% level of significance, which test would be appropriate?
a. Two samples (independent samples) t-test
b. Paired-sample t-test
c. One-way ANOVA
d. Chi-square test
e. Pearson's correlation
1 point
QUESTION 22
What can you conclude about the research hypothesis in Question 21?
a. The t-statistic is 93.0784, the p-value is < 0.001, and the 95% CI of the difference is (56.400, 58.850) kJ and does not include '0', suggesting that in this population there is a significant difference between the parents' knowledge of children's nutritional requirements and healthy eating before the intervention and children's daily energy intake
b. The t-statistic is 91.1044, the p-value is < 0.001, and the 95% CI is (56.362, 58.889) kJ and does not include '0', suggesting that in this population there is a significant difference between the parents' knowledge of children's nutritional requirements and healthy eating before the intervention and children's daily energy intake
c. The F test statistic is 0.58 and the p-value is 0.5602, suggesting that there is no significant difference in the population mean parents' knowledge of children's nutritional requirements and healthy eating before the intervention across the groups of children with different levels of energy intake
d. The χ2 statistic is 60.8740 and the p-value is 0.030, suggesting that there is a significant difference between the parents' knowledge of children's nutritional requirements and healthy eating before the intervention and children's daily energy intake in the population
e. The Pearson's correlation coefficient r is -0.1332 and the p-value is 0.2865, suggesting that there is no strong variation between the parents' knowledge of children's nutritional requirements and healthy eating before the intervention and children's daily energy intake in the population
1 point
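One-way ANOVA compares the variation between group means with the variation within groups. The statistic is F = (SS_between / (k - 1)) / (SS_within / (n - k)). The Python sketch below computes F and its degrees of freedom. It uses hypothetical KNOW1 scores split by ENERGYCat level, and `one_way_anova_f` is a hypothetical helper. The p-value would come from the F distribution with those degrees of freedom, for example `scipy.stats.f.sf` if SciPy is available.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical KNOW1 scores for ENERGYCat levels 1, 2 and 3, not the study data
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

An F close to 1 or below, as in option (c), means the group means differ little relative to the spread within groups.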
QUESTION 23
Lastly, the dietician wants to know whether, at the population level, the parents' knowledge of children's nutritional requirements and healthy eating after the intervention can be predicted by their knowledge before the intervention. You can assume that the assumptions of the statistical approach you choose are met. Which of the following is the appropriate statistical analysis?
a. Scatter plot using KNOW1 as the Y variable and KNOW2 as the X variable; paired samples t-test; and simple linear regression using KNOW1 as the dependent variable and KNOW2 as the independent variable
b. Scatter plot using KNOW2 as the Y variable and KNOW1 as the X variable; paired samples t-test; and simple linear regression using KNOW1 as the dependent variable and KNOW2 as the independent variable
c. Scatter plot using KNOW2 as the Y variable and KNOW1 as the X variable; Pearson's correlation coefficient; and simple linear regression using KNOW2 as the dependent variable and KNOW1 as the independent variable
d. Scatter plot using KNOW2 as the Y variable and KNOW1 as the X variable; chi-square test; and simple linear regression using KNOW2 as the dependent variable and KNOW1 as the independent variable
e. Scatter plot using KNOW2 as the Y variable and KNOW1 as the X variable; chi-square test; and multiple linear regression using KNOW2 as the dependent variable and KNOW1 as the independent variable
1 point
QUESTION 24
Based on the analyses you conducted, which of the following conclusions are correct?
a. The Pearson's correlation coefficient (0.6780) indicates a strong, positive and significant (p < 0.001) linear relationship between the parents' knowledge of children's nutritional requirements and healthy eating before and after the intervention, suggesting that parents with higher knowledge before the intervention tend to have improved knowledge after the intervention
b. The Pearson's correlation coefficient (0.6780) indicates a strong, negative and significant (p < 0.001) linear relationship between the parents' knowledge of children's nutritional requirements and healthy eating before and after the intervention, suggesting that higher knowledge after the intervention is associated with lower knowledge before the intervention
c. For each one-unit increase in the parents' knowledge score before the intervention, the parents' knowledge after the intervention increases by 0.963 units on average; with 95% certainty, this increase is between 0.702 and 1.223 units on average in the population
d. For each one-unit increase in the parents' knowledge score after the intervention, the parents' knowledge before the intervention increases by 0.477 units on average; with 95% certainty, the population mean parents' knowledge before the intervention increases by between 0.348 and 0.607 units
e. Answers (a) and (c) are correct
f. Answers (b) and (d) are correct
1 point
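Pearson's r and the least-squares slopes are built from the same sums of cross-products. The slope of y on x is Sxy/Sxx, the slope of x on y is Sxy/Syy, and their product equals r². The Python sketch below shows this with hypothetical data and a hypothetical helper, `pearson_and_slopes`. It also shows why 0.678² matches the reported R² of 0.4597. The product of the two slopes quoted in options (c) and (d) comes out close to that value too (the slopes are rounded).

```python
from math import sqrt
from statistics import mean

def pearson_and_slopes(x, y):
    """Pearson r, plus least-squares slopes for y-on-x and x-on-y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    slope_y_on_x = sxy / sxx  # e.g. KNOW2 regressed on KNOW1
    slope_x_on_y = sxy / syy  # the reverse regression, as in option (d)
    return r, slope_y_on_x, slope_x_on_y

# Hypothetical KNOW1 (x) and KNOW2 (y) scores, not the study data
r, b_yx, b_xy = pearson_and_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), b_yx, b_xy)                   # 0.7746 0.6 1.0
print(round(r ** 2, 4), round(b_yx * b_xy, 4))   # 0.6 0.6

# Figures quoted in Questions 24 and 25
print(round(0.678 ** 2, 4))      # 0.4597
print(round(0.963 * 0.477, 3))   # 0.459
```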
QUESTION 25
Based on the analyses you conducted, which of the following statements are correct conclusions?
a. The R2 value is called the coefficient of correlation
b. The R2 value is called the coefficient of determination
c. As the R2 value is 0.4597, KNOW1 explains only approximately 46% of the variation in KNOW2
d. As the R2 value is 0.4597, 46% of parents had their knowledge (i.e., KNOW1) improved after the intervention in KNOW2
e. Answers (a) and (d) are correct
f. Answers (b) and (c) are correct
1 point
QUESTION 26
Please make sure you have answered all 25 questions above in this quiz!
1) Before you click the 'Save and Submit' button, you need to provide the outputs you generated in STATA as evidence to support your answers. The outputs for Questions 1, 3, 4, 6, 7, 9, 11, 17, 20, 22, and 24 should be put in a Word document and uploaded here by clicking the Browse My Computer tab. Marks will be deducted from your total score (0.5 mark for each missing output) if you do not provide the outputs.
2) After you click the 'Save and Submit' button, you need to return to the Assessment page and submit the same Word document to Turnitin via the "Turnitin_Quiz 1 (Outputs)" link. Your quiz will NOT be marked until it has been submitted to Turnitin.