Final Statistical Analysis Discussion - PHC6052 Course Project Step 4


Added on 2023/06/16

Homework Assignment
This assignment focuses on applying statistical inference to analyze relationships between variables using the QQ, CC, and CQ cases. It includes descriptive summaries of the explanatory and response variables, evaluation of Pearson’s correlation and linear regression, interpretation of the coefficient of determination and slope, and assessment of residual normality and constant variance. Additionally, it examines chi-square tests for categorical variables and discusses the distribution of binary responses within multi-level explanatory variables. The assignment provides detailed interpretations and conclusions based on statistical outputs, addressing the appropriateness of different statistical methods and the significance of observed relationships.
Course Project Step 4 - Final Discussion Questions on Analyses Conducted
Quiz Instructions
Use ONLY your final results for STEP 3 to answer the questions on this assignment.
BONUS for EARLY SUBMISSION = 1 point for each day prior to the deadline starting November
29th!! (Nov 29th = 5 points, 30th = 4 points, ... , Dec 3rd = 1 point)
Purpose: Apply statistical inference in the three cases covered in this course (QQ, CC, CQ) correctly
and also to see how we can approach the same question using different types of variables and
thus different methods.
We are picking apart the overall analyses into many sub-questions. Please carefully review all
questions related to each part from the STEP 3 analysis before beginning those questions and be
sure to address only the current question in each answer.
No remediation will be offered for this final assessment.
Note: Fill-in-the-blank questions will be marked incorrect in auto grading and must be hand graded
based upon your STEP 3 results.
Question 1
FROM STEP 3 PART 1: Descriptive Summary of EXPLANATORY variable
Provide a brief discussion of the distribution of your EXPLANATORY variable using as much of the
relevant information in STEP 3 Questions 1-4 as possible (and yet remain as concise as possible). Be
sure to include information from each question in STEP 3 Part 1.
The explanatory (independent) variable in this study is “Waist (Inches)”. The sample contains 400
waist measurements, with a mean of 37.92 inches, a median of 37 inches, and a standard deviation of
5.72 inches. Because the mean is greater than the median, the distribution is slightly skewed to the
right. The histogram also shows this slight right skew, with the highest frequency occurring at
about 37 inches. The boxplot shows two outliers, and the Q-Q plot indicates that the waist data are
light-tailed.
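The summary measures above (n, mean, median, standard deviation, skew direction from mean versus median, and boxplot outliers) can be reproduced with a short Python sketch. It uses only the standard library; the `waist` list is a small hypothetical sample for illustration, not the STEP 3 data, and the 1.5 × IQR fence is the usual boxplot outlier rule (software may compute quartiles slightly differently).

```python
import statistics

def describe(values):
    """Return the summary measures used in STEP 3 Part 1 for a list of numbers."""
    values = sorted(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Quartiles via the "inclusive" method; SAS and other packages may differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Mean > median is a rough sign of right skew; mean < median of left skew.
    if mean > median:
        skew = "right"
    elif mean < median:
        skew = "left"
    else:
        skew = "symmetric"
    return {"n": len(values), "mean": mean, "median": median, "sd": sd,
            "q1": q1, "q3": q3, "outliers": outliers, "skew": skew}

# Hypothetical waist measurements in inches (NOT the project data).
waist = [32, 34, 35, 36, 36, 37, 37, 37, 38, 39, 40, 42, 44, 55]
summary = describe(waist)
print(summary)
```

The mean-versus-median comparison is only a quick heuristic; the histogram and boxplot remain the primary evidence for shape.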
Question 2
FROM STEP 3 PART 2: Descriptive Summary of RESPONSE variable
Provide a brief discussion of the distribution of your RESPONSE variable using as much of the
relevant information in STEP 3 Questions 5-8 as possible (and yet remain as concise as possible). Be
sure to include information from each question in STEP 3 Part 2.
Question 6
Note: Answers to this question will be automatically marked incorrect. We will need to review them
manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this method.
FROM STEP 3 PART 3: Case QQ - Using the two quantitative variables to investigate relationship
Using the exact values given by the software as needed, write the regression equation for
the “best line” through your data:
Y-hat = -49.579 + 5.99 (Waist)
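The intercept and slope in this equation come from the least-squares formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The sketch below applies those formulas to four hypothetical (waist, weight) pairs chosen to lie exactly on a line so the result is easy to check; it illustrates the calculation and will not reproduce -49.579 and 5.99 without the actual STEP 3 data.

```python
def least_squares(x, y):
    """Fit y-hat = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data lying exactly on y = -50 + 6x (NOT the project data).
waist = [30, 35, 40, 45]
weight = [130, 160, 190, 220]
b0, b1 = least_squares(waist, weight)
print(f"Y-hat = {b0:.3f} + {b1:.3f} (Waist)")
```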
Question 7
Note: Answers to this question will be automatically marked incorrect. We will need to review them
manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this method.
FROM STEP 3 PART 3: Case QQ - Using the two quantitative variables to investigate relationship
Provide the exact values given by the software as needed to interpret the coefficient of
determination, R-square, in context:
Approximately 72.23 percent of the variation in weight can be explained by this simple linear
regression model using waist.
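R-square is the proportion of the total variation in the response explained by the fitted line, R² = 1 - SSE/SST, and in simple linear regression its square root, carrying the sign of the slope, is Pearson's correlation r. The sketch below computes R² for hypothetical data and then converts the reported 72.23% into r; the helper names are illustrative.

```python
import math

def r_squared(y, y_hat):
    """Coefficient of determination for a least-squares fit: 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - sse / sst

def r_from_r_squared(r2, slope):
    """Pearson r in simple linear regression: sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r2), slope)

# Hypothetical weights observed at waist = 30, 35, 40, 45 (NOT the project data),
# with their least-squares fitted values (b0 = -42.5, b1 = 5.8).
y = [130, 165, 185, 220]
y_hat = [131.5, 160.5, 189.5, 218.5]
print(round(r_squared(y, y_hat), 4))

# Using the reported R-square (0.7223) and the positive slope (5.99):
print(round(r_from_r_squared(0.7223, 5.99), 4))
```

Under these assumptions the reported R-square corresponds to a correlation of about 0.85 between waist and weight.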
Question 8
Note: Answers to this question will be automatically marked incorrect. We will need to review them
manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this method.
FROM STEP 3 PART 3: Case QQ - Using the two quantitative variables to investigate relationship
Provide the exact values given by the software as needed to interpret the slope of “the best line”
through your data in context.
Interpret this value about the “best line” regardless of whether this “best line” is a good model for
your data.
For each 1-inch increase in waist size, the predicted weight increases by 5.99.
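Because the fitted equation is linear, the slope is exactly the change in predicted weight for a one-inch increase in waist, whatever the starting waist size. A short check using the coefficients reported in Question 6 (the waist values in the loop are arbitrary, and weight is in whatever units STEP 3 used):

```python
B0, B1 = -49.579, 5.99  # intercept and slope reported in Question 6

def predicted_weight(waist_inches):
    """Predicted weight from the STEP 3 regression line."""
    return B0 + B1 * waist_inches

for w in (32, 37, 45):
    change = predicted_weight(w + 1) - predicted_weight(w)
    print(w, "->", w + 1, "in: predicted weight changes by", round(change, 2))
```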
Question 9
FROM STEP 3 PART 3: Case QQ - Using the two quantitative variables to investigate relationship
Based upon the histogram and PP-plot of the residuals, explain whether or not you feel the
distribution of the residuals is reasonably normal.
Explain clearly what you see in the plots to justify your answer.
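A normal P-P plot pairs each sorted residual's empirical cumulative probability with the normal cumulative probability of its standardized value; points close to the 45-degree line support approximate normality. The sketch below computes those pairs and the largest vertical gap from the line for hypothetical residuals. The plotting position (i - 0.5)/n is one common convention among several, and the "largest gap" summary is an illustrative aid, not a formal test.

```python
import statistics
from statistics import NormalDist

def pp_points(residuals):
    """Return (empirical, theoretical) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    std_normal = NormalDist()
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        empirical = (i - 0.5) / n                      # plotting position (one common choice)
        theoretical = std_normal.cdf((r - mean) / sd)  # normal CDF of the standardized residual
        points.append((empirical, theoretical))
    return points

def max_pp_gap(residuals):
    """Largest vertical distance from the 45-degree line; small values suggest normality."""
    return max(abs(e - t) for e, t in pp_points(residuals))

# Hypothetical residuals (NOT the project data).
residuals = [-12.1, -7.4, -4.0, -1.8, 0.3, 1.1, 2.9, 4.6, 7.0, 9.4]
print(round(max_pp_gap(residuals), 3))
```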
Question 12
FROM STEP 3 PART 4: Case CC – Using categorized versions of response and explanatory to
investigate relationship
Are the conditions for using the appropriate chi-square test satisfied for all combinations
investigated in Part 4 of STEP3?
Why or why not? Be sure to specify very clearly any combinations for which the conditions
are not satisfied.
The chi-square test is appropriate when every expected cell count is at least 5. According to the
provided outputs, all expected counts are at least 5 for each of the four combinations: BINARY
EXPLANATORY with BINARY RESPONSE, MULTI-LEVEL EXPLANATORY with MULTI-LEVEL RESPONSE, BINARY
EXPLANATORY with MULTI-LEVEL RESPONSE, and MULTI-LEVEL EXPLANATORY with BINARY RESPONSE. Thus,
the conditions for using the chi-square test are satisfied for all combinations.
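The expected count for each cell is (row total × column total) / grand total, and the rule stated above requires every one of these to be at least 5. A minimal sketch with a hypothetical 2 × 2 table of counts (the actual STEP 3 tables are not reproduced here):

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_condition_met(table, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

# Hypothetical counts: rows = waist group, columns = weight group (NOT the project data).
observed = [[150, 40],
            [30, 180]]
print(expected_counts(observed))
print(chi_square_condition_met(observed))
```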
Question 13
Note: Answers to this question will be automatically marked incorrect. We will need to review
them manually to determine if they are correct based upon your STEP 3 output.
FROM STEP 3 PART 4: Case CC – Using categorized versions of response and explanatory to
investigate relationship
For each combination below provide the name of the appropriate chi-square statistic used,
its associated p-value, and fill in the appropriate information to complete the conclusion in
context for each test.
Answer this question completely regardless of your answer to the previous question
regarding the conditions for using these tests.
STEP 3 Q12: BINARY EXPLANATORY variable and BINARY RESPONSE variable
Name of Test Statistic: Chi-Square test.
P-Value: <0.0001
There is enough evidence that an association exists between the BINARY EXPLANATORY
variable and the BINARY RESPONSE variable.
STEP 3 Q13: MULTI-LEVEL EXPLANATORY variable and MULTI-LEVEL RESPONSE variable
Name of Test Statistic: Chi-Square test.
P-Value: <0.0001
There is enough evidence that an association exists between the MULTI-LEVEL EXPLANATORY
variable and the MULTI-LEVEL RESPONSE variable.
STEP 3 Q14: BINARY EXPLANATORY variable and MULTI-LEVEL RESPONSE variable
Name of Test Statistic: Chi-Square test.
P-Value: <0.0001
There is enough evidence that an association exists between the BINARY EXPLANATORY variable
and the MULTI-LEVEL RESPONSE variable.
STEP 3 Q15: MULTI-LEVEL EXPLANATORY variable and BINARY RESPONSE variable
Name of Test Statistic: Chi-Square test.
P-Value: <0.0001
There is enough evidence that an association exists between the MULTI-LEVEL EXPLANATORY
variable and the BINARY RESPONSE variable.
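For reference, the Pearson chi-square statistic sums (observed - expected)² / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. The sketch below computes it for a hypothetical table and obtains the upper-tail p-value with the standard library, using the recurrence for the regularized upper incomplete gamma function at integer degrees of freedom. The p-values reported above come from the statistical software; this version applies no continuity correction.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (upper regularized gamma)."""
    if x <= 0:
        return 1.0
    y = x / 2
    # Start from the closed forms for s = 1 (even df) or s = 1/2 (odd df) ...
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    # ... then step up with Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q

def chi_square_test(table):
    """Pearson chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2 x 2 table (NOT the project data).
stat, df, p = chi_square_test([[150, 40], [30, 180]])
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3g}")
```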
Question 14
FROM STEP 3 PART 4: Case CC – Using categorized versions of response and explanatory to
investigate relationship
For the results produced in STEP 3 Question 15, involving the multi-level explanatory variable and
the binary response variable:
Provide a discussion of the distribution of your BINARY RESPONSE within the levels of the
MULTI-LEVEL EXPLANATORY variable.
Note: This is similar to the questions on the assignments which related to comparing conditional
percentages in Case CC.
The chi-square value for the test is 194.06 and the corresponding p-value is less than 0.001, so the
null hypothesis of no association is rejected and we conclude that there is an association between
the MULTI-LEVEL EXPLANATORY variable and the BINARY RESPONSE variable.
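The question asks for the distribution of the binary response within each level of the multi-level explanatory variable, that is, row (conditional) percentages. A sketch of that calculation with a hypothetical four-level table; four levels is assumed here to match the four groups mentioned in Question 21, and the labels and counts are illustrative only.

```python
def conditional_percentages(table):
    """Percent of each response category within each explanatory level (row percents)."""
    result = {}
    for level, counts in table.items():
        total = sum(counts.values())
        result[level] = {resp: 100 * c / total for resp, c in counts.items()}
    return result

# Hypothetical counts of the binary weight response within four waist groups
# (NOT the project data).
table = {
    "waist group 1": {"lower weight": 80, "higher weight": 20},
    "waist group 2": {"lower weight": 60, "higher weight": 40},
    "waist group 3": {"lower weight": 30, "higher weight": 70},
    "waist group 4": {"lower weight": 10, "higher weight": 90},
}
for level, pcts in conditional_percentages(table).items():
    print(level, {k: f"{v:.1f}%" for k, v in pcts.items()})
```

An association shows up as conditional percentages that change noticeably from one explanatory level to the next; under independence they would be roughly equal across levels.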
Question 15
FROM STEP 3 PART 5: Case CQ – Using your quantitative response variable and the binary version
of your explanatory variable
Are the conditions for using the two-sample t-test for independent samples satisfied?
Why or why not?
Note: If you feel you need additional output to answer this question, you do not need to obtain it,
instead specify what you would obtain and explain what you would be looking for.
The conditions for the two-sample t-test are as follows:
1. The explanatory variable is categorical (binary waist group).
2. The response variable is quantitative and continuous (weight).
3. The samples should be drawn randomly and independently.
4. Homogeneity of variances (variance approximately equal across groups); this is required only
for the pooled (equal-variance) version of the test, not for the unequal-variance version.
5. No extreme outliers. The data contain two outliers in weight within the waist size 40+ group,
but the other conditions are satisfied and the sample size is large, so these outliers should not
materially affect the results and findings of the analysis.
Question 16
Note: Answers to this question will be automatically marked incorrect. We will need to review
them manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this test.
FROM STEP 3 PART 5: Case CQ – Using your quantitative response variable and the binary
version of your explanatory variable
Provide the p-value for the test of equality of variances and fill in the appropriate
information to complete the conclusion of this test for equal variances in context.
P-Value for Test of Equality of Variances: 0.0003.
Conclusion of Test of Equality of Variances: There is sufficient evidence that the variance of
weight differs between the two waist groups (waist size less than 40 inches and waist size 40
inches or more).
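If the STEP 3 output came from SAS PROC TTEST (the "TTEST" label in Question 17 suggests so, but this is an assumption), the equality-of-variances test is the folded F test, whose statistic is the larger sample variance divided by the smaller one. The sketch below computes that statistic and its degrees of freedom for hypothetical groups; the p-value would then come from software or an F table, since Python's standard library has no F distribution.

```python
import statistics

def folded_f(group_a, group_b):
    """Folded F statistic: larger sample variance over smaller, with (num, den) df."""
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    if var_a >= var_b:
        return var_a / var_b, len(group_a) - 1, len(group_b) - 1
    return var_b / var_a, len(group_b) - 1, len(group_a) - 1

# Hypothetical weights for the two waist groups (NOT the project data).
under_40 = [150, 160, 165, 170, 175, 180]
forty_plus = [200, 215, 230, 250, 270, 290]
f_stat, df_num, df_den = folded_f(under_40, forty_plus)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) df")
```

A large F relative to the F distribution with those degrees of freedom gives a small p-value, which points toward the unequal-variances version of the t-test.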
Question 17
Note: Answers to this question will be automatically marked incorrect. We will need to review
them manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this test.
FROM STEP 3 PART 5: Case CQ – Using your quantitative response variable and the binary
version of your explanatory variable
Based upon your answer to the previous question regarding the test for equality of variances,
choose the appropriate t-test (assuming equal variances or assuming unequal variances) and
for this t-test, provide the p-value as well as the confidence interval for the difference between
means
o Appropriate TTEST P-Value: < 0.0001
o Confidence interval for difference in population means: (-68.7836 ,-53.5074)
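When variances are judged unequal, the Satterthwaite (Welch) version of the test uses separate variance estimates, t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2), with approximate degrees of freedom from the Welch-Satterthwaite formula. The sketch below computes t, the df, and a 95% interval for hypothetical groups. Because Python's standard library has no t quantile function, it substitutes the normal critical value (about 1.96), which is close to the t value only when the df are large; the hypothetical groups are made large for that reason.

```python
import math
import statistics
from statistics import NormalDist

def welch_t(group1, group2, conf=0.95):
    """Welch (unequal-variance) t statistic, Satterthwaite df, and an approximate CI."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1 = statistics.variance(group1) / n1
    v2 = statistics.variance(group2) / n2
    se = math.sqrt(v1 + v2)
    diff = m1 - m2
    t = diff / se
    # Welch-Satterthwaite approximate degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal critical value as a large-df stand-in for the t quantile (assumption).
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return t, df, (diff - z * se, diff + z * se)

# Hypothetical weights (NOT the project data): group 1 = waist < 40, group 2 = waist 40+.
under_40 = [150, 160, 165, 170, 175, 180] * 20
forty_plus = [200, 215, 230, 250, 270, 290] * 15
t, df, ci = welch_t(under_40, forty_plus)
print(f"t = {t:.2f}, df = {df:.1f}, approx. 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```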
Question 18
Answer this question regardless of any decisions regarding the conditions for using this test.
FROM STEP 3 PART 5: Case CQ – Using your quantitative response variable and the binary
version of your explanatory variable
Using the results from the previous question:
Provide one sentence stating the conclusion in context for the two-sample t-test.
Provide one sentence with the interpretation of the confidence interval for the difference
between the two population means. If the t-test is statistically significant, be sure your
interpretation includes which group mean is larger or smaller than the other and by how much.
According to the two-sample t-test output, t = -16.77 and the p-value is < 0.0001. Since the p-value
is less than the 0.05 significance level, the null hypothesis is rejected, and we conclude that
there is a statistically significant difference in mean weight between the group with waist size
less than 40 inches and the group with waist size 40 inches or more. We are 95% confident that the
mean weight of the 40+ waist group exceeds that of the under-40 group by between 53.51 and 68.78,
with a point estimate of about 61.15 (the midpoint of the interval).
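A t interval is symmetric about the observed difference in sample means, so the midpoint and half-width of the reported interval recover the point estimate and the margin of error. A quick check with the values from Question 17; the group order (waist under 40 minus waist 40 or more) is inferred from the negative sign of the interval.

```python
lower, upper = -68.7836, -53.5074  # reported 95% CI for the difference in means

point_estimate = (lower + upper) / 2   # difference in sample means
margin_of_error = (upper - lower) / 2  # half-width of the interval
print(f"estimate = {point_estimate:.4f}, margin of error = {margin_of_error:.4f}")
```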
Question 19
Consider the conclusion provided to the test in the previous question.
Which type of error could you have made with your conclusion to the two sample t-test?
What would this error imply in the context of your data?
Your answer should be 1-3 sentences.
Because the null hypothesis was rejected, the only error that could have been made is a Type I
error (rejecting the null hypothesis when it is true). In context, this would mean concluding that
mean weight differs between the waist groups (less than 40 inches versus 40 inches or more) when in
fact the two population mean weights are equal. A Type II error (failing to reject a false null
hypothesis) is not possible here, since the null hypothesis was rejected.
Question 20
FROM STEP 3 PART 6: Case CQ – Using your quantitative response variable and the multi-level
version of your explanatory
Are the conditions for using the one-way ANOVA satisfied?
Why or why not?
If you feel you need additional output to answer this question, you do not need to obtain it,
instead specify what you would obtain and explain what you would be looking for.
Length may vary; keep your answer as brief as possible while addressing the question.
1. The dependent variable (weight) should be measured on a continuous scale.
2. The independent variable (multi-level waist) should consist of two or more categorical,
independent groups.
3. The observations should be independent, meaning there is no relationship between the
observations within each group or between the groups.
4. There should be no significant outliers. The boxplot of weight for each level of the multi-level
waist variable does not indicate strong outliers.
5. The dependent variable should be approximately normally distributed within each group of the
independent variable.
6. The variances should be approximately equal (homogeneous) across the groups of the independent
variable, so Levene’s test for homogeneity of variances would be useful here.
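Two quick checks for the equal-variance condition are sketched below: the rule of thumb that the largest group standard deviation should be less than twice the smallest, and Levene's statistic, computed here in its classic form as a one-way ANOVA F on each observation's absolute deviation from its group mean (some software uses squared deviations instead). The four groups are hypothetical, and the p-value for Levene's statistic would still come from software or an F table with (k - 1, N - k) df.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k

def sd_ratio(groups):
    """Largest group SD divided by smallest; below 2 is a common rule of thumb."""
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) / min(sds)

def levene_w(groups):
    """Levene's statistic (absolute-deviation form): ANOVA F on |value - group mean|."""
    deviations = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    return anova_f(deviations)

# Hypothetical weights for four waist groups (NOT the project data).
groups = [[140, 150, 155, 160, 170],
          [165, 170, 180, 185, 195],
          [190, 200, 210, 215, 230],
          [225, 235, 245, 255, 270]]
print(round(sd_ratio(groups), 2))
print(levene_w(groups))
```

The same `anova_f` helper applied to the weights themselves gives the one-way ANOVA F statistic used in the next question.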
Question 21
Note: Answers to this question will be automatically marked incorrect. We will need to review
them manually to determine if they are correct based upon your STEP 3 output.
Answer this question regardless of any decisions regarding the conditions for using this test.
FROM STEP 3 PART 6: Case CQ – Using your quantitative response variable and the multi-level
version of your explanatory
Provide the p-value of the one-way ANOVA test and fill in the appropriate information to
complete the conclusion in context.
P-Value for One-Way ANOVA: <0.0001
Conclusion: There is sufficient evidence that mean weight differs for at least two of the four
groups of the multi-level waist variable.
Question 22.
Consider the conclusion provided to the test in the previous question.
Which type of error could you have made with your conclusion to this ANOVA test?
Explain what this error would imply in the context of your data. Your answer should be 1-3
sentences.
Because the null hypothesis was rejected, a Type I error could have been made: concluding that mean
weight differs among the four waist groups when in fact all four population mean weights are equal.
A Type II error (failing to reject a false null hypothesis) is not possible here, since the null
hypothesis was rejected.
Question 23.
OVERALL SUMMARY
Explain which appropriate and valid method you prefer as your primary method to analyze
this data and why.
Length may vary; keep your answer as brief as possible while addressing the question.
Using the categorized versions of weight and waist, the chi-square test will be used for the
analysis. This test indicates whether there is a relationship between the binary and multi-level
versions of waist and weight.
Binary or multinomial logistic regression could also be used to predict the weight category from
waist.
Question 24
Consider the tasks involved in the entire course project (STEPS 1-4).
At the beginning of the semester what percent (0 to 100) of these concepts and ideas were you
already familiar with?
You may also provide any explanations or other comments you wish to share.
Please answer this question yourself.