University Statistics Assignment: Chi-Square Tests and Data Analysis


Added on  2022/10/04

Homework Assignment
AI Summary
This homework assignment focuses on the application and interpretation of Chi-square tests across various scenarios. The assignment includes five main questions. Question 1 examines the relationship between gender and ice cream flavor preferences, requiring the identification of the appropriate Chi-square test, conducting the test in R, and interpreting the results. Question 2 investigates a research firm's claim about the distribution of food delivery orders across days of the week, involving hypothesis formulation, Chi-square test execution in Excel, and conclusion drawing. Question 3 explores the relationship between pet therapy participation and student calmness during exams, utilizing Excel for statistical significance calculation and percentage analysis, and constructing a PQ chart. Question 4 delves into the debate surrounding the pronunciation of the GIF file format, requiring table recreation in R, Chi-square statistic calculation in both Excel and R, and result interpretation. Finally, Question 5 analyzes the relationship between gender, age, and willingness to sacrifice job opportunities for family life using GSS 2016 data, involving Chi-square test results, warning message analysis, PQ table creation, and chart interpretation.
Question 1
1) Is there a relationship between gender and preference for ice cream flavor? The table below
summarizes the ice cream flavor preferences of males and females.
Gender Strawberry Vanilla Chocolate
Male 100 120 60
Female 350 200 90
1a) Which type of Chi-square test is appropriate for this analysis? (0.20 points)
Test of independence (the table cross-classifies two categorical variables, gender and flavor)
1b) Conduct the appropriate Chi-square test in R and report the X2, degrees of freedom and P-
value. You may choose the level of alpha. Be sure to report the values using APA citation.
(0.40 points)
X2(2, N = 920) = 28.36, p < .001 (alpha = 0.05)
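A chi-square test of independence on the ice cream table can be sketched in R along the following lines. This is a minimal sketch rather than the assignment's official solution; the object names `flavor` and `result` are illustrative assumptions.

```r
# Observed counts from the Question 1 table (rows: gender, columns: flavor)
flavor <- matrix(
  c(100, 120, 60,
    350, 200, 90),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Flavor = c("Strawberry", "Vanilla", "Chocolate"))
)

# Pearson chi-square test of independence
result <- chisq.test(flavor)

result$statistic  # X-squared
result$parameter  # degrees of freedom: (2 - 1) * (3 - 1) = 2
result$p.value    # p-value
result$expected   # expected counts under independence
sum(flavor)       # N for the APA report
```

In APA style the output would be reported in the form X2(df, N) = statistic, p = value.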
1c) Based on the results above what would you conclude about ice cream preferences between
male and female? (0.40 points)
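A chi-square test of independence on the table gives p < .001, which is below alpha = 0.05, so we reject the null hypothesis of independence: ice cream flavor preference differs between males and females. Females chose strawberry more often (55% vs. 36% of males), while males chose vanilla more often (43% vs. 31% of females).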
Question 2
2) A research firm claims that the distribution of the days of the week that people are most
likely to order food for delivery is different from the distribution seen in the past. You randomly
select 494 people and record which day of the week each is most likely to order food for
delivery. The table below also shows the results of your count. At α = 0.05, test the
research firm’s claim. Note that history in the table below is expressed as a percentage.
History (%) Frequency (f)
Sunday 7 47
Monday 4 15
Tuesday 5 27
Wednesday 12 45
Thursday 11 44
Friday 37 166
Saturday 24 150
2a) Which type of Chi-square test is appropriate for this analysis? State the null and alternate
hypothesis with a word statement. (0.20 points)
Hint: the null can either directly state the population information; if not directly stating it, then it
should refer to the population distribution in some way.
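Test: Chi-square goodness-of-fit test
Null: The distribution of the days of the week that people are most likely to order food for
delivery is NOT different from the distribution seen in the past
Alternate: The distribution of the days of the week that people are most likely to order food for
delivery is different from the distribution seen in the past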
2b) Perform a chi-square test in Excel. Report the degrees of freedom, X2 critical value, your
calculated X2 statistic, and the corresponding p-value.
You may express the p-value as > or < alpha; if you report the exact p-value, round to no more
than 3 decimal places. You do not need to report these values using APA. (0.70 points)
Day History (%) Frequency (f) Expected (E) (f - E)^2 / E
Sunday 7 47 34.58 4.461
Monday 4 15 19.76 1.147
Tuesday 5 27 24.70 0.214
Wednesday 12 45 59.28 3.440
Thursday 11 44 54.34 1.968
Friday 37 166 182.78 1.540
Saturday 24 150 118.56 8.337
Total 100 494 494.00 21.107
Degrees of freedom: 7 - 1 = 6
X2 critical value (alpha = 0.05, df = 6): 12.592
Calculated X2 statistic: 21.107
Use the following command in Excel to generate your p-value but replace Chi-Square with the
Chi-Square Value you estimated and df with your degrees of freedom. You can enter the df
number directly into the formula or click on an Excel cell that contains the df value.
=CHISQ.DIST.RT(Chi-Square,df)
p = 0.002, from =CHISQ.DIST.RT(21.107, 6), so p < alpha = 0.05
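As a cross-check on the Excel calculation, the same goodness-of-fit test can be sketched in R. This is a hedged sketch, not part of the assignment's required Excel workflow; the object names `observed`, `history_pct`, and `gof` are illustrative assumptions.

```r
# Observed counts and historical percentages from the Question 2 table
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
observed <- c(47, 15, 27, 45, 44, 166, 150)
history_pct <- c(7, 4, 5, 12, 11, 37, 24)
names(observed) <- days

# Goodness-of-fit test: p must be proportions that sum to 1
gof <- chisq.test(observed, p = history_pct / 100)

gof$expected          # expected counts: 494 * historical proportion
gof$statistic         # calculated X-squared
gof$parameter         # degrees of freedom: 7 - 1 = 6
qchisq(0.95, df = 6)  # critical value at alpha = 0.05
gof$p.value           # right-tail probability, as CHISQ.DIST.RT gives in Excel

# The same right-tail probability computed directly from the statistic
pchisq(unname(gof$statistic), df = 6, lower.tail = FALSE)
```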
2c) Based on the results above what would you conclude about the research firm's claim? (0.10
points)
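Because the calculated X2 (21.11) exceeds the critical value (12.592) and p < 0.05, we reject the null hypothesis. The data support the research firm's claim: the distribution of the days of the week that people are most likely to order food for delivery is
different from the distribution seen in the past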
Question 3
A program of pet therapy has been running for students during the week before final exams.
Are the participants in the program calmer during finals than non-participants? The results
from a random sample of students are reported below:
Pet therapy participation
Calmness Participants Non-participants Totals
High level of calmness 23 15 38
Low level of calmness 11 18 29
Total 34 33 67
3a) Is there a statistically significant relationship between participation in pet therapy and
calmness during exams? Use Excel to calculate statistical significance. Use alpha = 0.05.
Report the X2 test results using APA citation. Round your Chi-Square statistic to no more than
two decimal places. (0.33 points)
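The question asks for Excel, but the calculation can be cross-checked in R. The sketch below assumes the uncorrected Pearson statistic is wanted (R applies Yates' continuity correction to 2 x 2 tables by default, which Excel's CHISQ.TEST does not); the object names `therapy` and `test_3a` are illustrative assumptions.

```r
# Observed counts from the Question 3 table (totals excluded)
therapy <- matrix(
  c(23, 15,
    11, 18),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Calmness = c("High level of calmness", "Low level of calmness"),
    Participation = c("Participants", "Non-participants")
  )
)

# correct = FALSE turns off Yates' continuity correction for the 2 x 2 table
test_3a <- chisq.test(therapy, correct = FALSE)

round(unname(test_3a$statistic), 2)  # X-squared, rounded to two decimals
test_3a$parameter                    # degrees of freedom: (2 - 1) * (2 - 1) = 1
test_3a$p.value                      # compare with alpha = 0.05
```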
3b) Compute column percentages for the table to determine the pattern of the relationship.
Which group was more likely to be calm? Calculate this in Excel and create a table below to
show your percentages. This does not have to be a PQ table. Round your percentages to whole
numbers, no decimal places. (0.33 points)
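Calmness Participants Non-participants Total
High level of calmness 68% 45% 57%
Low level of calmness 32% 55% 43%
Total 100% 100% 100%
Participants were more likely to be calm: 68% of participants reported a high level of calmness, compared with 45% of non-participants.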
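Column percentages divide each cell by its column total, so each participation group sums to 100%. A minimal R sketch of that calculation follows; the object names `therapy` and `col_pct` are illustrative assumptions, and the assignment itself asks for Excel.

```r
# Observed counts from the Question 3 table (totals excluded)
therapy <- matrix(
  c(23, 15,
    11, 18),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Calmness = c("High level of calmness", "Low level of calmness"),
    Participation = c("Participants", "Non-participants")
  )
)

# margin = 2 computes proportions within each column
col_pct <- round(100 * prop.table(therapy, margin = 2))

col_pct           # column percentages, whole numbers
colSums(col_pct)  # each column should total 100
```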
3c) We are interested to see the difference between those with high levels of anxiety. Create a
PQ chart in Excel to show the pattern of the relationship using the percentages from the
question above. Place a copy of the PQ chart below. (0.34 points)
[PQ chart: clustered bar chart titled "Chart Title"; x-axis: Participants, Non-participants; y-axis: percentage, 0% to 70%; series: High level of calmness, Low level of calmness]
Question 4
There is an ever-raging debate in technical groups and across the Internet regarding the image
file format .gif. Some people pronounce it with a hard ‘g’ like golf or going. Others pronounce
it with a soft ‘g’ like giraffe or the name Geoffrey. A third, much smaller group pronounce it by
stressing all the letters individually, “JEE EYE EFF.” Below is a portion of the data taken from a survey
conducted on StackOverflow (a popular website for programmers and the tech-savvy) of more
than 50,000 people spanning 200 countries.
The values in the table below are percentages but treat them as observed frequencies for this
question. (I know the rows do not add up to 100. An ‘other’ response was not included here.)
Country Hard ‘g’ Soft ‘g’ All letters Total
USA 65 32 1 98
Europe 75 22 1 98
Asia 33 34 30 97
Total 173 88 32 293
4a) Demonstrate that you can recreate the table above in R. You must have the script in your
script file to complete the rest of the question but copy and paste it below for credit. (0.25
points)
Hint: There are several ways to do this. One likely approach was earlier in this homework.
# Recreate the table; column names must be single tokens, so "All letters" becomes All.letters
Input = ("Country Hard Soft All.letters Total
USA 65 32 1 98
Europe 75 22 1 98
Asia 33 34 30 97
Total 173 88 32 293")
# Read the text block into a data frame, using the first line as column names
Data = read.table(textConnection(Input), header = TRUE)
# Print the data frame to confirm it matches the table above
print(Data)
4b) Calculate the Chi-Square statistic in both Excel and R. Determine statistical significance
using alpha = 0.01. Report the results using APA. (0.50 points)
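For the R half of this question, one possible approach is to pass only the observed cells (without the Total row and column) to `chisq.test`. This is a hedged sketch; the object names `gif` and `gif_test` are illustrative assumptions.

```r
# Observed values from the Question 4 table, treated as frequencies
# (the Total row and Total column are left out of the test)
gif <- matrix(
  c(65, 32, 1,
    75, 22, 1,
    33, 34, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(Region = c("USA", "Europe", "Asia"),
                  Pronunciation = c("Hard g", "Soft g", "All letters"))
)

gif_test <- chisq.test(gif)

gif_test$statistic        # X-squared
gif_test$parameter        # degrees of freedom: (3 - 1) * (3 - 1) = 4
gif_test$p.value < 0.01   # TRUE means significant at alpha = 0.01
```

Including the totals in the matrix would inflate the table to 4 x 4 and change both the statistic and the degrees of freedom, which is why they are dropped here.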
4c) What can you conclude from these results? Your answer should say something about the
statistical significance (or lack of) AND comment about how that finding relates back to region
and pronunciation of gif. (0.25 points)
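The result is statistically significant, X2(4, N = 293) = 72.30, p < .01, so there is a relationship between
region and pronunciation of gif. Respondents in the USA and Europe mostly use the hard ‘g’, while respondents in Asia are far more likely to spell out all the letters (30 of 97, versus 1 of 98 in each of the other two regions).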
Question 5
Society changes over time and our values of work and family change as well. Using the GSS
2016 data, we will explore the relationship, or lack thereof, between gender and age groups
and willingness to sacrifice good job opportunities for the benefit of family life.
Use the R setup file to recode the age variable into age groups, where each age group spans five
years with high and low age group caps. Most of it should look familiar but see if you can
follow the process.
5a) Report the Chi-Square test results for men and women. Use APA citation to report. Are the
results for both genders statistically significant? Use alpha of 0.05. (0.20 points)
5b) The Chi-Square results for each test should have given you a warning:
Warning message:
In chisq.test(tab.job.men) : Chi-squared approximation may be incorrect
If you did not receive the warning, you likely did something wrong from above. Check with the
instructional team.
Why do you think you received this warning message? (0.20 points)
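R's `chisq.test` issues this warning when at least one expected cell count is below 5, which tends to happen when many categories are spread over few cases. The sketch below uses a small hypothetical table (not GSS data) to show how the expected counts can be inspected; the object names `sparse` and `expected` are illustrative assumptions.

```r
# Hypothetical sparse table (NOT GSS data): few cases in several cells
sparse <- matrix(
  c(12, 3, 2,
    10, 4, 1,
     9, 2, 3),
  nrow = 3, byrow = TRUE
)

# Pull out the expected counts; suppressWarnings keeps the console quiet here
expected <- suppressWarnings(chisq.test(sparse))$expected

round(expected, 2)  # expected counts under independence
sum(expected < 5)   # number of cells with an expected count below 5
```

Running the same check on the bivariate tables from the setup file (for example `chisq.test(tab.job.men)$expected`) would show which age-group cells are too sparse.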
5c) One way to resolve the warning is to eliminate some of the age groups; another strategy
is to combine the age groups. We will do the second option. Instead of using 10 age groups,
let’s recode them into 4 age groups. We will use quartiles to approximate 4 groups of relatively
equal size for each gender.
Start with a summary of the previously created age variable:
summary(gss$age.r)
You should see a Min of 18, 1st Qtr of 34, Median & Mean of 49 (rounded), 3rd Qtr of 62 and
Max of 89.
Modify the previous setup code that created 10 age groups so that it creates 4 age groups
using the following quartile information:
Group 1: 18 to 34,
Group 2: 35 to 49,
Group 3: 50 to 62, and
Group 4: 63 and older
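One way to build the four groups listed above is base R's `cut`. The setup file's own recode is not shown here, so this is only a sketch under assumptions: the vector `age` is hypothetical stand-in data for `gss$age.r`, and the group labels are illustrative.

```r
# Hypothetical ages standing in for gss$age.r (the GSS data are not shown here)
age <- c(18, 34, 35, 49, 50, 62, 63, 89, NA)

# Intervals are right-closed by default: (17,34], (34,49], (49,62], (62,Inf]
age4 <- cut(
  age,
  breaks = c(17, 34, 49, 62, Inf),
  labels = c("18-34", "35-49", "50-62", "63+")
)

# Cross-tabulate original ages against the new groups to confirm the recode
table(age, age4, useNA = "ifany")
```

With the real data, the same call would take `gss$age.r` as input and store the result in `gss$age4`, which is the variable the confirmation table in the assignment expects.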
When you are done recoding, I suggest you run the next two lines to first check your recode
and see that it worked right and then see the four age group summary statistics.
table(gss$age.r, gss$age4) # confirm recode
Create a single PQ table that reports frequencies and percentages of your newly recoded four
age groups with separate columns for each gender.
Hint: if you are using the summary tools output—you only need the group categories, frequency,
% valid, and totals for each of those. Be sure to not count any missing in your frequency total.
(0.20 points)
5d) Lines 70 to 75 in the R setup file contain the code that creates the bivariate tables and computes the
Chi-Square statistic with 10 age groups. Modify it to change the old 10 age group variable to
your new 4 age group variables.
Calculate and report the Chi-Square test results for each gender using APA citation. Use alpha =
0.05. (0.20 points)
5e) Use R setup code to reproduce the bivariate tables by gender and change the frequencies to
column percentages.
Look at the first response category of each table: “Yes, I have done so and probably would do so
again.” Create a single PQ bar chart using the proportions from that response for each gender
(convert them to percentages rounded to whole numbers). Use the age-groups on the x-axis,
include some version of the variable response category as the title, a legend to distinguish bars
for men and bars for women. Don’t worry about adding the sample size to this chart.
Then write two to four sentences about what you can conclude from the Chi-Square results in 5d and
this PQ chart. One to two sentences to say if one/both tests were statistically significant or not
and one to two sentences to interpret the chart. (0.20 points)