
Method and Assumption Checks

   

Added on  2022-09-14

STATS 201/8 Assignment 2
Your name and ID here
Due Date: 3pm Thursday 29th October
Question 1
Question of interest/goal of the study
We want to compare both the mean and median cholesterol intake for male and female students.
Read in and inspect the data:
chol.df=read.table("chol.txt",header=TRUE)
boxplot(chol~sex,data=chol.df,main="Cholesterol Consumption by Gender",horizontal=TRUE)
summaryStats(chol~sex,data=chol.df)
##   Sample Size     Mean Median  Std Dev Midspread
## F         695 190.3666  157.2 121.6626   138.450
## M         468 212.3870  184.4 120.2774   145.475
boxplot(log(chol)~sex,main="log(Cholesterol Consumption) by Gender",horizontal=TRUE,data=chol.df)
summaryStats(log(chol)~sex,data=chol.df)
##   Sample Size     Mean   Median   Std Dev Midspread
## F         695 5.062381 5.057519 0.6247157 0.8300062
## M         468 5.197945 5.217106 0.5877954 0.7767383
Comment on the plots/summary statistics
The box plots show the spread of cholesterol intake for male and female students through the
median, quartiles, and range; males have the higher median (184.4 versus 157.2). The small
circles beyond the whiskers mark outliers (extreme values). Notably, the log transformation of
the cholesterol values reduced the number of outliers. The summary statistics also show a
higher mean for males (212.387) than for females (190.367).
Fit model and check assumptions: Compare means
# Extract cholesterol intake as numeric vectors, one per sex
Male1<-chol.df$chol[chol.df$sex=="M"]
Female1<-chol.df$chol[chol.df$sex=="F"]
# Welch two-sample t-test (unequal variances is the t.test default)
t.test(Male1,Female1)
Welch Two Sample t-test
data: Male1 and Female1
t = 3.0475, df = 1009.7, p-value = 0.002367
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.841447 36.199247
sample estimates:
mean of x mean of y
212.3870 190.3666
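The p-value (0.002367) is less than the 0.05 significance level, so there is evidence of a
difference in mean cholesterol intake between the sexes. The 95% confidence interval puts the
male mean between 7.84 and 36.20 higher than the female mean.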
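The Welch test above can be checked by hand from the summary statistics alone. The following is a minimal sketch, assuming only the sample sizes, means, and standard deviations printed by summaryStats; the variable names are illustrative and not part of the original analysis.

```r
# Summary statistics copied from the summaryStats(chol~sex, ...) output
n_m <- 468; mean_m <- 212.3870; sd_m <- 120.2774   # males
n_f <- 695; mean_f <- 190.3666; sd_f <- 121.6626   # females

# Per-group variance of the sample mean
v_m <- sd_m^2 / n_m
v_f <- sd_f^2 / n_f

# Welch standard error of the difference in means (male - female)
se_diff <- sqrt(v_m + v_f)

# t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean_m - mean_f) / se_diff
df_welch <- (v_m + v_f)^2 / (v_m^2 / (n_m - 1) + v_f^2 / (n_f - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci      <- (mean_m - mean_f) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 2))
```

Up to rounding of the inputs, these values should agree with the t statistic, degrees of freedom, and confidence interval reported by t.test.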
Fit model and check assumptions: Compare medians
Female<-c(rnorm(695,157.2,121.6626))
Male<-c(rnorm(468,184.4,120.2774))
t.test(Female, Male, paired = FALSE)
Welch Two Sample t-test
data: Female and Male
t = -4.4069, df = 1010.9, p-value = 1.161e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-46.09315 -17.69110
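The p-value (1.161e-05) is less than the 0.05 significance level. However, this test is run on
samples simulated with rnorm (using the observed medians as means, together with the observed
standard deviations), so it compares the means of simulated normal data rather than the medians
of the observed data, and its result should be read with caution.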
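One common way to compare medians for right-skewed data is to work on the log scale and back-transform. The sketch below is not the method used above; it assumes that log(chol) is roughly symmetric within each sex, so that the exponentiated difference in log means estimates the ratio of medians. It uses only the log-scale summary statistics already printed, and the variable names are illustrative.

```r
# Log-scale summary statistics copied from summaryStats(log(chol)~sex, ...)
n_m <- 468; logmean_m <- 5.197945; logsd_m <- 0.5877954   # males
n_f <- 695; logmean_f <- 5.062381; logsd_f <- 0.6247157   # females

# Difference in log means (male - female) and its Welch standard error
lv_m <- logsd_m^2 / n_m
lv_f <- logsd_f^2 / n_f
diff_log <- logmean_m - logmean_f
se_log   <- sqrt(lv_m + lv_f)

# Welch-Satterthwaite degrees of freedom and 95% CI on the log scale
df_log <- (lv_m + lv_f)^2 / (lv_m^2 / (n_m - 1) + lv_f^2 / (n_f - 1))
ci_log <- diff_log + c(-1, 1) * qt(0.975, df_log) * se_log

# Back-transform: estimated ratio of medians (male / female) with its CI
# (assumption: log(chol) is approximately symmetric within each group)
ratio    <- exp(diff_log)
ratio_ci <- exp(ci_log)

print(round(c(ratio = ratio, lower = ratio_ci[1], upper = ratio_ci[2]), 3))
```

A ratio above 1 with a confidence interval that excludes 1 would indicate a higher median intake for males, expressed as a multiplicative rather than an additive difference.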
Method and Assumption Checks
The t-test assumes that cholesterol intake is approximately normally distributed within each
sex; however, the graph of the data does not show a normal curve. This is consistent with the
summary statistics, where the means exceed the medians, indicating right skew.
Executive Summary
The dataset records the cholesterol intake of female and male students. The overall mean is
199.2; the summary statistics show a higher mean for males (212.387) than for females
(190.367). The difference between the two means is statistically significant, since the
p-value (0.002367) is less than the significance level (0.05).
Question 2
Question of interest/goal of the study
We want to check for a power relationship between the colour score of diamonds and their price
per carat. In particular, we want to estimate how much a 50% increase in colour score affects
the price of the diamonds.
Read in and inspect the data:
diamonds.df=read.csv("Diamonds.csv")
diamonds.df$logPrice=log(diamonds.df$Price)
diamonds.df$logColour=log(diamonds.df$Colour)
plot(logPrice~logColour,data=diamonds.df)
Comment on the data
The data contain two variables, Price and Colour, which record the price and the colour score
of each diamond, so each colour score is linked to a particular price.
Fit model and check assumptions.
Diamond_model<- lm(diamonds.df$logPrice~diamonds.df$logColour)
summary(Diamond_model)
anova(Diamond_model)
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)            2.83181    0.13425  21.093  < 2e-16 ***
diamonds.df$logColour  0.56523    0.08109   6.971 4.18e-07 ***
Analysis of Variance Table
                      Df  Sum Sq Mean Sq F value    Pr(>F)
diamonds.df$logColour  1 1.60646 1.60646  48.591 4.184e-07 ***
Residuals             23 0.76039 0.03306
log(Price) = 2.83181 + 0.56523 × log(Colour)
Price = 16.976 × Colour^0.56523
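Under a power-law fit of this kind, multiplying Colour by 1.5 multiplies the predicted price by 1.5 raised to the slope. The following is a minimal sketch of that calculation, assuming only the slope estimate, its standard error, and the 23 residual degrees of freedom from the output above; the variable names are illustrative.

```r
# Slope estimate, standard error, and residual df copied from the model output
slope    <- 0.56523
se_slope <- 0.08109
df_resid <- 23

# A 50% increase in Colour multiplies the predicted Price by 1.5^slope
mult <- 1.5^slope

# 95% confidence interval for the slope, carried through the same transform
slope_ci <- slope + c(-1, 1) * qt(0.975, df_resid) * se_slope
mult_ci  <- 1.5^slope_ci

# Express the multipliers as percentage increases in price
pct    <- 100 * (mult - 1)
pct_ci <- 100 * (mult_ci - 1)

print(round(c(estimate = pct, lower = pct_ci[1], upper = pct_ci[2]), 1))
```

On these figures, a 50% increase in colour score corresponds to an estimated price increase of about 26%.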