This report presents an analysis of the cars dataset using Rstudio. It includes frequency tables, bivariate and multiple linear regression analysis with interpretation and comments on the results. The report also includes hypotheses testing, R-squared values, and standard errors.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.
Task 1 Online data on cars was used for this report. The link to the dataset is given below; https://perso.telecom-paristech.fr/eagan/class/igr204/data/cars.csv 1.Create two frequency tables. Interpret and comment on the result. Answer Frequency table for the Cylinders As can be seen, most of the vehicles had 4 cylinders (n = 207) while very few had 5 cylinders (n = 3). Figure1: Bar chart of cylinders Frquency table for the Origin Most of the cars in the sample came from US (n = 254) while the second highest werre from Japan (n = 79) and the rest (n = 73) came from Europe(Trochim, 2006). >counts1 Cylinders 34568 4 207384 108 >counts2 Origin EuropeJapanUS 7379254
2.Create a cross table where you use two categorical variables. Formulate hypotheses to the table and calculate the Chi-squared and Cramers V for the table. Interpret and comment on the result. Answer The two selected variables are; Origin and Cylinders The following hypothesis was tested; H0: There is no significant association between Cylinder and country of origin. HA: There is no significant association between Cylinder and country of origin. Results of the Chi-Square test From the table, it can be seen that the p-value 0.000 is less than the .05 significance level, we therfore do reject the null hypothesis and conclude that that there is strong evidence of significant association between Cylinder and country of origin. Cramer’s V For the Cramer’s V we have the results given in the table below. >mytable # print table Origin Cylinders Europe Japan US 3040 4666972 5300 64674 800 108 >chisq.test(mytable) Pearson's Chi-squared test data:mytable X-squared = 186.6048, df = 8, p- value < 2.2e-16
Symmetric Measures ValueApprox. Sig. Nominal by NominalPhi.678.000 Cramer's V.479.000 N of Valid Cases406 The value of Cramer’s V was found to be 0.479 which shows a moderate association between the variables (Origin and Cylinders). Task 2 In this task you will work with continuous and categorical variables. You are free to choose data sets and variables yourself, either from data sets available in Canvas or from your own source. Siter the data set you use in the task. 1.Select at least two continuous variables. Find average, standard deviation, minimum and maximum valuesfor the variables. Interpreters and comments result. Answer The average Miles per gallon (MPG) was found to be 23.05 with the minimum and maximum being 0.00 and 46.60 respectively while the median value was found to be 22.35. The standard deviation was 8.40. This shows that the data is almost close to normal distribution. On the other hand, the average displacement was found to be 194.8 with the minimum and maximum being 68.0 and 455.0 respectively while the median value was found to >summary(data$MPG) Min. Stdev.Median MeanMax. 0.008.4022.35 23.0546.60 > summary(data$Displace ment) Min. Stdev.Median MeanMax. 68.0104.92151.0 194.8455.0 >sd(data$MPG) [1] 8.401777 >sd(data$Displacement) [1] 104.9225
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
be 151.0. The standard deviation was 104.94. This shows that the data is almost close to normal distribution since there is no uch spread in the data. 2.Perform a bivariate linear regression analysis with two continuous variables. You shall: Set up the equation, interpret and comment on the result. Formulatehypothesesforthevariables.Concludethehypothesis.Interpretthe explained variance (R-squared). Comment on default errors and contexts. Answer The regression equation that we sought to estimate is given below; MPG=β0+β1(Displacement)+ε Where,β0is the coeffient of the intercept whileβ1is the coefficient of the displament andεis the error term(Tofallis, 2009). The following hypothesis was to be tested; H0: There is no significant relationship between MPG and Displacement HA: There is significant relationship between MPG and Displacement Results are presented below; >fit <- lm(MPG ~ Displacement) >summary(fit) # show results Call: lm(formula = MPG ~ Displacement) Residuals: Min1QMedian 3QMax -29.0354-2.7733- 0.35072.600719.0627 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)34.971798 0.56825461.54<2e- 16 *** Displacement -0.061200 0.002569-23.82<2e- 16 *** --- Signif. codes:0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value was for the displacement was found to be 0.000 (a value less than 5% level of significance), we therefore reject the null hypothesis and conclude thatthere is significantrelationshipbetweenMPGandDisplacement.Thecoefficientfor displacementwasfoundtobe-0.0612;thissuggeststhataunitincreasein displacement would result in reduction in the MPG by 0.0612. The R-squared value is given as 0.5841; this means that 58.41% of the variation in the miles per gallon (MPG) is explained by the displacement (explanatory variable) in the model. Close to 41% of the variation in MPG is explained by the error term (variables outside the model). The final regression model is thus; MPG=34.9718−0.0612(Displacement) The standard error for the explanatory variable (displacement) is iven as 0.0026; which tells us that the average distance of the data points from the fitted line is about 0..0026 displacement. 3.Perform a multiple linear regression analysis with at least three independent variables. You shall: Set up the equation, interpret and comment on the result. Formulate hypotheses for two of the variables in the model. concluding hypotheses. Interpret the explained variance (R-squared). Comment on standard errors and Confidence interval. Answer The regression equation that we sought to estimate is given below; MPG=β0+β1(Displacement)+β2(Weight)+β3(Acceleration)+ε Where,β0is the coeffient of the intercept whileβ1is the coefficient for the displament, β2is the coefficient for the weight,β3is the coefficient for the acceleration andεis the error term(Fisher, 2002). The following hypothesis were to be tested;
i)H0: There is no significant relationship between MPG and Displacement HA: There is significant relationship between MPG and Displacement ii)H0: There is no significant relationship between MPG and Weight HA: There is significant relationship between MPG and Weight iii)H0: There is no significant relationship between MPG and Acceleration HA: There is significant relationship between MPG and Acceleration Results are presented below; Thep-valuewasforthe displacement was found to be 0.1625 (a valuegreaterthan5%levelof significance),wetherefore fail to reject the null hypothesisandconcludethatthereisno significant relationshipbetweenMPGand Displacement. Thep-valuewasforthe weight was found to be 0.000(avaluelessthan5%levelof >fit <- lm(MPG ~ Displacement+Weight+A cceleration) >summary(fit) # show results Call: lm(formula = MPG ~ Displacement + Weight + Acceleration) Residuals: Min1QMedian 3QMax -31.4098-2.6495- 0.10212.738316.3955 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)40.0165421 2.195325318.228< 2e- 16 *** Displacement -0.0107202 0.0076615-1.399 0.1625 Weight-0.0062342 0.0008724-7.146 4.23e- 12 *** Acceleration0.2382217 0.11473852.076 0.0385 * --- Signif. codes:0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 5.123 on 402 degrees of freedom Multiple R-squared: 0.631,Adjusted R- squared:0.6282 F-statistic: 229.1 on 3 and 402 DF,p-value: < 2.2e-16
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
significance), we therefore reject the null hypothesis and conclude thatthere is significant relationship between MPG and Weight. The p-value was for the Acceleration was found to be 0.0385 (a value less than 5% level of significance), we therefore reject the null hypothesis and conclude thatthere is significant relationship between MPG and Acceleration. The R-squared value is given as 0.631; this means that 63.1% of the variation in the miles per gallon (MPG) is explained by the three explanatory variables in the model (displacement, weight and acceleration). The final regression model is thus; MPG=40.0165−0.0107(Displacement)−0.0062(Weight)+0.2388(Acceleration) The standard error for the explanatory variable (displacement) is iven as 0.0077; which tells us that the average distance of the data points from the fitted line is about 0..0077 displacement. The standard error for the explanatory variable (weight) is iven as 0.0009; which tells us that the average distance of the data points from the fitted line is about 0..0009 weight. The standard error for the explanatory variable (acceleration) is iven as 0.1147; which tells us that the average distance of the data points from the fitted line is about 0..1147 acceleration. The confidence interval for the explaanatory variables also shows that two of the three expalantory variables are significant.
References Fisher, R. A. (2002). The goodness of fit of regression formulae, and the distribution of regression coefficients.Journal of the Royal Statistical Society, 85(4), 597–612. Tofallis, C. (2009). Least Squares Percentage Regression.Journal of Modern Applied Statistical Methods, 7, 526–534. Trochim, W. M. (2006). Descriptive statistics.Research Methods Knowledge Base.
Appendix data<-read.csv("C:\\Users\\310187796\\Desktop\\cars.csv") attach(data) str(data) #Frequency table for Cylinders counts1 <- table(Cylinders) barplot(counts1, main="Bar chart of Cylinders", xlab="Cylinder", col=c("grey","blue", "purple", "black", "green")) counts1 #Frequency table fro the Origin counts2 <- table(Origin)
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser