Analysis of Cars Dataset using Rstudio

   

Computer Programming Rstudio
Student name:
Student number:
Lecturer name:
31st October 2018
Task 1
Online data on cars was used for this report. The link to the dataset is given below:
https://perso.telecom-paristech.fr/eagan/class/igr204/data/cars.csv
1. Create two frequency tables. Interpret and comment on the result.
Answer
Frequency table for Cylinders
As can be seen, most of the vehicles had 4 cylinders (n = 207), while very few had 5
cylinders (n = 3).
Figure 1: Bar chart of cylinders
Frequency table for Origin
Most of the cars in the sample came from the US (n = 254), followed by Japan (n = 79)
and Europe (n = 73) (Trochim, 2006).
> counts1
Cylinders
  3   4   5   6   8
  4 207   3  84 108
> counts2
Origin
Europe  Japan     US
    73     79    254
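The two frequency tables can be reproduced with R's `table()` function. The sketch below is a minimal illustration rather than the report's original script: since the raw file is not loaded here, it rebuilds both variables from the counts reported above using `rep()`, and the vector names `cylinders` and `origin` are assumptions.

```r
# Rebuild the two categorical variables from the counts reported above.
# (Assumption: the original script read these columns from cars.csv instead.)
cylinders <- rep(c(3, 4, 5, 6, 8), times = c(4, 207, 3, 84, 108))
origin <- rep(c("Europe", "Japan", "US"), times = c(73, 79, 254))

# One-way frequency tables
counts1 <- table(Cylinders = cylinders)
counts2 <- table(Origin = origin)

print(counts1)
print(counts2)

# Relative frequencies (in percent) can make the comparison easier to read
print(round(100 * prop.table(counts1), 1))
print(round(100 * prop.table(counts2), 1))
```

Passing either table to `barplot()` would give a bar chart comparable to Figure 1.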
2. Create a cross table using two categorical variables. Formulate hypotheses for
the table and calculate the Chi-squared statistic and Cramer's V for the table. Interpret and
comment on the result.
Answer
The two selected variables are Origin and Cylinders.
The following hypotheses were tested:
H0: There is no significant association between Cylinder and country of origin.
HA: There is a significant association between Cylinder and country of origin.
Results of the Chi-Square test
From the test output, it can be seen that the p-value (< 2.2e-16, reported as 0.000) is less
than the .05 significance level. We therefore reject the null hypothesis and conclude that there
is strong evidence of a significant association between Cylinder and country of origin.
Cramer's V
For Cramer's V, the results are given in the table below.
> mytable # print table
         Origin
Cylinders Europe Japan  US
        3      0     4   0
        4     66    69  72
        5      3     0   0
        6      4     6  74
        8      0     0 108
> chisq.test(mytable)
        Pearson's Chi-squared test
data:  mytable
X-squared = 186.6048, df = 8, p-value < 2.2e-16
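The test above can be re-run without the raw data by typing in the cross table directly. This is a sketch under that assumption: the matrix is copied from the printed output, and the object name `res` is hypothetical. Several expected counts in this table are small, so R issues a warning about the chi-squared approximation; the sketch suppresses the message but prints the expected counts so they can be inspected.

```r
# Cross table of Cylinders (rows) by Origin (columns),
# with counts copied from the printed output above.
mytable <- matrix(
  c( 0,  4,   0,
    66, 69,  72,
     3,  0,   0,
     4,  6,  74,
     0,  0, 108),
  nrow = 5, byrow = TRUE,
  dimnames = list(Cylinders = c("3", "4", "5", "6", "8"),
                  Origin = c("Europe", "Japan", "US"))
)

# Pearson's chi-squared test of independence.
# suppressWarnings() only hides R's note that the approximation
# may be inaccurate when expected counts are small.
res <- suppressWarnings(chisq.test(mytable))

print(res$statistic)           # chi-squared statistic
print(res$parameter)           # degrees of freedom: (5 - 1) * (3 - 1)
print(res$p.value)             # p-value
print(round(res$expected, 2))  # expected counts under independence
```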
Symmetric Measures
                                   Value   Approx. Sig.
Nominal by Nominal   Phi            .678   .000
                     Cramer's V     .479   .000
N of Valid Cases                     406
The value of Cramer's V was found to be 0.479, which indicates a moderate association between
the variables (Origin and Cylinders).
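Phi and Cramer's V can both be derived from the chi-squared statistic: Phi is sqrt(X² / n) and Cramer's V is sqrt(X² / (n · (k − 1))), where n is the number of cases and k is the smaller of the number of rows and columns. The sketch below applies these formulas in base R to the cross table from this task; the variable names are assumptions, and the matrix is typed in from the printed counts rather than computed from the raw file.

```r
# Cross table of Cylinders (rows) by Origin (columns), copied from the report.
mytable <- matrix(
  c( 0,  4,   0,
    66, 69,  72,
     3,  0,   0,
     4,  6,  74,
     0,  0, 108),
  nrow = 5, byrow = TRUE,
  dimnames = list(Cylinders = c("3", "4", "5", "6", "8"),
                  Origin = c("Europe", "Japan", "US"))
)

# Chi-squared statistic (the small-expected-count warning is suppressed)
chi2 <- unname(suppressWarnings(chisq.test(mytable))$statistic)

n <- sum(mytable)       # number of valid cases
k <- min(dim(mytable))  # smaller of the number of rows and columns

phi <- sqrt(chi2 / n)
cramers_v <- sqrt(chi2 / (n * (k - 1)))

# Rounded to three decimals these agree with the table: .678 and .479
print(round(c(Phi = phi, CramersV = cramers_v), 3))
```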
Task 2
In this task you will work with continuous and categorical variables.
You are free to choose data sets and variables yourself, either from data sets available
in Canvas or from your own source. Cite the data set you use in the task.
1. Select at least two continuous variables. Find the average, standard deviation, minimum
and maximum values for the variables. Interpret and comment on the results.
Answer
The average miles per gallon (MPG) was 23.05, with a minimum of 0.00 and a maximum of
46.60; the median was 22.35 and the standard deviation was 8.40. The mean and median
are close, which suggests that the distribution is roughly symmetric, although this alone
does not establish normality. A minimum of 0.00 is not a plausible fuel economy and may
indicate missing values recorded as zero.
On the other hand, the average displacement was 194.8, with a minimum of 68.0 and a
maximum of 455.0, while the median was 151.0 and the standard deviation was 104.92.
> summary(data$MPG)
   Min. Stdev. Median   Mean   Max.
   0.00   8.40  22.35  23.05  46.60
> summary(data$Displacement)
   Min. Stdev. Median   Mean   Max.
   68.0 104.92  151.0  194.8  455.0
> sd(data$MPG)
[1] 8.401777
> sd(data$Displacement)
[1] 104.9225
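R's built-in `summary()` does not report a standard deviation, so a table like the one above is usually put together from `min()`, `sd()`, `median()`, `mean()` and `max()`. The helper below is one possible way to do that. The function name `describe` is hypothetical, and the example vector holds illustrative values only, not data from cars.csv.

```r
# Hypothetical helper: returns the five statistics shown in the output above.
describe <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarising
  c(Min    = min(x),
    Stdev  = sd(x),
    Median = median(x),
    Mean   = mean(x),
    Max    = max(x))
}

# Illustrative values only (not taken from cars.csv)
mpg_example <- c(18, 15, 36, 24.5, 30, 22)
print(round(describe(mpg_example), 2))
```

With the report's data loaded, the corresponding calls would be `describe(data$MPG)` and `describe(data$Displacement)`, assuming `data` is the data frame used in the output above.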