This document provides R code and explanations for analyzing the Iris dataset. It covers topics such as variables and records in the dataset, creating subsets, calculating summary statistics, and generating boxplots.
Data Science
Student Name:
Instructor Name:
Course Number:
24 April 2019
1. Write R code to see how many variables and records are in dataset Iris.

Answer

The code is str(iris). Its output shows that the dataset has 5 variables and 150 records.

    > str(iris)
    'data.frame':   150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1

2. Write R code to read first 10 rows of each subset (setosa, versicolour, virginica).

Answer

The R code is given below; filter() comes from the dplyr package.

    # filter() the data for species setosa
    setosa <- filter(iris, Species == "setosa")
    head(setosa, n=10)

    # filter() the data for species versicolor
    versicolor <- filter(iris, Species == "versicolor")
    head(versicolor, n=10)

    # filter() the data for species virginica
    virginica <- filter(iris, Species == "virginica")
    head(virginica, n=10)

The R output is given below.

    > setosa <- filter(iris, Species == "setosa")
    > head(setosa, n=10)
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1           5.1         3.5          1.4         0.2  setosa
    2           4.9         3.0          1.4         0.2  setosa
    3           4.7         3.2          1.3         0.2  setosa
    4           4.6         3.1          1.5         0.2  setosa
    5           5.0         3.6          1.4         0.2  setosa
    6           5.4         3.9          1.7         0.4  setosa
    7           4.6         3.4          1.4         0.3  setosa
    8           5.0         3.4          1.5         0.2  setosa
    9           4.4         2.9          1.4         0.2  setosa
    10          4.9         3.1          1.5         0.1  setosa

    > versicolor <- filter(iris, Species == "versicolor")
    > head(versicolor, n=10)
       Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    1           7.0         3.2          4.7         1.4 versicolor
    2           6.4         3.2          4.5         1.5 versicolor
    3           6.9         3.1          4.9         1.5 versicolor
    4           5.5         2.3          4.0         1.3 versicolor
    5           6.5         2.8
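The counts in Question 1 and the per-species subsets in Question 2 can also be obtained without dplyr. The following is a minimal base-R sketch, not the method used in the report; the names n_records, n_variables, by_species and first_ten are chosen here for illustration.

```r
# iris is a built-in dataset, so no file needs to be read.
data(iris)

# Records are rows, variables are columns.
n_records <- nrow(iris)
n_variables <- ncol(iris)
cat("records:", n_records, " variables:", n_variables, "\n")

# Split into one data frame per species, then keep the first 10 rows of each.
by_species <- split(iris, iris$Species)
first_ten <- lapply(by_species, head, n = 10)

# Show one of the three subsets; first_ten$setosa and first_ten$virginica
# are accessed the same way.
print(first_ten$versicolor)
```

Unlike filter(), split() keeps the original row names, so the versicolor rows are labelled 51 to 60 rather than 1 to 10.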
The R output for the virginica subset is given below.

    > virginica <- filter(iris, Species == "virginica")
    > head(virginica, n=10)
       Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    1           6.3         3.3          6.0         2.5 virginica
    2           5.8         2.7          5.1         1.9 virginica
    3           7.1         3.0          5.9         2.1 virginica
    4           6.3         2.9          5.6         1.8 virginica
    5           6.5         3.0          5.8         2.2 virginica
    6           7.6         3.0          6.6         2.1 virginica
    7           4.9         2.5          4.5         1.7 virginica
    8           7.3         2.9          6.3         1.8 virginica
    9           6.7         2.5          5.8         1.8 virginica
    10          7.2         3.6          6.1         2.5 virginica

3. Write R code and use tail() to view the last 15 rows in dataset Iris.

Answer

The code is tail(iris, 15), and the output is given below.

    > tail(iris, 15)
        Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    136          7.7         3.0          6.1         2.3 virginica
    137          6.3         3.4          5.6         2.4 virginica
    138          6.4         3.1          5.5         1.8 virginica
    139          6.0         3.0          4.8         1.8 virginica
    140          6.9         3.1          5.4         2.1 virginica
    141          6.7         3.1          5.6         2.4 virginica
    142          6.9         3.1          5.1         2.3 virginica
    143          5.8         2.7          5.1         1.9 virginica
    144          6.8         3.2          5.9         2.3 virginica
    145          6.7         3.3          5.7         2.5 virginica
    146          6.7         3.0          5.2         2.3 virginica
    147          6.3         2.5          5.0         1.9 virginica
    148          6.5         3.0          5.2         2.0 virginica
    149          6.2         3.4          5.4         2.3 virginica

4. Write R code to show petal length of setosa, versicolour, virginica. Write R code to show which flower species has the shortest petal length. Discuss the result as an analysis in your report.

(Explanation with steps to do: Write R code to show petal length of setosa, versicolour, virginica. Do not paste all results in your report, as that will be too long. Paste some of them, and write in the report how many records were shown. Next, write R code to show which flower species has the shortest petal length. Again, for a long result, paste only a few of the records retrieved.) Discuss the result (how many records you retrieved) as an analysis in your report.

Answer

The code to show the petal lengths of setosa, versicolor and virginica is shown below together with its output. Each species has 50 records; only the first 15 of each are presented.

    > irissetosa <- subset(iris$Petal.Length, Species == "setosa")
    > head(irissetosa, n=15)
     [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2
    >
    > irisVer <- subset(iris$Petal.Length, Species == "versicolor")
    > head(irisVer, n=15)
     [1] 4.7 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6
    >
    > irisVir <- subset(iris$Petal.Length, Species == "virginica")
    > head(irisVir, n=15)
     [1] 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0 5.1

The code and output for comparing the petal lengths of the three species are given below.

    > mean(irissetosa)
    [1] 1.462
    > mean(irisVer)
    [1] 4.26
    > mean(irisVir)
    [1] 5.552

As can be seen, the setosa species has the shortest mean petal length (M = 1.46), while virginica has the longest (M = 5.55).

5. Write R code to create a new column that stores logical values for sepal.width greater than half of sepal.length.

Answer

    > new_column <- mutate(iris, greater.half = Sepal.Width > 0.5 * Sepal.Length)
    > head(new_column)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species greater.half
    1          5.1         3.5          1.4         0.2  setosa         TRUE
    2          4.9         3.0          1.4         0.2  setosa         TRUE
    3          4.7         3.2          1.3         0.2  setosa         TRUE
    4          4.6         3.1

6. Write R code to generate summary of sepal length, sepal width, petal length, and petal width. The summary should include Minimum, Maximum, Mean, Median, Upper Quartile, and Lower Quartile. Discuss the result as an analysis in your report. You are required to interpret box plot. A box plot is a kind of graph used to show the shape of the distribution, its central value, and its variability. In a box plot, the ends of the box are the upper and lower quartiles, so the box spans the interquartile range. The median is marked by a line inside the box.

Answer

    > summary(iris)
      Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
     Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
     1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
     Median :5.800   Median :3.000   Median :4.350   Median :1.300
     Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
     3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
     Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

The summary above shows that the average sepal length is 5.84, with a median of 5.80. The mean and the median of the sepal length are close to each other, suggesting that its distribution is close to symmetric and roughly normal. For the sepal width, petal length and petal width, the mean and the median differ more (most clearly for petal length, 3.76 versus 4.35), suggesting some skewness in those variables.
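The comparisons in Questions 4 and 6 can be computed in one pass instead of variable by variable. The sketch below is a base-R illustration under the assumption that quantile() with its default method is an acceptable definition of the quartiles (it matches summary()); six_numbers, petal_means, summary_table and mean_minus_median are names introduced here, not taken from the report.

```r
data(iris)

# Question 4: mean petal length per species, and the species with the smallest mean.
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
shortest <- names(which.min(petal_means))
print(round(petal_means, 3))
cat("Shortest mean petal length:", shortest, "\n")

# Question 6: minimum, lower quartile, median, mean, upper quartile, maximum.
six_numbers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(Min = min(x), Q1 = q[1], Median = q[2],
    Mean = mean(x), Q3 = q[3], Max = max(x))
}
summary_table <- sapply(iris[, 1:4], six_numbers)
print(round(summary_table, 3))

# Mean minus median: a rough indicator of the direction of skew
# (negative values point to a longer left tail).
mean_minus_median <- summary_table["Mean", ] - summary_table["Median", ]
print(round(mean_minus_median, 3))
```

The difference between mean and median is only a heuristic; it hints at skew but does not test for normality.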
The four boxplots above show the four numeric variables. The boxplot for the sepal length suggests that the data are close to a normal distribution, while the petal length and petal width are heavily skewed (left skewed).

7. Dataset has labels for each class. Write R code to generate boxplot to see the distribution of the values considering each class (setosa, versicolour, virginica). Discuss the result as an analysis in your report.

Answer

    boxplot(Sepal.Length~Species, main="Boxplot for Sepal Length", col="aquamarine", data=iris)

The boxplot above shows that the distribution of sepal length varies across the three species. For setosa, the distribution looks almost symmetric and close to normal, while versicolor and virginica show a slight skewness.
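The visual reading of a boxplot can be backed by the numbers it is drawn from. The sketch below, offered as an illustration rather than part of the original answer, calls boxplot() with plot = FALSE to retrieve the whisker ends, hinges and median for each species; the row labels and the names bp, five_num and outliers are added here for readability.

```r
data(iris)

# plot = FALSE returns the statistics behind the boxes without drawing anything.
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# One column per species: whisker ends, hinges (approximate quartiles), median.
five_num <- bp$stats
dimnames(five_num) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  bp$names
)
print(five_num)

# Points drawn individually beyond the whiskers, labelled by species.
outliers <- data.frame(species = bp$names[bp$group], value = bp$out)
print(outliers)
```

A median that sits near the middle of its box, with whiskers of similar length, is consistent with a symmetric distribution; a boxplot alone cannot confirm normality.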
    boxplot(Sepal.Width~Species, main="Boxplot for Sepal Width", col="cornsilk", data=iris)

The boxplot above shows that the distribution of sepal width varies across the three species. For virginica, the distribution looks almost symmetric and close to normal, while versicolor and setosa show a slight skewness.

    boxplot(Petal.Length~Species, main="Boxplot for Petal Length", col="darkorange", data=iris)

The boxplot above shows that the distribution of petal length varies across the three species. For setosa, the distribution looks almost symmetric and close to normal, while versicolor and virginica show a slight skewness.

    boxplot(Petal.Width~Species, main="Boxplot for Petal Width", col="darkolivegreen1", data=iris)

The boxplot above shows the petal width for the three species. Judging by the shape of the boxes, none of the three species appears to follow a normal distribution.

8. Write R code to generate Histogram of Iris petal length. Discuss the result as an analysis in your report.

Answer

    hist(iris$Petal.Length, main="Histogram for Petal Length", xlab="Petal Length", col="darkorange")

The above histogram shows that the data on petal length are not normally distributed (the histogram does not have a bell shape) but rather skewed.

9. Write R code to generate Scatter Graph of Iris Sepal Width versus Sepal Length by Species. Discuss the result as an analysis in your report.

Answer
The figure presents a scatter plot of petal width versus petal length. It is clear from the figure that a positive linear relationship exists between the two variables (petal width and petal length), and that this holds for each of the species.

10. Write R code to generate violin plot for Iris summary statistics. Use library (vioplot). A violin plot is a method of plotting numeric data. It is used to visualise the distribution of the data and its probability density. The chart combines a box plot with a density plot that is rotated and placed on each side, to show the shape of the distribution.

Answer

The violin plot of the different species shows that virginica has the highest median sepal length, petal width and petal length when compared against setosa and versicolor. For sepal width, setosa has the highest median.
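The ranking of medians read off the violin plot can be checked numerically. The following is a small base-R sketch, not part of the original answer; species_medians and top_species are names introduced here.

```r
data(iris)

# Median of every numeric variable within each species.
species_medians <- aggregate(. ~ Species, data = iris, FUN = median)
print(species_medians)

# For each variable, the species with the largest median.
top_species <- sapply(
  species_medians[, -1],
  function(col) as.character(species_medians$Species[which.max(col)])
)
print(top_species)
```

The result names virginica for sepal length, petal length and petal width, and setosa for sepal width, which agrees with the reading of the plot.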
11. Write R code to generate Scatterplot to see correlation for the variables for each different class. How does one variable compare to the others? Are these variables correlated? +1 means the variables are perfectly positively correlated, -1 perfectly inversely correlated.

Answer

    > corr <- cor(iris[,1:4])
    > round(corr,3)
                 Sepal.Length Sepal.Width Petal.Length Petal.Width
    Sepal.Length        1.000      -0.118        0.872       0.818
    Sepal.Width        -0.118       1.000       -0.428      -0.366
    Petal.Length        0.872      -0.428        1.000       0.963
    Petal.Width         0.818      -0.366        0.963       1.000

From the table above, we can see that a strong positive relationship exists between petal width and petal length (r = .963). There is also a strong positive relationship between sepal length and petal length (r = .872), and another between sepal length and petal width (r = .818). There is, however, a weak negative relationship between sepal length and sepal width (r = -.118), and weak to moderate negative relationships between sepal width and petal width (r = -.366) and between sepal width and petal length (r = -.428). The scatter plot is consistent with these relationships.

12. Write R code to apply different colour to different classes in Scatterplot.

Answer

The scatter plot uses a different colour for each class.
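Questions 9, 11 and 12 ask for a scatter plot of sepal width against sepal length, coloured by class, and for the correlation within each class. The sketch below is one possible base-R illustration and not the code used in the report; the colour choices and the names species_colours, point_colours, plot_sepal_by_species, within_species_r and overall_r are assumptions made here.

```r
data(iris)

# One colour per class, looked up by species name for every record.
species_colours <- c(setosa = "red", versicolor = "green3", virginica = "blue")
point_colours <- unname(species_colours[as.character(iris$Species)])

# Scatter plot of sepal width versus sepal length, coloured by species.
plot_sepal_by_species <- function() {
  plot(iris$Sepal.Length, iris$Sepal.Width,
       col = point_colours, pch = 16,
       xlab = "Sepal Length", ylab = "Sepal Width",
       main = "Sepal Width versus Sepal Length by Species")
  legend("topright", legend = names(species_colours),
         col = species_colours, pch = 16)
}

# Correlation of the same two variables within each class and overall.
within_species_r <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)
overall_r <- cor(iris$Sepal.Length, iris$Sepal.Width)
print(round(within_species_r, 3))
print(round(overall_r, 3))
```

Calling plot_sepal_by_species() draws the figure. The within-class correlations are positive even though the pooled correlation is slightly negative, so the overall table can hide relationships that hold inside each species.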
R Codes

    # iris ships with R (package datasets), so no data file has to be loaded
    data(iris)

    # Question 1
    str(iris)

    # Question 2
    # install.packages("dplyr", dependencies=TRUE)   # run once if dplyr is not installed
    library("dplyr")
    # filter() the data for species setosa
    setosa <- filter(iris, Species == "setosa")
    head(setosa, n=10)       # This displays the first 10 rows
    # filter() the data for species versicolor
    versicolor <- filter(iris, Species == "versicolor")
    head(versicolor, n=10)   # This displays the first 10 rows
    # filter() the data for species virginica
    virginica <- filter(iris, Species == "virginica")
    head(virginica, n=10)    # This displays the first 10 rows

    # Question 3
    tail(iris, 15)

    # Question 4
    irissetosa <- subset(iris$Petal.Length, iris$Species == "setosa")
    head(irissetosa, n=15)
    irisVer <- subset(iris$Petal.Length, iris$Species == "versicolor")
    head(irisVer, n=15)
    irisVir <- subset(iris$Petal.Length, iris$Species == "virginica")
    head(irisVir, n=15)
    mean(irissetosa)
    mean(irisVer)
    mean(irisVir)

    # Question 5
    new_column <- mutate(iris, greater.half = Sepal.Width > 0.5 * Sepal.Length)
    head(new_column)

    # Question 6
    summary(iris)
    par(mfrow=c(2,2))   # 2 x 2 grid so that all four boxplots fit on one page
    boxplot(iris$Sepal.Length, main="Sepal Length", col="aquamarine")
    boxplot(iris$Sepal.Width, main="Sepal Width", col="cornsilk")
    boxplot(iris$Petal.Length, main="Petal Length", col="darkorange")
    boxplot(iris$Petal.Width, main="Petal Width", col="darkolivegreen1")

    # Question 7
    par(mfrow=c(1,1))
    boxplot(Sepal.Length~Species, main="Boxplot for Sepal Length", col="aquamarine", data=iris)
    boxplot(Sepal.Width~Species, main="Boxplot for Sepal Width", col="cornsilk", data=iris)
    boxplot(Petal.Length~Species, main="Boxplot for Petal Length", col="darkorange", data=iris)
    boxplot(Petal.Width~Species, main="Boxplot for Petal Width", col="darkolivegreen1", data=iris)

    # Question 8
    hist(iris$Petal.Length, main="Histogram for Petal Length", xlab="Petal Length", col="darkorange")

    # Question 9
    # colours 1, 2, 3 of the default palette are assigned to the three species
    plot(iris$Petal.Length, iris$Petal.Width, col=as.integer(iris$Species), pch=16,
         xlab="Petal Length", ylab="Petal Width",
         main="Scatter plot of Petal Width versus Petal Length")
    legend("topleft", legend=levels(iris$Species), pch=16,
           col=seq_along(levels(iris$Species)))

    # Question 10
    # install.packages(c("ggplot2", "tidyr", "vioplot"))   # run once if needed
    library("ggplot2")
    library("dplyr")
    library("tidyr")
    library("vioplot")   # requested by the question; the plot below is drawn with ggplot2
    gather(iris, Var, value, -Species) %>%
      ggplot(aes(Var, value)) +
      geom_violin(aes(fill = Species)) +
      facet_grid(~Species) +
      theme(axis.text.x = element_text(angle = 90, vjust = .5)) +
      labs(x = "Measurements", y = "Length in cm",
           title = "Violin Boxplot for the different Species") +
      geom_boxplot(width=0.1)
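The listing relies on dplyr's mutate() for Question 5. For readers without that package, the same logical column can be sketched in base R with transform(); this is an illustrative alternative, and iris_flagged and flag_counts are names introduced here.

```r
data(iris)

# TRUE where the sepal width exceeds half of the sepal length.
iris_flagged <- transform(iris, greater.half = Sepal.Width > 0.5 * Sepal.Length)
head(iris_flagged)

# How many records of each species satisfy the condition.
flag_counts <- table(iris_flagged$Species, iris_flagged$greater.half)
print(flag_counts)
```

transform() returns a new data frame, so the built-in iris object itself is left unchanged.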