This study material provides information on various aspects of Data Science, including the Iris dataset, subsets of flower species, petal length comparison, data visualization, correlation analysis, and more. It includes R code examples and explanations to help students understand key concepts in Data Science.
Data science by [Student Name] Data Science Tutor: [Tutor Name] [Institutional Affiliation] [Department] [Date]
Q1
The Iris dataset contains data on three iris flower species: setosa, virginica, and versicolor. Four features were measured on each flower: sepal length, sepal width, petal length, and petal width. The dataset consists of 50 samples from each of the three species (Alvarez-Castillo et al., 2016).

library(datasets)

This code loads the datasets package, attaches it to the search list, and makes the functions and data in the package available.

str(iris)

This code displays the structure of the iris data from the datasets package.

View(iris)

View() shows the iris dataset in a spreadsheet-like viewer, similar to a sheet in Excel. There are 150 observations and 5 variables: sepal length, sepal width, petal length, petal width, and species.

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
7          4.6         3.4          1.4         0.3  setosa
8          5.0         3.4          1.5         0.2  setosa

Q2
To view the first ten observations of each subset of the flower species, we first need to extract the subsets using the code:
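The code for this step does not appear in the text. The following is a minimal sketch of one way to do it, assuming the subsets are stored under the names setosa, versicolor, and virginica (the names the later answers use) and built with base R's subset(); the original answer may have used a different approach.

```r
library(datasets)

# Assumption: one data frame per species, named as in the later answers
setosa     <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
virginica  <- subset(iris, Species == "virginica")

# First ten observations of each subset
head(setosa, 10)
head(versicolor, 10)
head(virginica, 10)
```

Each subset keeps the original row numbers of iris, so the versicolor rows start at 51 and the virginica rows at 101.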
(Bąska, Pondel and Dudycz, 2019)

Q3
The R code to view the last fifteen observations:

tail(iris, 15)

    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
136          7.7         3.0          6.1         2.3 virginica
137          6.3         3.4          5.6         2.4 virginica
138          6.4         3.1          5.5         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
140          6.9         3.1          5.4         2.1 virginica
141          6.7         3.1          5.6         2.4 virginica
142          6.9         3.1          5.1         2.3 virginica
143          5.8         2.7          5.1         1.9 virginica
144          6.8         3.2          5.9         2.3 virginica
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica

(Caldwell, 2012)

Q4
R code to show the petal length of setosa, versicolor, and virginica:

virginica$Petal.Length

 [1] 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9
[20] 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5
[39] 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1

setosa$Petal.Length

There are 50 observations for the petal lengths.

versicolor$Petal.Length

There are 50 observations for the petal lengths.

To compare the petal length of each species, we can use summary() and draw conclusions from the results.

summary(virginica$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.500   5.100   5.550   5.552   5.875   6.900

summary(setosa$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.400   1.500   1.462   1.575   1.900

summary(versicolor$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   3.00    4.00    4.35    4.26    4.60    5.10

Comparing the minimum and maximum values for each species, setosa has the shortest petals (min = 1.000, max = 1.900). Versicolor in turn has a smaller minimum and maximum petal length than virginica. Setosa also has the smallest mean of the three species. Hence setosa has the smallest petal length (Grothe, 2012).

Q5
library(dplyr)
new = mutate(iris, new = Sepal.Width > 0.5 * Sepal.Length)
new

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  new
1          5.1         3.5          1.4         0.2  setosa TRUE
2          4.9         3.0          1.4         0.2  setosa TRUE
3          4.7         3.2          1.3         0.2  setosa TRUE
4          4.6         3.1          1.5         0.2  setosa TRUE
5          5.0         3.6          1.4         0.2  setosa TRUE
6          5.4         3.9          1.7         0.4  setosa TRUE
7          4.6         3.4          1.4         0.3  setosa TRUE

(Kabacoff, 2011)
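The mutate() call in Q5 adds a logical column that is TRUE when a flower's sepal width is more than half its sepal length. As a rough sketch, the same column can be built in base R without dplyr. The name iris_flagged and the per-species tabulation are illustrative additions, not part of the original answer.

```r
library(datasets)

# Hypothetical name: a copy of iris with the extra logical column
iris_flagged <- iris
iris_flagged$new <- iris_flagged$Sepal.Width > 0.5 * iris_flagged$Sepal.Length

# Count how many flowers of each species satisfy the condition
table(iris_flagged$Species, iris_flagged$new)
```

Working on a copy leaves the built-in iris data frame unchanged for the later questions.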
Q6
summary(iris)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

The median is different for each measurement. The summary shows the lowest and highest values and the first and third quartiles. The means and medians differ only slightly for sepal length, sepal width, and petal width; the gap is largest for petal length (mean 3.758, median 4.350).

boxplot(iris)
The labelled parts of the box plot, from top to bottom:
Maximum value excluding outliers
Upper quartile: 25% of observations lie above it
Median: 50% of observations lie below it and 50% above it
Lower quartile: 25% of observations lie below it
Outliers: values that lie beyond the whiskers, more than 1.5 times the interquartile range from the box
(Kabir et al., 2018)

Q7
boxplot(versicolor[,1:4], main="Versicolor")
boxplot(setosa[,1:4], main="Setosa")
boxplot(virginica[,1:4], main="Virginica")
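The box plot components described in Q6 can also be read off numerically. The sketch below uses boxplot.stats() from base R on sepal width; the choice of that column is only an illustration, and the fence variables are hypothetical names introduced here.

```r
library(datasets)

# boxplot.stats() returns the numbers that boxplot() draws
bs <- boxplot.stats(iris$Sepal.Width)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points drawn individually as outliers

# By default the whiskers reach at most 1.5 times the box height beyond the box
box_height  <- bs$stats[4] - bs$stats[2]
upper_fence <- bs$stats[4] + 1.5 * box_height
lower_fence <- bs$stats[2] - 1.5 * box_height
```

Any value above upper_fence or below lower_fence is reported in bs$out, which is what the box plot marks as an outlier.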
The box plots show that setosa has shorter petals than the other species. It also has larger sepal widths than the other species (Chan and Kogan, 2016). Versicolor and virginica have similar measurements, although versicolor has a smaller petal length and petal width than virginica. Hence, it is concluded that each flower species has its own distinctive set of measurements (Rojas, 2015).

Q8
attach(iris)
hist(Petal.Length)
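The histogram call in Q8 relies on attach(iris) to find Petal.Length. A small sketch of an alternative that avoids attaching the data frame is shown here; the object name h and the plot labels are illustrative choices.

```r
library(datasets)

# with() evaluates the expression inside iris without attaching it;
# plot = FALSE returns the bins and counts instead of drawing
h <- with(iris, hist(Petal.Length, plot = FALSE))
h$breaks
h$counts

# Draw the stored histogram
plot(h, main = "Petal length", xlab = "Petal length")
```

Keeping the histogram object makes it possible to inspect the bin counts directly as well as plot them.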
Q9
library(ggplot2)
ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()

There is little to no correlation between sepal length and sepal width: changes in sepal width are not associated with consistent changes in sepal length (Tomat et al., 2019).
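The visual impression from the scatterplot in Q9 can be checked with a number. The following sketch uses cor() and cor.test() from base R on the same two variables; the object names r and ct are illustrative.

```r
library(datasets)

# Pearson correlation between the two plotted variables
r <- cor(iris$Sepal.Width, iris$Sepal.Length)
r

# cor.test() adds a confidence interval and a p-value for the same estimate
ct <- cor.test(iris$Sepal.Width, iris$Sepal.Length)
ct$conf.int
ct$p.value
```

A coefficient near 0 together with a confidence interval that includes 0 would support the reading that the two sepal measurements are not linearly related across the whole dataset.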
Q11
pairs(iris[, 1:4])

Sepal length, petal length, and petal width are positively correlated with one another, since they move in the same direction; sepal width is the exception (Hatami-Marbini et al., 2017).

Correlation = cor(iris[,1:4])
Correlation

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

This is the correlation matrix for the four numeric variables in the iris dataset. There is almost no correlation between sepal length and sepal width, since their correlation (-0.1175698) is close to 0. Sepal length is strongly correlated with both petal length and petal width, since the correlations (0.872 and 0.818) are close to 1 (Kostenko, 2016). Sepal width has a weak to moderate negative correlation with petal length (-0.428) and petal width (-0.366). Petal length has a very strong positive correlation with petal width (0.963) and a negative correlation with sepal width (Behm et al., 2013).
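Reading the strongest and weakest relationships off a correlation matrix by eye is error-prone, so the sketch below ranks the six distinct pairs by absolute correlation. It reuses the Correlation object from Q11; pairs_table and its column names are hypothetical names introduced for illustration.

```r
library(datasets)

Correlation <- cor(iris[, 1:4])

# Keep each pair once: blank out the diagonal and the lower triangle
upper <- Correlation
upper[lower.tri(upper, diag = TRUE)] <- NA

# Turn the matrix into a long table and sort by absolute correlation
pairs_table <- as.data.frame(as.table(upper), stringsAsFactors = FALSE)
names(pairs_table) <- c("var1", "var2", "r")  # hypothetical column names
pairs_table <- pairs_table[!is.na(pairs_table$r), ]
pairs_table <- pairs_table[order(-abs(pairs_table$r)), ]
pairs_table
```

The first row of the sorted table is the most strongly related pair and the last row is the pair closest to no linear relationship.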
Q12
pairs(iris[, 1:4], col=iris[,5])
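In the Q12 call, col=iris[,5] passes the Species factor, whose integer codes (1, 2, 3) pick the first three colours of the current palette. A sketch of a more explicit variant follows; the colour names and the variables species_cols and point_cols are illustrative choices, not part of the original answer.

```r
library(datasets)

# Hypothetical colour choice: one named colour per species
species_cols <- c(setosa = "steelblue",
                  versicolor = "darkorange",
                  virginica = "forestgreen")

# Index the colour vector by species name so each point gets its species colour
point_cols <- species_cols[as.character(iris$Species)]

pairs(iris[, 1:4], col = point_cols, pch = 19,
      main = "Iris measurements by species")
```

Naming the colours makes it clear which species each colour stands for, which the integer-coded version leaves implicit.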
References
Alvarez-Castillo, D., Ayriyan, A., Benic, S., Blaschke, D., Grigorian, H. and Typel, S., 2016. New class of hybrid EoS and Bayesian M - R data analysis. The European Physical Journal A - Hadrons and Nuclei, (3), p.1.
Bąska, M., Pondel, M. and Dudycz, H., 2019. Identification of advanced data analysis in marketing: A systematic literature review. Journal of Economics & Management, 35(1), pp.18–39.
Behm, J.E., Edmonds, D.A., Harmon, J.P. and Ives, A.R., 2013. Multilevel statistical models and the analysis of experimental data. Ecology, 94(7), p.1479.
Caldwell, R.J., 2012. Digital elevation models of Juneau and southeast Alaska: procedures, data sources and analysis / R. Jason Caldwell [and four others]. NOAA technical memorandum NESDIS NGDC: 53.
Chan, D.Y. and Kogan, A., 2016. Data Analytics: Introduction to Using Analytics in Auditing. Journal of Emerging Technologies in Accounting, 13(1), pp.121–140.
Grothe, P.R., 2012. Digital elevation models of the Virgin Islands: procedures, data sources and analysis / Pamela R. Grothe [and six others]. NOAA technical memorandum NESDIS NGDC: 55.
Hatami-Marbini, A., Agrell, P., Fukuyama, H., Gholami, K. and Khoshnevis, P., 2017. The role of multiplier bounds in fuzzy data envelopment analysis. Annals of Operations Research, 250(1), pp.249–276.
Kabacoff, R., 2011. R in action: data analysis and graphics with R. Manning.
Kabir, A.I., Karim, R., Newaz, S. and Hossain, M.I., 2018. The Power of Social Media Analytics: Text Analytics Based on Sentiment Analysis and Word Clouds on R. Informatica Economica, 22(1), pp.25–38.
Kostenko, A., 2016. Graphical Data Analysis with R. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(3), pp.880–880.
Rojas, D., 2015. Data Analysis and Business Modeling with Excel 2013. Professional Expertise Distilled. Birmingham: Packt Publishing.
Tomat, L., Bratec, M., Minor, K. and Budler, M., 2019. Daily Deals in the Mediterranean Region: A Data Analytics Approach. E-review of Tourism Research, 16(2/3), pp.13–23.