Data Science Study Material

Added on  2023/01/19

AI Summary
This study material provides information on various aspects of Data Science, including the Iris dataset, subsets of flower species, petal length comparison, data visualization, correlation analysis, and more. It includes R code examples and explanations to help students understand key concepts in Data Science.

Data science 1
by [Student Name]
Data Science
Tutor: [Tutor Name]
[Institutional Affiliation]
[Department]
[Date]

Q1
The Iris dataset contains measurements for three iris species: setosa, versicolor, and
virginica. Four features are measured on each flower: sepal length, sepal width, petal length,
and petal width. The dataset consists of 50 samples from each of the three species (Alvarez-
Castillo et al., 2016).
library(datasets)
This code loads the datasets package, attaches it to the search path, and makes the functions
and data it contains available.
str(iris)
str() displays the structure of the iris data frame from the datasets package: the number of
observations and the name and type of each variable.
View(iris)
View() opens the iris dataset in a spreadsheet-like viewer, similar to a sheet in Excel. There
are 150 observations and 5 variables: sepal length, sepal width, petal length, petal width, and
species. The first eight rows are shown below.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
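The counts quoted above (150 observations, 5 variables, 50 samples per species) can be confirmed from the console. The following is a minimal sketch using only base R; the names `dims` and `counts` are illustrative and are not part of the original answer.

```r
# Load the built-in iris data frame (it ships with the datasets package).
data(iris)

# Number of rows (observations) and columns (variables).
dims <- dim(iris)
print(dims)

# Number of samples contributed by each species.
counts <- table(iris$Species)
print(counts)
```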
Q2
To view the first ten observations of each species subset, we first extract the subsets
with filter() from the dplyr package:
library(dplyr)  # provides filter()
virginica = filter(iris, Species == "virginica")  # keep only the virginica rows
head(virginica, 10)  # first ten observations
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 6.3 3.3 6.0 2.5 virginica
2 5.8 2.7 5.1 1.9 virginica
3 7.1 3.0 5.9 2.1 virginica
4 6.3 2.9 5.6 1.8 virginica
5 6.5 3.0 5.8 2.2 virginica
6 7.6 3.0 6.6 2.1 virginica
7 4.9 2.5 4.5 1.7 virginica
8 7.3 2.9 6.3 1.8 virginica
9 6.7 2.5 5.8 1.8 virginica
10 7.2 3.6 6.1 2.5 virginica
setosa = filter(iris, Species == "setosa")  # keep only the setosa rows
head(setosa, 10)  # first ten observations
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
versicolor = filter(iris, Species == "versicolor")  # keep only the versicolor rows
head(versicolor, 10)  # first ten observations
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.0 3.2 4.7 1.4 versicolor
2 6.4 3.2 4.5 1.5 versicolor
3 6.9 3.1 4.9 1.5 versicolor
4 5.5 2.3 4.0 1.3 versicolor
5 6.5 2.8 4.6 1.5 versicolor
6 5.7 2.8 4.5 1.3 versicolor
7 6.3 3.3 4.7 1.6 versicolor
8 4.9 2.4 3.3 1.0 versicolor
9 6.6 2.9 4.6 1.3 versicolor
10 5.2 2.7 3.9 1.4 versicolor
(Bąska, Pondel and Dudycz, 2019)
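The three subsets can also be produced in a single step without dplyr. The sketch below uses base R's split() and lapply(); the names `by_species` and `first_ten` are illustrative assumptions, not part of the original answer.

```r
# Split iris into a named list of three data frames, one per species.
by_species <- split(iris, iris$Species)

# Take the first ten rows of every subset.
first_ten <- lapply(by_species, head, n = 10)

# Inspect one of them, for example versicolor.
print(first_ten$versicolor)
```

Unlike filter(), split() keeps the original row names, so the versicolor rows are labelled by their position in iris rather than renumbered from 1.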
Q3
The R code to view the last fifteen observations:
tail(iris,15)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
136 7.7 3.0 6.1 2.3 virginica
137 6.3 3.4 5.6 2.4 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
141 6.7 3.1 5.6 2.4 virginica
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
(Caldwell, 2012)
Q4
R code to show the petal lengths of setosa, versicolor, and virginica:
virginica$Petal.Length
[1] 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9
[20] 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5
[39] 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
setosa$Petal.Length
There are 50 observations for the petal lengths.
versicolor$Petal.Length
There are 50 observations for the petal lengths.
To compare the petal lengths of the three species, we can use summary() on each subset and
draw conclusions from the results.
summary(virginica$Petal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.500 5.100 5.550 5.552 5.875 6.900
summary(setosa$Petal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.400 1.500 1.462 1.575 1.900
summary(versicolor$Petal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.00 4.00 4.35 4.26 4.60 5.10
Comparing the minimum and maximum values of each species, setosa has the shortest petals
(min = 1.000, max = 1.900). Versicolor comes next (min = 3.00, max = 5.10), with a smaller
minimum and maximum than virginica (min = 4.500, max = 6.900). The means follow the same order:
setosa (1.462), versicolor (4.26), and virginica (5.552). Hence setosa has the smallest petal
length and virginica the largest (Grothe, 2012).
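The same comparison can be made in one call rather than three. This is a minimal sketch with base R's tapply(), which applies a function to petal length within each species; the names `petal_summary` and `petal_means` are illustrative.

```r
# Summary statistics of petal length for each species in one call.
petal_summary <- tapply(iris$Petal.Length, iris$Species, summary)
print(petal_summary)

# Mean petal length per species, for a direct comparison.
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
print(round(petal_means, 3))
```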
Q5
new=mutate(iris, new=Sepal.Width>0.5*Sepal.Length)
new
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa TRUE
2 4.9 3.0 1.4 0.2 setosa TRUE
3 4.7 3.2 1.3 0.2 setosa TRUE
4 4.6 3.1 1.5 0.2 setosa TRUE
5 5.0 3.6 1.4 0.2 setosa TRUE
6 5.4 3.9 1.7 0.4 setosa TRUE
7 4.6 3.4 1.4 0.3 setosa TRUE
(Kabacoff, 2011)
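The mutate() call adds a logical column that is TRUE when a flower's sepal width exceeds half its sepal length. As a sketch, the same column can be built in base R with transform() and then tabulated by species; the name `flagged` is an illustrative assumption.

```r
# Base R equivalent of the mutate() call: add a logical column `new`.
flagged <- transform(iris, new = Sepal.Width > 0.5 * Sepal.Length)

# Count how many flowers of each species satisfy the condition.
print(table(flagged$Species, flagged$new))
```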
Q6
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
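The median is different for each measurement.
The summary shows the lowest and highest values and the first and third quartiles.
The means and medians differ only slightly for sepal length and sepal width; the gap is larger
for petal length (mean 3.758, median 4.350) and petal width (mean 1.199, median 1.300).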
boxplot(iris[, 1:4])  # base R function is lowercase; plot the four numeric measurements
Maximum value excluding outliers (upper whisker)
Upper quartile: 25% of observations lie above it
Median: 50% of observations lie below it and 50% above
Lower quartile: 25% of observations lie below it
Outliers: values lying more than 1.5 times the interquartile range beyond the quartiles
(Kabir et al., 2018)
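The elements of the box plot described above can be computed directly. The sketch below does this for sepal width using the conventional 1.5 × IQR rule; the names `q`, `iqr`, `lower_fence`, `upper_fence`, and `outliers` are illustrative.

```r
x <- iris$Sepal.Width

# Lower quartile, median, and upper quartile.
q <- quantile(x, c(0.25, 0.5, 0.75))
iqr <- IQR(x)

# Conventional fences: 1.5 * IQR beyond the lower and upper quartiles.
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Points outside the fences are the ones a box plot draws individually.
outliers <- x[x < lower_fence | x > upper_fence]
print(q)
print(sort(outliers))
```

Note that boxplot() itself works from hinges (see fivenum()), which can differ slightly from the quartiles returned by quantile() on some datasets.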
Q7
boxplot(versicolor[,1:4], main="Versicolor")
boxplot(setosa[,1:4], main="Setosa")
boxplot(virginica[,1:4], main="Virginica")

The box plots show that setosa has much shorter petals than the other species. It also has
wider sepals than the other species (Chan and Kogan, 2016). Versicolor and virginica have
similar measurements, although versicolor has a smaller petal length and petal width than
virginica. Hence, it is concluded that each species has its own distinctive set of
measurements (Rojas, 2015).
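The species can also be compared on one axis by plotting a single measurement grouped by species. This is a sketch using base R's formula interface to boxplot(); the names `petal_box` and `medians` are illustrative.

```r
# One box per species for a single measurement; plot = FALSE returns the
# statistics without drawing, which makes them easy to inspect.
petal_box <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

# Row 3 of $stats holds the median of each group.
medians <- setNames(petal_box$stats[3, ], petal_box$names)
print(medians)

# Draw the same comparison.
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal length by species", ylab = "Petal length (cm)")
```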
Q8
attach(iris)
hist(Petal.Length)
Q9
library(ggplot2)  # provides ggplot(), aes(), and geom_point()
ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()
There is little to no linear correlation between sepal length and sepal width when all species
are pooled: a change in sepal width is not associated with a consistent change in sepal length
(Tomat et al., 2019).
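The visual impression can be checked numerically. The sketch below computes the correlation for the pooled data and, as an additional check that is not part of the original answer, separately within each species, since a relationship inside a group may look different from the pooled one. The names `pooled` and `by_species` are illustrative.

```r
# Pooled correlation across all 150 flowers.
pooled <- cor(iris$Sepal.Width, iris$Sepal.Length)
print(pooled)

# Correlation computed separately within each species.
by_species <- sapply(split(iris, iris$Species),
                     function(d) cor(d$Sepal.Width, d$Sepal.Length))
print(by_species)
```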
Q10
library(vioplot)
# with() exposes the columns of iris without attaching it a second time
with(iris, vioplot(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
names=c("Sep.Len","Sep.Wid","Pet.Len","Pet.Wid"), col="grey"))

Q11
pairs(iris[, 1:4])
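Most pairs of variables are positively correlated, since their points rise together in the
scatter plot matrix; the exception is sepal width, which shows no clear increasing trend
against the other three variables (Hatami-Marbini et al., 2017).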
Correlation= cor(iris[,1:4])
Correlation
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
This is the correlation matrix for the four numeric variables in the iris dataset.
There is almost no linear correlation between sepal length and sepal width, since their
correlation (-0.1176) is close to 0.
Sepal length is strongly positively correlated with both petal length and petal width, since
the correlations (0.872 and 0.818) are close to 1 (Kostenko, 2016).
Sepal width has a weak-to-moderate negative correlation with petal length (-0.428) and petal
width (-0.366).
Petal length has a very strong positive correlation with petal width (0.963) and a
weak-to-moderate negative correlation with sepal width (Behm et al., 2013).
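Rather than reading the matrix by eye, the strongest relationship can be located programmatically. This is a minimal sketch in base R; the names `corr`, `idx`, and `strongest` are illustrative and not part of the original answer.

```r
corr <- cor(iris[, 1:4])

# Ignore the diagonal (each variable with itself) and duplicate entries.
corr[upper.tri(corr, diag = TRUE)] <- NA

# Locate the remaining entry with the largest absolute value.
idx <- which(abs(corr) == max(abs(corr), na.rm = TRUE), arr.ind = TRUE)
strongest <- c(rownames(corr)[idx[1, "row"]], colnames(corr)[idx[1, "col"]])
print(strongest)
print(corr[idx[1, "row"], idx[1, "col"]])
```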
Q12
pairs(iris[, 1:4],col=iris[,5])

References
Alvarez-Castillo, D., Ayriyan, A., Benic, S., Blaschke, D., Grigorian, H. and Typel, S., 2016.
New class of hybrid EoS and Bayesian M - R data analysis. The European Physical Journal
A - Hadrons and Nuclei, (3), p.1.
Bąska, M., Pondel, M. and Dudycz, H., 2019. Identification of advanced data analysis in
marketing: A systematic literature review. Journal of Economics & Management, 35(1),
pp.18–39.
Behm, J.E., Edmonds, D.A., Harmon, J.P. and Ives, A.R., 2013. Multilevel statistical models
and the analysis of experimental data. Ecology, 94(7), p.1479.
Caldwell, R.J. et al., 2012. Digital elevation models of Juneau and southeast Alaska:
procedures, data sources and analysis. NOAA technical memorandum NESDIS NGDC: 53.
Chan, D.Y. and Kogan, A., 2016. Data Analytics: Introduction to Using Analytics in
Auditing. Journal of Emerging Technologies in Accounting, 13(1), pp.121–140.
Grothe, P.R. et al., 2012. Digital elevation models of the Virgin Islands: procedures, data
sources and analysis. NOAA technical memorandum NESDIS NGDC: 55.
Hatami-Marbini, A., Agrell, P., Fukuyama, H., Gholami, K. and Khoshnevis, P., 2017. The
role of multiplier bounds in fuzzy data envelopment analysis. Annals of Operations Research,
250(1), pp.249–276.
Kabacoff, R., 2011. R in action: data analysis and graphics with R. Manning.
Kabir, A.I., Karim, R., Newaz, S. and Hossain, M.I., 2018. The Power of Social Media
Analytics: Text Analytics Based on Sentiment Analysis and Word Clouds on R. Informatica
Economica, 22(1), pp.25–38.
Kostenko, A., 2016. Graphical Data Analysis with R. Journal of the Royal Statistical Society:
Series A (Statistics in Society), 179(3), pp.880–880.
Rojas, D., 2015. Data Analysis and Business Modeling with Excel 2013. Professional
Expertise Distilled. Birmingham: Packt Publishing.
Tomat, L., Bratec, M., Minor, K. and Budler, M., 2019. Daily Deals in the Mediterranean
Region: A Data Analytics Approach. E-review of Tourism Research, 16(2/3), pp.13–23.