Biological Statistics: Analysis of Animal Body and Brain Weights
VerifiedAdded on 2023/06/07
Homework Assignment
AI Summary
This assignment focuses on analyzing the 'Animals' dataset from the MASS package in R. The student begins by determining the data's dimensions and describing its content, which includes body and brain weights of various land animals. The assignment progresses through several analytical steps: creating and interpreting scatter plots of the data, applying logarithmic transformations to the data and re-plotting, and identifying and removing outliers (dinosaurs) to refine the analysis. For each scatter plot, the student comments on the relationships between the variables and fits a line of best fit to the data. The R codes used for data manipulation, plotting, and regression analysis are provided, along with interpretations of the results, demonstrating a clear understanding of statistical concepts and R programming skills. The final analysis involves comparing the original and transformed data, and discussing the impact of outlier removal on the resulting graphs and relationships.

Running head: BIOLOGICAL STATISTICS 1
Biological Statistics
[Name of Student]
[Institutional Affiliation]
Using the R program version 3.5.1
Download the data ‘Animals’ from the ‘MASS’ package. This data is special. The first column
identifies the animals involved in the study. This column is not a variable in the traditional sense.
It is a column of labels.
R code for loading the data “Animals” from the ‘MASS’ package
> library(MASS)
> data(Animals)
1. What is the dimension of the data? (2points)
The Animals data frame has 28 rows and 2 columns. The animal names are stored as row labels
rather than as a column, so the data can be read as;
i. Land animal (row labels)
ii. Body weight (in kilograms)
iii. Brain weight (in grams)
The data frame is presented as follows (From R console window)
> class(Animals)
[1] "data.frame"
> sapply(Animals,class)
body brain
"numeric" "numeric"
>
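The dimension can be checked directly in the console. The following is a minimal sketch, assuming only that the MASS package is installed. It shows that the animal names are held as row names, which is why `dim()` reports two columns.

```r
library(MASS)
data(Animals)

# Number of rows and columns
dim(Animals)

# Column names: only the two measured variables
names(Animals)

# The animal labels are row names, not a separate column
head(rownames(Animals))
```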
2. Describe the data.
The data contains average body and brain weights for 28 species of land animals. The body
weights are recorded in kilograms while the brain weights are recorded in grams.
Summary of the data from the R console window is presented as follows;
> library(MASS)
> data(Animals)
> summary(Animals)
body brain
Min. : 0.02 Min. : 0.40
1st Qu.: 3.10 1st Qu.: 22.23
Median : 53.83 Median : 137.00
Mean : 4278.44 Mean : 574.52
3rd Qu.: 479.00 3rd Qu.: 420.00
Max. :87000.00 Max. :5712.00
> dim(Animals)
[1] 28 2
> summary(Animals$body)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02 3.10 53.83 4278.44 479.00 87000.00
> summary(Animals$brain)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.40 22.23 137.00 574.52 420.00 5712.00
>
3. Show the top ten rows of the data. (2 Points)
R command;
> Animals[1:10, ]
Top ten rows of the data
                     body  brain
Mountain beaver      1.35    8.1
Cow                465.00  423.0
Grey wolf           36.33  119.5
Goat                27.66  115.0
Guinea pig           1.04    5.5
Dipliodocus      11700.00   50.0
Asian elephant    2547.00 4603.0
Donkey             187.10  419.0
Horse              521.00  655.0
Potar monkey        10.00  115.0
4. Show the scatter plot of the data with x-axis ‘body’ and y-axis ‘brain.’ Identify as many
data points as you can. Comment on the plot. (4 points)
Fig 1. Scatter plot of the data (body Vs brain).
Comment;
Based on the scatter plot above, the data show a weak upward pattern, which hints at a positive
relationship between brain weight and body weight. However, the graph is hard to read on the
original scale: a few very heavy animals stretch the axes, so most of the data sit in one corner
and no clear connection is visible. There seem to be some outliers in the data. From the data set
these animals are Brachiosaurus, the African elephant and the Asian elephant, since they have
relatively high weights.
R commands used for Plotting
> library(MASS)
> data(Animals)
> plot(Animals$body,Animals$brain,xlab="body",ylab="brain",main="body vs brain",
pch=20,cex.main=2,col="blue")
>
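Because `identify()` only works in an interactive session, the points can also be labelled programmatically. The following is a sketch, not the method used above. The 2000 kg cut-off and the name `heavy` are assumptions chosen for illustration.

```r
library(MASS)
data(Animals)

plot(Animals$body, Animals$brain,
     xlab = "Body weight (kg)", ylab = "Brain weight (g)",
     main = "body vs brain", pch = 20, col = "blue")

# Hypothetical cut-off: label only the animals heavier than 2000 kg
heavy <- Animals$body > 2000
text(Animals$body[heavy], Animals$brain[heavy],
     labels = rownames(Animals)[heavy], pos = 3, cex = 0.7)

# Names of the labelled animals
rownames(Animals)[heavy]
```

Labelling only the extreme points keeps the crowded lower-left corner readable.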
5. Show the scatter plot of the data after the logarithmic transformation. Identify as many
points as you can. Comment on the plot. Obtain the line of best fit. Draw the line on the
scatter plot. Make the graph as informative as possible.
Fig 2. Scatterplot of the data after logarithmic transformation (pch=20)
Comment;
After the logarithmic transformation, the points are no longer crowded into one corner but are
spread across the plot. The points follow an upward trend and, once the regression line is
fitted, the graph suggests that there is a positive linear relationship between
the two variables. The points are distributed fairly evenly on either side of the line of best
fit. A few outliers remain, but they lie closer to the bulk of the data than they did before the
logarithmic transformation.
R command used in plotting scatter (With no line of best fit)
> library(MASS)
> data(Animals)
> plot(log(Animals$body),log(Animals$brain),pch=20,cex.main=1,col="red")
>
Fig 4. Scatterplot of the data after logarithmic transformations with regression line (with
pch=20)
R commands used for plotting Scatter plot and output (With Line of best fit);
> library(MASS)
> data(Animals)
> plot(Animals$body, Animals$brain, pch = 20, col = "red", xlab = "body", ylab = "brain")
> plot(log(Animals$body), log(Animals$brain), pch = 20, col = "red", xlab = "log(body)", ylab = "log(brain)")
> Animals1 <- lm(log(brain) ~ log(body), data = Animals)
> abline(Animals1, col = "blue", lwd = 2)
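A straight line on the log scale corresponds to a power law on the original scale: if ln(brain) = a + b ln(body), then brain = e^a × body^b. The sketch below extracts the two coefficients and predicts a brain weight. The names `fit`, `a` and `b` and the 100 kg example animal are illustrative assumptions.

```r
library(MASS)
data(Animals)

# Line of best fit on the log-log scale
fit <- lm(log(brain) ~ log(body), data = Animals)
a <- unname(coef(fit)[1])  # intercept
b <- unname(coef(fit)[2])  # slope

# Predicted brain weight (g) for a hypothetical 100 kg animal
by_hand <- exp(a) * 100^b

# The same prediction via predict(), back-transformed from the log scale
by_predict <- unname(exp(predict(fit, newdata = data.frame(body = 100))))

c(by_hand = by_hand, by_predict = by_predict)
```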
6. Remove the dinosaurs. Show the scatter plot of the resultant data after the logarithmic
transformation. Identify as many points as you can. Comment on the plot. Obtain the line
of best fit. Draw the line on the scatter plot. Make the graph as informative as possible.
9 points
The dinosaurs in the data are identified interactively on the scatter plot using the commands;
R commands
> library(MASS)
> data(Animals)
> plot(Animals$body, Animals$brain)
> identify(Animals$body, Animals$brain, labels = rownames(Animals))
After the removal of the outliers from the data, the following plots are obtained;
Fig 5. Scatter plot after the removal of the outliers (After logarithmic transformations without
line of best fit)
Fig 6. Scatter plot after the removal of the outliers (After logarithmic transformations without
line of best fit)
Comments
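By examining the scatter plot above, the possible outliers (Brachiosaurus, the African elephant
and the Asian elephant) have been identified and removed. The resulting plot therefore does not
contain any outliers that could bias the results. Note that the dinosaurs in the data set are
Brachiosaurus, Dipliodocus and Triceratops; the two elephants are not dinosaurs.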
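The dinosaurs can be dropped by row name so that they are excluded from both the plot and the fitted line. This is a sketch, assuming the three dinosaur rows are labelled "Dipliodocus", "Triceratops" and "Brachiosaurus", as they are spelled in the data set. The names `dinos`, `no_dinos` and `fit_no_dinos` are illustrative.

```r
library(MASS)
data(Animals)

# Row labels of the three dinosaurs, spelled as in the data set
dinos <- c("Dipliodocus", "Triceratops", "Brachiosaurus")
no_dinos <- Animals[!(rownames(Animals) %in% dinos), ]

# Log-log scatter plot of the remaining animals, with every point labelled
plot(log(no_dinos$body), log(no_dinos$brain), pch = 20, col = "blue",
     xlab = "ln(body weight, kg)", ylab = "ln(brain weight, g)",
     main = "Animals without dinosaurs")
text(log(no_dinos$body), log(no_dinos$brain),
     labels = rownames(no_dinos), pos = 3, cex = 0.6)

# Line of best fit computed on the reduced data only
fit_no_dinos <- lm(log(brain) ~ log(body), data = no_dinos)
abline(fit_no_dinos, col = "red", lwd = 2)
```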
R commands used
> library(MASS)
> data(Animals)
> plot(log(Animals$body), log(Animals$brain), ylim = c(0, 8), xlim = c(0, 8), pch = 20, cex.main = 1, col = "blue")
Fig 6. Scatter plot with regression line after the removal of the outliers
R commands
> library(MASS)
> data(Animals)
> plot(Animals$body, Animals$brain, pch = 20, col = "red", xlab = "body", ylab = "brain")
> plot(log(Animals$body), log(Animals$brain), pch = 20, ylim=c(0,8),xlim=c(0,8),col =
"green", xlab = "body", ylab = "brain")
> Animals1 <- lm(log(brain) ~ log(body), data = Animals)
> abline(Animals1, col = "red", lwd = 2)
7. Comment on the graphs in Questions five and six above (2 points)
With regard to the scatter plots obtained in questions 5 and 6 above, the points follow an upward
trend, and after the removal of the outliers the line of best fit rises from the bottom left to
the top right of the plot. This indicates that there exists a strong positive relationship
between the brain weight and the body weight of the animals under consideration. It can
therefore be asserted that as body weight increases, brain weight also increases (a positive
correlation exists between the two variables).
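The effect of removing the dinosaurs can be quantified by comparing the two fitted lines. The sketch below again assumes the dinosaur rows are labelled "Dipliodocus", "Triceratops" and "Brachiosaurus". The names `fit_all` and `fit_no_dinos` are illustrative.

```r
library(MASS)
data(Animals)

dinos <- c("Dipliodocus", "Triceratops", "Brachiosaurus")
no_dinos <- Animals[!(rownames(Animals) %in% dinos), ]

# Log-log regressions with and without the dinosaurs
fit_all      <- lm(log(brain) ~ log(body), data = Animals)
fit_no_dinos <- lm(log(brain) ~ log(body), data = no_dinos)

# Intercepts and slopes side by side
rbind(all = coef(fit_all), no_dinosaurs = coef(fit_no_dinos))

# Proportion of variance explained by each fit
c(all = summary(fit_all)$r.squared,
  no_dinosaurs = summary(fit_no_dinos)$r.squared)
```

With the dinosaurs excluded, the slope is steeper and the fit is tighter, because the dinosaurs combine very large bodies with small brains and pull the line down at the right-hand end.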
References
Rousseeuw, P.J. (1987) Robust Regression and Outlier Detection. Wiley, p. 57.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S-PLUS. Third
Edition. Springer.
Appendix 1: R code for scatter plots and lines of best fit.
library(MASS)
data(Animals)
# Scatter plot of the raw data, with interactive labelling of points
plot(Animals$body, Animals$brain, pch = 20, col = "red",
     xlab = "Body Weight KG", ylab = "Brain Weight gms")
identify(Animals$body, Animals$brain, labels = rownames(Animals))
# Note: 'mammals' is a separate data set in MASS (62 species), not 'Animals'
plot(log(mammals$body), log(mammals$brain), pch = 20, col = "red",
     xlab = "ln of Body Weight KG", ylab = "ln of Brain Weight gms",
     main = "Scatter Plot + Line of Best Fit")
identify(log(mammals$body), log(mammals$brain), labels = rownames(mammals))
# Fit and summarise the log-log regression
mammals1 <- lm(log(brain) ~ log(body), data = mammals)
summary(mammals1)
# Call: lm(formula = log(brain) ~ log(body), data = mammals)
# Fitted line: ln(brain) = 2.13479 + 0.75169*ln(body)
abline(mammals1, col = "blue", lwd = 2)