MAS183 Statistical Data Analysis: Assignment 1 - Semester 2, 2019

Added on  2022/09/18

Homework Assignment
AI Summary
This document presents a comprehensive solution to a statistics assignment (MAS183) covering topics in statistical data analysis. The assignment involves analyzing data from the BodyfatMales.csv and CornChicks.csv datasets using R. It includes tasks such as creating histograms, identifying outliers, describing data distributions, calculating descriptive statistics (mean, median, standard deviation), performing regression analysis to estimate body fat percentage, interpreting R-squared values, and generating boxplots to compare weight gain distributions. The solution demonstrates the application of statistical concepts to real-world data and provides detailed explanations and interpretations of the results. Data types and variable classifications are also covered. The solution showcases the use of R for generating plots and performing calculations, addressing all the requirements outlined in the assignment brief. The assignment aims to assess the student's ability to apply statistical methods and interpret data in the context of the provided datasets.
Statistics
Student Name:
Instructor Name:
Course Number:
20th August 2019
1. [13 marks]
The data file BodyfatMales.csv contains data on 252 adult men. The variables are as
follows:
Case identifies the individual
Fat percentage of body weight made up of fat
Age age in years
Weight body weight in kilograms
Height height in centimetres
Neck, Chest, etc. are circumference measurements in centimetres
(a) Provide a histogram of the men’s body weights. [5]
Answer
Figure 1: Histogram of the men’s body weights
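A minimal R sketch of how a histogram like Figure 1 could be produced. It assumes the CSV file sits in the working directory and has a column named Weight, as described above. The helper name plot_weight_hist, the title, and the colour are illustrative assumptions, not the code used for the original figure.

```r
# Hypothetical helper: draws a histogram of body weights (kg) and
# returns the histogram object so the bins and counts can be inspected.
plot_weight_hist <- function(weight) {
  hist(weight,
       main = "Histogram of the men's body weights",
       xlab = "Weight (kg)",
       col  = "grey80")
}

# Only runs when the assignment file is present in the working directory.
if (file.exists("BodyfatMales.csv")) {
  bodyfat <- read.csv("BodyfatMales.csv")
  plot_weight_hist(bodyfat$Weight)
}
```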
(b) Identify any outliers in the distribution of body weights by stating their Case
numbers, and explain why you think they are outliers. [2]
Answer
Yes, there are outliers in the dataset. The two observations with body weights of
119.16 kg and 164.69 kg appear to be outliers, because these weights lie far from the
rest of the distribution.
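Since the question asks for Case numbers, one way to look them up is sketched below. It applies the common 1.5 x IQR rule of thumb, which is an assumption here: the answer above judges outliers visually from the histogram, and the rule may flag a different set of points. The helper name find_outlier_cases is hypothetical.

```r
# Hypothetical helper: returns the Case numbers whose weight lies more than
# 1.5 x IQR below the first quartile or above the third quartile.
find_outlier_cases <- function(case, weight) {
  q     <- quantile(weight, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  case[weight < q[1] - fence | weight > q[2] + fence]
}

# Only runs when the assignment file is present in the working directory.
if (file.exists("BodyfatMales.csv")) {
  bodyfat <- read.csv("BodyfatMales.csv")
  print(find_outlier_cases(bodyfat$Case, bodyfat$Weight))
}
```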
(c) Ignoring any outliers, describe the weight distribution in terms of location, spread and
shape. [3]
Answer
Ignoring outliers, the distribution of weights is right skewed (longer tail to the
right). The location (also referred to as the center) is approximately 80 kg, since
about half of the weights lie to the left of this value and the other half to the
right. In terms of shape, the distribution is unimodal, since the histogram has only
one peak. Lastly, in terms of spread, the data are widely spread out, with a range of
approximately 60 kg (ignoring the outliers).
(d) Including any outliers, provide the mean, median and sample standard deviation for
the distribution. [3]
Answer
Statistic             Weight (kg)
Mean                  81.14468
Median                80.045
Standard Deviation    13.32844
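The three statistics above can be obtained with base R functions, as in the following sketch. The helper name describe_weight is an assumption for illustration. Note that sd() uses the n - 1 denominator, which is the sample standard deviation the question asks for.

```r
# Hypothetical helper: mean, median and sample standard deviation of a vector.
describe_weight <- function(weight) {
  c(mean = mean(weight), median = median(weight), sd = sd(weight))
}

# Only runs when the assignment file is present in the working directory.
if (file.exists("BodyfatMales.csv")) {
  bodyfat <- read.csv("BodyfatMales.csv")
  print(describe_weight(bodyfat$Weight))  # outliers included
}
```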
2. [17 marks]
Body fat percentage is difficult to measure accurately, so it would be handy to have an
estimate based on something that is easily measured. We will use the data in
BodyfatMales.csv to investigate whether abdominal circumference provides a reasonable
basis for estimating body fat percentage. The relevant variables are Abdomen and Fat.
(a) For our purposes, which variable is the predictor and which is the response, and why?
[2]
Answer
The predictor variable is the abdominal circumference (Abdomen), because it is easy
to measure and is used to estimate the response variable.
The response variable is the body fat percentage (Fat), because it is the variable
that is being estimated.
(b) Provide a scatterplot of the data. Circle any points that appear to be outliers. [5]
Answer
Figure 2: A scatter plot representing Fat percentage vs Abdominal Circumference
(c) Based only on the graph in part (b), briefly describe the relationship between the
variables in terms of direction, shape and strength. [3]
Answer
The above plot shows that there is a strong positive linear relationship between
abdominal circumference and fat percentage.
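The visual impression of strength can be backed by the correlation coefficient. The sketch below derives it from the R-squared value (0.6621) and the positive slope (0.5849) reported in the regression output for this question, using the fact that in a one-predictor regression r has the sign of the slope and magnitude sqrt(R-squared). The helper name r_from_r_squared is hypothetical.

```r
# Hypothetical helper: correlation coefficient implied by a simple
# (one-predictor) regression's R-squared and the sign of its slope.
r_from_r_squared <- function(r_squared, slope) {
  sign(slope) * sqrt(r_squared)
}

r_from_r_squared(0.6621, 0.5849)  # roughly 0.81

# With the data file available, cor() gives the same quantity directly.
if (file.exists("BodyfatMales.csv")) {
  bodyfat <- read.csv("BodyfatMales.csv")
  print(cor(bodyfat$Abdomen, bodyfat$Fat))
}
```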
(d) Using the least-squares line of best fit, calculate the predicted body fat percentage for
a man with an abdominal circumference of exactly 1 metre. [2]
Answer
> model
Call:
lm(formula = Fat ~ Abdomen)
Coefficients:
(Intercept) Abdomen
-35.1966 0.5849
> summary(model)
Call:
lm(formula = Fat ~ Abdomen)
Residuals:
Min 1Q Median 3Q Max
-17.6257 -3.4672 0.0111 3.1415 11.9754
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -35.19661    2.46229  -14.29   <2e-16 ***
Abdomen       0.58489    0.02643   22.13   <2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.514 on 250 degrees of freedom
Multiple R-squared: 0.6621, Adjusted R-squared: 0.6608
F-statistic: 489.9 on 1 and 250 DF, p-value: < 2.2e-16
The fitted regression equation is:
Fat = -35.1966 + 0.5849 × Abdomen
For an abdominal circumference of 1 metre (100 cm), the predicted body fat percentage
is:
Fat = -35.1966 + 0.5849 × 100
= -35.1966 + 58.49
= 23.2934
That is, a predicted body fat percentage of about 23.3%.
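The arithmetic can be checked in R. The sketch below uses the rounded coefficients printed by the model output; the helper name predict_fat is an assumption for illustration. The guarded block shows how the same prediction could be obtained from the fitted model when the data file is available, and may differ in later decimal places because it uses unrounded coefficients.

```r
# Rounded coefficients as printed in the model output.
intercept <- -35.1966
slope     <-   0.5849

# Hypothetical helper: predicted body fat (%) for a given abdominal
# circumference in centimetres.
predict_fat <- function(abdomen_cm) {
  intercept + slope * abdomen_cm
}

predict_fat(100)  # 1 metre = 100 cm

# Only runs when the assignment file is present in the working directory.
if (file.exists("BodyfatMales.csv")) {
  bodyfat <- read.csv("BodyfatMales.csv")
  model   <- lm(Fat ~ Abdomen, data = bodyfat)
  print(predict(model, newdata = data.frame(Abdomen = 100)))
}
```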
(e) By hand, draw the least-squares line on your graph from part (b) (“by hand” includes
using pen and ruler, or manual positioning of a line drawn using software.) Briefly
describe how you worked out where to position the line. [2]
Answer
To draw the least-squares line by hand, the line was positioned by eye so that roughly
equal numbers of points lie above and below it along its length. The resulting graph
is shown in Figure 3 below.
Figure 3: A scatter plot representing Fat percentage vs Abdominal Circumference, with the line of best fit drawn by hand
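An alternative to positioning by eye, offered here as a suggestion rather than the method used in the answer, is to compute two points on the fitted line from the regression equation and join them with a ruler. The helper name line_points and the chosen Abdomen values (80 cm and 120 cm) are assumptions for illustration.

```r
# Hypothetical helper: points on the fitted line Fat = intercept + slope * Abdomen,
# using the rounded coefficients from the regression output as defaults.
line_points <- function(abdomen_cm, intercept = -35.1966, slope = 0.5849) {
  data.frame(Abdomen = abdomen_cm, Fat = intercept + slope * abdomen_cm)
}

# Two points well apart on the x-axis make the line easier to rule accurately.
line_points(c(80, 120))
```

Marking these two points on the scatterplot and ruling a line through them places the line exactly where the least-squares fit says it should be.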
(f) Obtain the R² value for the regression and interpret it in the context of the data. [3]
Answer
The R-Squared value for the regression is 0.6621; this value means that 66.21% of the
variation in the dependent variable (fat percentage) is explained by the independent
variable (abdominal circumference) in the model.
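As a consistency check, the R-squared value can be recovered from other numbers in the regression output. For a regression with a single predictor, F = R² / ((1 - R²) / df), which rearranges to R² = F / (F + df), where df is the residual degrees of freedom. The sketch below applies this to the reported F-statistic (489.9 on 1 and 250 DF) and also shows the general definition. Both helper names are hypothetical.

```r
# Hypothetical helper: R-squared implied by the F-statistic of a
# one-predictor regression with df_resid residual degrees of freedom.
r_squared_from_f <- function(f_stat, df_resid) {
  f_stat / (f_stat + df_resid)
}

# Hypothetical helper: R-squared from observed and fitted values,
# i.e. 1 - (residual sum of squares) / (total sum of squares).
r_squared <- function(observed, fitted) {
  1 - sum((observed - fitted)^2) / sum((observed - mean(observed))^2)
}

r_squared_from_f(489.9, 250)  # close to the reported 0.6621
```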
3. [10 marks]
Ordinary wheat doesn’t have as much of the amino acid lysine in it as animals may need
in their feed, so plant scientists have developed varieties of wheat with higher levels of
lysine. To test a high-lysine wheat variety, 20 one-day-old male chickens were fed a diet
that included high-lysine wheat. Another group of 20 (a ‘control’ group) received an
identical diet except that the wheat was an ordinary low-lysine variety. The weights of
the chicks were measured at the start of the experiment, and again after 21 days.
The data are in the file CornChicks.csv. The variables are:
Group "Control" for the control group, or "HiLysine" for the high-lysine group.
WtGain Weight gain in grams (g).
(a) Provide side-by-side boxplots of the two distributions of weight gain. [5]
Answer
Figure 4: Side-by-side boxplots of the two distributions of weight gain
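A minimal R sketch of how side-by-side boxplots like Figure 4 could be produced, assuming the CSV file is in the working directory with the columns Group and WtGain described above. The helper name plot_gain_boxplots and the axis labels are illustrative assumptions, not the code used for the original figure.

```r
# Hypothetical helper: draws one boxplot of weight gain per group and
# returns the boxplot statistics (five-number summaries, group names, etc.).
plot_gain_boxplots <- function(chicks) {
  chicks$Group <- factor(chicks$Group)
  boxplot(WtGain ~ Group, data = chicks,
          xlab = "Group",
          ylab = "Weight gain (g)",
          main = "Weight gain by diet group")
}

# Only runs when the assignment file is present in the working directory.
if (file.exists("CornChicks.csv")) {
  chicks <- read.csv("CornChicks.csv")
  plot_gain_boxplots(chicks)
}
```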
(b) Briefly compare the two distributions in terms of location, spread and shape. [3]
Answer
Control
The distribution of weight gain for the control group is non-symmetrical (right
skewed). The location (also referred to as the center) is slightly over 350 g. In
terms of spread, the data are widely spread out, with a range of approximately 200 g.
HiLysine
The distribution of weight gain for the HiLysine group is approximately symmetrical.
The location (also referred to as the center) is slightly over 400 g, which is higher
than for the control group. In terms of spread, the data are less widely spread out
than for the control group, with a range of less than 150 g.
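The comparison above is read off the boxplots. A numerical summary per group can support it, as in the sketch below, which assumes the columns Group and WtGain described earlier. The helper name summarise_gain is hypothetical.

```r
# Hypothetical helper: median, interquartile range and range of weight gain
# for each group, returned as a matrix with one column per group.
summarise_gain <- function(chicks) {
  sapply(split(chicks$WtGain, chicks$Group), function(x) {
    c(median = median(x), iqr = IQR(x), range = diff(range(x)))
  })
}

# Only runs when the assignment file is present in the working directory.
if (file.exists("CornChicks.csv")) {
  chicks <- read.csv("CornChicks.csv")
  print(summarise_gain(chicks))
}
```

The median addresses location, while the IQR and range address spread. Shape is still best judged from the boxplots themselves.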
(c) For each data variable used in this question, classify its data type in terms of
numerical/categorical, discrete/continuous and nominal/ordinal as appropriate. [2]
Answer
The two variables are classified as follows;
Variable   Numerical/categorical   Discrete/continuous   Nominal/ordinal
Group      Categorical             N/A                   Nominal
WtGain     Numerical               Continuous            N/A