Statistics: Analysis of the dataset and the outliers, Regression


Added on  2020/03/16

AI Summary
This report presents a statistical analysis of a given dataset, focusing on regression analysis and outlier identification. The report begins with a discussion of the five-number summary and a box plot, demonstrating the skewness of the data and the presence of outliers. The analysis then proceeds to examine normal probability plots to assess the distribution of residuals, concluding that the data, after removing outliers, generally follows a normal distribution. The report also includes residual plots for each independent variable, confirming the linearity assumption. The report highlights the importance of understanding the data distribution, identifying outliers, and ensuring the assumptions of regression analysis are met for accurate interpretation and prediction. The report concludes with references to the sources used.
(c) Table 1: Five-number summary of age
Statistic        Value
Minimum          1
First quartile   2
Median           13
Third quartile   45
Maximum          55
Figure 1: Box plot with whiskers of the variable age (horizontal axis: age, scale 0 to 60)
In the box plot above, all the outliers lie beyond the upper whisker, which indicates that the
distribution is positively skewed (Holcomb, 2017). In such a distribution the mean is typically
greater than the median: the right tail is longer and the bulk of the observations is
concentrated on the left side of the curve (Holcomb, 2017). The majority of the observations
fall in the age group 1 to 20 years, and the outliers lie above the age of 30 years.
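The reasoning behind a box plot can be sketched in a few lines of Python. The example below is only an illustration: the `ages` list is hypothetical data (not the report's data set), the helper names are invented for this sketch, and the 1.5 × IQR fence rule and the "inclusive" quartile convention are assumptions, since the report does not say which conventions its software used.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical, right-skewed ages used only to illustrate the idea.
ages = [1, 1, 2, 2, 3, 4, 5, 6, 8, 10, 12, 35, 48]

summary = five_number_summary(ages)
outliers = tukey_outliers(ages)
# In a positively skewed sample the mean usually exceeds the median.
right_skew_hint = statistics.mean(ages) > statistics.median(ages)
```

For this hypothetical sample every flagged value lies above the upper fence and the mean exceeds the median, which is the pattern the paragraph above describes for a positively skewed variable.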
(n)
Figure 2: Normal probability plot (horizontal axis: Sample Percentile, 0 to 120; vertical axis: Y, 0 to 300000)
Figure 2 shows the normal probability plot from the regression of price against the
independent variables age, automation, power, petrol, kilometer, hatchback, convertible and
sedan. The points follow an approximately straight line, with the exception of two outliers.
This indicates that the error terms follow an approximately normal distribution once the two
outliers are removed from the data set. Thus, the normal probability plot supports the
normality assumption.
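One common way to quantify how straight a normal probability plot is can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure that produced Figure 2: the residual values are hypothetical, the helper names are invented, and the plotting positions (i - 0.5)/n are just one of several conventions.

```python
from statistics import NormalDist, mean

def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical standard-normal quantile."""
    ordered = sorted(residuals)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def pearson_r(pairs):
    """Pearson correlation of the (x, y) pairs; values near 1 mean a nearly straight plot."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical residuals: a roughly symmetric core, then the same core plus two extreme values.
core = [-1800, -1200, -900, -600, -300, -100, 100, 300, 600, 900, 1200, 1800]
with_outliers = core + [90000, 150000]

r_core = pearson_r(normal_plot_points(core))
r_with_outliers = pearson_r(normal_plot_points(with_outliers))
```

With these made-up numbers the correlation is close to 1 for the core residuals and drops noticeably once the two extreme values are included, mirroring the observation that the plot is approximately linear apart from two outliers.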
The residual plots for each of the independent variables are shown in the figures below.
Residual plots
Figures: Residual plots (vertical axis: Residuals) for each independent variable on the horizontal axis: age, automation, kilometer, power, petrol, hatchback, convertible and sedan.
For every independent variable, the residuals are scattered randomly around the horizontal
axis, with no systematic pattern. Hence, it can be concluded that the linearity assumption is
satisfied by the given data (Keith, 2006).
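The idea of reading a residual plot for linearity can be sketched numerically. The example below is a simplified illustration with a single predictor and hypothetical data, whereas the report's model has eight predictors; the function names and the three-bin check are assumptions made for this sketch. Because least-squares residuals are uncorrelated with the predictor by construction, the sketch looks for a pattern in the mean residual across low, middle and high values of the predictor instead.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def residuals(xs, ys):
    """Observed value minus fitted value for every observation."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def binned_residual_means(xs, res, bins=3):
    """Mean residual in equal-count bins ordered by x (assumes len(xs) is divisible by bins)."""
    ordered = [r for _, r in sorted(zip(xs, res))]
    size = len(ordered) // bins
    return [mean(ordered[i * size:(i + 1) * size]) for i in range(bins)]

# Hypothetical data: one truly linear response and one with curvature.
xs = list(range(1, 13))
noise = [1 if x % 2 else -1 for x in xs]
linear_y = [100 - 5 * x + e for x, e in zip(xs, noise)]
curved_y = [100 - 5 * x + 0.8 * (x - 6.5) ** 2 for x in xs]

linear_bins = binned_residual_means(xs, residuals(xs, linear_y))
curved_bins = binned_residual_means(xs, residuals(xs, curved_y))
```

For the linear response the bin means stay close to zero, which corresponds to residuals scattered randomly around the horizontal axis. For the curved response they follow a positive, negative, positive pattern, the kind of systematic shape that would cast doubt on the linearity assumption.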
References
Holcomb, Z. C. (2017): Fundamentals of Descriptive Statistics, Routledge, USA.
Keith, T. (2006): Multiple Regression and Beyond, Pearson Education, Boston.