Data Analysis Report for Student GPA
Added on 2023/06/04
AI Summary
This report analyzes the GPA of male and female students, parent qualification, and predictors of GPA using correlation and regression analysis. Find out the significance of variables and recommendations for improving the regression model.
DATA ANALYSIS
STUDENT NAME/ID
[Pick the date]
Task 1
(a) Box plot to represent the GPA of male and female students
The box plot is not symmetric but skewed, as shown by the outliers located at both ends. The
outliers indicate that some students scored markedly high GPAs. Because the data are skewed and
contain outliers, the median and the interquartile range (IQR) are the appropriate measures of
central tendency and dispersion for the GPA scores.
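The median, IQR and outlier check described above can be sketched as follows. This is a minimal illustration only: the GPA values are hypothetical (not the report's data), the function name is made up for the example, and the 1.5 × IQR fences are the usual box plot convention, which is assumed here rather than confirmed by the report.

```python
import statistics

def quartile_summary(values):
    """Return quartiles, IQR and the points outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical GPA scores, for illustration only (not the report's data).
gpa_sample = [0.5, 2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.5, 5.9]
summary = quartile_summary(gpa_sample)
print(summary)
```

Unlike the mean and standard deviation, the median and IQR barely move when the two extreme values are changed, which is why they suit skewed data.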
(b) Hypothesis test of the claim that the GPA of male students differs from the GPA of female
students.
H0: μGPA(Female) − μGPA(Male) = 0
H1: μGPA(Female) − μGPA(Male) ≠ 0
A two-sample t-test assuming unequal variances is the appropriate test for this claim.
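For reference, the test statistic and approximate degrees of freedom of the unequal-variance (Welch) t-test are usually written as below. The symbols are assumptions introduced for this sketch: sample means, sample variances and sample sizes of the female (F) and male (M) groups.

```latex
t = \frac{\bar{x}_F - \bar{x}_M}{\sqrt{\dfrac{s_F^2}{n_F} + \dfrac{s_M^2}{n_M}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_F^2}{n_F} + \dfrac{s_M^2}{n_M}\right)^2}
{\dfrac{\left(s_F^2/n_F\right)^2}{n_F - 1} + \dfrac{\left(s_M^2/n_M\right)^2}{n_M - 1}}
```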
The level of significance is assumed to be 5%.
Because the alternative hypothesis uses the ≠ sign, the two-tailed p-value is used. The p-value
(0.1972) exceeds the significance level (0.05), so there is insufficient statistical evidence to
reject the null hypothesis. The null hypothesis is therefore not rejected, and the data do not
support the claim that the GPA of male students differs from the GPA of female students.
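The report's figures come from Excel, but the same two-tailed test can be sketched in plain Python. This is a hedged illustration, not the report's procedure: the samples are hypothetical, the function names are invented for the example, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by a library call.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = statistics.variance(sample_a) / n_a  # squared standard error, group a
    se2_b = statistics.variance(sample_b) / n_b  # squared standard error, group b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # Simpson's rule estimate of P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical GPA samples, for illustration only (not the report's data).
female_gpa = [2.9, 3.4, 3.1, 3.8, 2.7, 3.3]
male_gpa = [2.8, 3.0, 3.6, 2.5, 3.2, 2.9, 3.1]
t_stat, df = welch_t(female_gpa, male_gpa)
p_value = two_tailed_p(t_stat, df)
reject_null = p_value < 0.05  # decision rule at the 5% level
```

With the report's own p-value of 0.1972, the same rule gives `0.1972 < 0.05` as false, so the null hypothesis is not rejected.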
2(a) The claim that the GPA of students whose parents hold a postgraduate qualification is higher
than the GPA of students whose parents hold an undergraduate qualification is tested below.
H0: μGPA(Postgraduate) − μGPA(Undergraduate) = 0
H1: μGPA(Postgraduate) − μGPA(Undergraduate) > 0
A two-sample t-test assuming unequal variances is the appropriate test for this claim.
The level of significance is assumed to be 5%.
Because the alternative hypothesis uses the > sign, the one-tailed p-value is used. The p-value
(0.1757) exceeds the significance level (0.05), so there is insufficient statistical evidence to
reject the null hypothesis. The null hypothesis is therefore not rejected, and the data do not
support the claim that the GPA of students whose parents hold a postgraduate qualification is
higher than the GPA of students whose parents hold an undergraduate qualification.
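The relation between the one-tailed and two-tailed p-values of a t-test can be sketched as follows. The helper name and its parameters are assumptions made for this illustration. It relies only on the symmetry of the t distribution, and the only figures taken from the report are the 5% level and the quoted one-tailed p-value of 0.1757.

```python
def one_tailed_p(p_two, t_stat, alternative="greater"):
    """Convert a two-tailed t-test p-value into a one-tailed one.

    alternative="greater" corresponds to H1: mean difference > 0,
    alternative="less" corresponds to H1: mean difference < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_h1_direction = t_stat > 0 if alternative == "greater" else t_stat < 0
    half = p_two / 2
    # If the statistic points away from H1, the one-tailed p-value exceeds 0.5.
    return half if in_h1_direction else 1 - half

ALPHA = 0.05
reported_p = 0.1757  # one-tailed p-value quoted in the report for 2(a)
reject_null = reported_p < ALPHA  # False: the null hypothesis is not rejected
```

A one-tailed test only rewards evidence in the direction stated by H1, so a large difference in the opposite direction never leads to rejection.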
(b) The claim that the GPA of students whose parents hold an undergraduate qualification is higher
than the GPA of students whose parents hold a secondary or lower qualification is tested below.
H0: μGPA(Undergraduate) − μGPA(Secondary or below) = 0
H1: μGPA(Undergraduate) − μGPA(Secondary or below) > 0
A two-sample t-test assuming unequal variances is the appropriate test for this claim.
The level of significance is assumed to be 5%.
Because the alternative hypothesis uses the > sign, the one-tailed p-value is used. The p-value
(0.00 to two decimal places) is below the significance level (0.05), so there is sufficient
statistical evidence to reject the null hypothesis in favour of the alternative hypothesis.
Hence, the GPA of students whose parents hold an undergraduate qualification is higher than the
GPA of students whose parents hold a secondary or lower qualification.
Task 2
(3) The correlation matrix, computed using Excel, is shown below.
The correlation matrix shows that HS_MATH has the strongest correlation with GPA. The next
strongest correlation is with ATAR. The weakest correlation is observed between HS_ENG and GPA.
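A ranking of predictors by the strength of their correlation with GPA can be sketched as below. The data are hypothetical and chosen only to make the example easy to follow; the function names are invented here, and the report's own matrix was produced in Excel.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def rank_predictors(columns, target):
    """Sort (name, r) pairs by |r| with the target, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical scores, for illustration only (not the report's data).
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
predictors = {
    "HS_ENG": [70, 60, 80, 65, 75],
    "HS_MATH": [50, 60, 70, 80, 90],
}
ranking = rank_predictors(predictors, gpa)
```

Ranking by the absolute value of r treats strong negative correlations as strong, which matters if any predictor moves against GPA.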
(4)(i) One of the predictors of GPA is HS_SCI, as reflected in both the correlation analysis and
the regression analysis. The relevant regression output is shown below.
(ii) One of the predictors of GPA is HS_ENG, as reflected in both the correlation analysis and
the regression analysis. The relevant regression output is shown below.
(iii) One of the predictors of GPA is HS_MATH, as reflected in both the correlation analysis and
the regression analysis. The relevant regression output is shown below.
(iv) One of the predictors of GPA is ATAR, as reflected in both the correlation analysis and the
regression analysis. The relevant regression output is shown below.
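Each of the single-predictor regressions in (i) to (iv) fits a straight line of GPA on one variable. A minimal least-squares sketch is given below, with hypothetical ATAR and GPA values and an invented function name; it is not the Excel output used in the report.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical values, for illustration only (not the report's data).
atar = [60, 70, 80, 90]
gpa = [2.0, 2.5, 3.0, 3.5]
intercept, slope = simple_ols(atar, gpa)
```

The slope is the estimated change in GPA for a one-unit increase in the predictor, which is how the coefficients are read in the interpretation that follows.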
5) Regression model
6) Regression output (for step 5)
Interpretation
HS_SCI: The slope indicates that GPA changes by 0.08 when a student scores one unit higher in
science at high school, holding the other predictors constant.
HS_ENG: The slope indicates that GPA changes by 0.04 when a student scores one unit higher in
English at high school, holding the other predictors constant.
HS_MATH: The slope indicates that GPA changes by 0.35 when a student scores one unit higher in
Math at high school, holding the other predictors constant.
PARENT_EDUC: The slope indicates that GPA changes by 0.88 when the highest qualification of the
student's parent is one level higher, holding the other predictors constant.
GENDER: Compared with a male student, a female student tends to score on average 0.11 GPA
higher, assuming that the other predictors are the same for both.
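The slope interpretations above can be turned into a small calculation of predicted GPA differences. The coefficients are the ones quoted in the text. The intercept and the ATAR slope are not quoted, so only differences are computed, and coding GENDER as 1 for female and 0 for male is an assumption made for this sketch.

```python
# Slope estimates as quoted in the interpretation; the intercept and the ATAR
# slope are not quoted in the text, so only GPA differences can be computed.
SLOPES = {
    "HS_SCI": 0.08,
    "HS_ENG": 0.04,
    "HS_MATH": 0.35,
    "PARENT_EDUC": 0.88,
    "GENDER": 0.11,  # assumed coding: 1 = female, 0 = male
}

def predicted_gpa_change(changes):
    """Predicted GPA difference for the given changes, other predictors fixed."""
    unknown = set(changes) - set(SLOPES)
    if unknown:
        raise KeyError(f"no slope quoted for: {sorted(unknown)}")
    return sum(SLOPES[name] * delta for name, delta in changes.items())

# Two units higher in high school Math, everything else unchanged.
math_effect = predicted_gpa_change({"HS_MATH": 2})
# One unit higher in both science and English.
sci_eng_effect = predicted_gpa_change({"HS_SCI": 1, "HS_ENG": 1})
```

Because the model is linear, the effects of several simultaneous changes simply add up.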
The relevant hypotheses for the significance tests of the slope coefficients are enumerated below.
The decision rule is that an independent variable is significant when its p-value does not exceed
the significance level. Under this rule, HS_MATH and PARENT_EDUC are the variables considered
significant here.
6) The ATAR coefficient is negative in Step 6, which is counterintuitive because a higher ATAR
would be expected to be associated with a higher GPA. However, the ATAR slope is not
statistically significant, so little weight should be placed on its sign.
Task 3
Summary Report
SES has a sizable influence on average student GPA, as reflected in the inferential testing used
(Figure 1). This is confirmed by the differences in performance observed between students whose
parents have differing academic qualifications. However, no significant influence is observed for
gender as an independent variable, since the average GPA does not differ significantly between male
and female students (Figure 2).
With regard to the regression models, the key observation is that moving from Step 1 to Step 3 leaves
the HS_SCI slope coefficient insignificant and lowers its magnitude. This makes a strong case for adding
further independent variables to increase the predictive power of the regression model. Also, when ATAR
is used as a standalone independent variable, its slope is significant (i.e. p < 0.05). However, as the
other models show, ATAR loses its significance once other independent variables are included
(Figure 3).
To choose the best model from the available choices, the adjusted R² is used as a suitable indicator.
On this criterion, Step 4 is the most suitable of the available models. If a better model is required
from the given variables, it makes sense to fit a new regression model comprising only the two
independent variables that have proved significant (i.e. HS_MATH and PARENT_EDUC). Even taking Step 4
as the best model, the fit is quite poor, considering that the independent variables jointly explain
only 23.22% of the variation in GPA (Figure 4).
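The adjusted R² used for model comparison penalises R² for the number of predictors. A short sketch of the standard formula follows. The function name is invented for the example, and the sample size and predictor counts in the usage lines are hypothetical, since the report's text does not state them.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical comparison: the same R^2 achieved with 2 and with 5 predictors
# on 11 observations (values chosen for illustration, not from the report).
lean_model = adjusted_r_squared(0.5, n_obs=11, n_predictors=2)
large_model = adjusted_r_squared(0.5, n_obs=11, n_predictors=5)
```

For the same R², the model with fewer predictors gets the higher adjusted R², which is why the measure suits comparisons between steps that add variables.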
Given the poor fit of the existing best-fit regression model, it is imperative to consider new predictor
variables that can be added to the model to enhance its predictive power and lower the standard error.
Potential options include study hours, class attendance, engagement in online modules and the general
IQ level of students. Since some of these variables could improve the GPA estimates, they should be
strongly considered for inclusion.
Figure 1
Figure 2
Figure 3
Figure 4