BSB123 Data Analysis: GPA, SES & Other Student Performance Factors


Added on  2023/06/03

Report
AI Summary
This report investigates factors influencing the academic performance, measured by GPA, of first-year science students. It uses hypothesis testing and regression analysis to assess the impact of variables like high school scores (science, English, math), ATAR, socioeconomic status (SES), and gender. The analysis reveals that parental education level significantly impacts GPA, while gender does not show a significant difference. Regression models highlight the importance of high school math scores and parental education, but also indicate a need for additional variables like student intelligence or study habits to improve the model's predictive power. The report concludes by suggesting improvements for future models to better understand and predict student GPA.
Document Page
DATA ANALYSIS
STUDENT NAME/ID
[Pick the date]
Task 1
Box plot and t-tests
(a) The five-number summary for GPA and the side-by-side box plot of GPA for female and male
students are shown below.
The distribution of GPA scores for both male and female students is asymmetric and skewed. This is
suggested by the presence of outliers at both the upper and lower ends of the data. The outliers indicate
that some of the science students have exceptionally high or low GPA scores. Because the distributions
are skewed, the median is a more appropriate measure of central tendency than the mean, and the
interquartile range (IQR) is a more appropriate measure of dispersion than the standard deviation. This is
because both the mean and the standard deviation are strongly affected by extreme values (outliers).
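The summary measures discussed in part (a) can be sketched in a few lines of Python. This is only an illustration: the GPA values below are hypothetical and are not the report's data, and the 1.5 x IQR rule for flagging outliers is the usual box plot convention, assumed here because the report does not state which rule its software applied.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolates between observed data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def outlier_fences(q1, q3, k=1.5):
    """Return the lower and upper fences used to flag box plot outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical GPA values, for illustration only.
gpa = [1.0, 4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 5.5, 6.9]

summary = five_number_summary(gpa)
low_fence, high_fence = outlier_fences(summary[1], summary[3])
outliers = [g for g in gpa if g < low_fence or g > high_fence]

print("five-number summary:", summary)
print("flagged outliers:", outliers)
# The mean is pulled around by extreme values, while the median is not.
print("mean:", statistics.mean(gpa), "median:", statistics.median(gpa))
```

Running the same functions separately on the male and female GPA columns would give the figures that a side-by-side box plot displays.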
(b) Hypothesis test
The two samples, GPA for male students and GPA for female students, are independent samples. Further,
the population standard deviation is unknown, so a two-sample t-test assuming unequal variances
(Welch's t-test) is suitable for the hypothesis test.
The two-tailed p value from the above test is 0.1972, which is greater than the significance level (assumed
to be 5%). As a result, there is insufficient evidence to reject the null hypothesis. Hence, the data do not
show a significant difference between the average GPA of male students and that of female students.
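As a rough sketch of the calculation behind this kind of test, the following Python code computes Welch's t statistic, the Welch-Satterthwaite degrees of freedom and a two-tailed p value using only the standard library. The sample values are hypothetical, the function names are illustrative, and the numerical integration is a stand-in for the t distribution routine that a statistics package would normally supply.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error, sample A
    vb = statistics.variance(sample_b) / nb  # squared standard error, sample B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)

# Hypothetical GPA samples, for illustration only.
male_gpa = [4.1, 5.0, 4.6, 5.4, 3.9, 4.8]
female_gpa = [4.9, 5.2, 4.4, 5.6, 5.1, 4.7, 5.3]

t_stat, dof = welch_t(male_gpa, female_gpa)
p_value = two_tailed_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, two-tailed p = {p_value:.4f}")
print("reject H0 at 5%:", p_value < 0.05)
```

With the report's own two-tailed p value of 0.1972, the final comparison is false, which matches the decision not to reject the null hypothesis.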
2(a) Hypothesis test
The two samples, GPA of students grouped by their parents' highest qualification, are independent
samples. Further, the population standard deviation is unknown, so a two-sample t-test assuming
unequal variances is suitable for the hypothesis test.
The one-tailed p value from the above test is 0.1757, which is greater than the significance level (assumed
to be 5%). As a result, there is insufficient evidence to reject the null hypothesis. Hence, the average GPA
of students whose parents have a postgraduate qualification does not differ significantly from the average
GPA of students whose parents have an undergraduate qualification.
(b) Hypothesis test
The two samples, GPA of students grouped by their parents' highest qualification, are independent
samples. Further, the population standard deviation is unknown, so a two-sample t-test assuming
unequal variances is suitable for the hypothesis test.
The one-tailed p value from the above test is effectively zero, which is less than the significance level
(assumed to be 5%). As a result, the null hypothesis is rejected in favour of the alternative hypothesis.
Hence, it can be said that the average GPA of students whose parents have an undergraduate qualification
is higher than the average GPA of students whose parents have a secondary or lower qualification.
Task 2
Regression Analysis
(3) Correlation matrix for the numeric variables
The following key observations have been drawn from the correlation matrix.
Highest correlation coefficient = 0.444 (HS_MATH and GPA)
Second highest correlation coefficient = 0.424 (ATAR and GPA)
Lowest correlation coefficient = 0.304 (HS_ENG and GPA)
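Each entry of a correlation matrix is a Pearson correlation coefficient between two variables. A minimal sketch of that calculation follows. The paired scores are hypothetical and are not the report's data. The last lines use the report's highest coefficient, 0.444, to show how much of the GPA variation a one-predictor regression on HS_MATH could account for.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, for illustration only.
hs_math = [2.0, 3.0, 3.5, 4.0, 5.0, 5.5]
gpa = [3.8, 4.1, 5.0, 4.4, 5.6, 5.2]
print(f"sample r = {pearson_r(hs_math, gpa):.3f}")

# In a one-predictor regression, R squared equals r squared.
r_math_gpa = 0.444  # HS_MATH and GPA, from the observations above
print(f"share of GPA variation explained: {r_math_gpa ** 2:.3f}")
```

A coefficient of 0.444 corresponds to roughly a fifth of the variation in GPA, which is why the associations are described as moderate rather than strong.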
(4)(i) HS_SCI acts as a predictor of the dependent variable GPA, given that the association between the
two is moderate in strength. Moreover, the regression model is statistically significant because the
slope coefficient is significant and cannot be assumed to be zero.
(ii) HS_ENG acts as a predictor of the dependent variable GPA, given that the association between the
two is moderate in strength. Moreover, the regression model is statistically significant because the
slope coefficient is significant and cannot be assumed to be zero.
(iii) HS_MATH acts as a predictor of the dependent variable GPA, given that the association between the
two is moderate in strength. Moreover, the regression model is statistically significant because the
slope coefficient is significant and cannot be assumed to be zero.
(iv) ATAR acts as a predictor of the dependent variable GPA, given that the association between the two
is moderate in strength. Moreover, the regression model is statistically significant because the slope
coefficient is significant and cannot be assumed to be zero.
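The slope-significance reasoning in (i) to (iv) rests on a t statistic for the slope of a one-predictor regression. The sketch below shows how that statistic is formed. The data are hypothetical and the function name is illustrative. The resulting t value would be compared with a t distribution on n - 2 degrees of freedom to obtain the p value reported by regression software.

```python
import math

def slope_t_statistic(x, y):
    """Fit y = a + b*x by least squares; return (slope, std_error, t, df)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_error, slope / std_error, n - 2

# Hypothetical predictor and GPA values, for illustration only.
score = [1, 2, 3, 4, 5]
gpa = [2, 1, 4, 3, 5]

slope, std_error, t_stat, df = slope_t_statistic(score, gpa)
print(f"slope = {slope:.3f}, SE = {std_error:.3f}, t = {t_stat:.3f}, df = {df}")
```

In a one-predictor model this t statistic equals r * sqrt(n - 2) / sqrt(1 - r^2), so a moderate correlation can still give a significant slope when the sample is large enough.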
5) Regression model
6) Regression output (for step 5)
Interpretation
HS_SCI: The slope indicates that a one-unit increase in a student's science score is associated with a
0.08-unit increase in GPA, holding the other predictors constant.
HS_ENG: The slope indicates that a one-unit increase in a student's English score is associated with a
0.04-unit increase in GPA, holding the other predictors constant.
HS_MATH: The slope indicates that a one-unit increase in a student's math score is associated with a
0.25-unit increase in GPA, holding the other predictors constant.
PARENT_EDUC: The slope indicates that a one-unit increase in the qualification level of the student's
parent is associated with a 0.28-unit increase in GPA, holding the other predictors constant.
GENDER: The slope indicates that GPA is 0.11 units higher for a female student than for a male
student, holding the other factors constant.
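The interpretations above can be made concrete with a small calculation of the predicted GPA difference between two students. The slope values are the ones quoted in the interpretation. The intercept is not given in the text, so only differences are computed. Coding GENDER as 1 for female and 0 for male is an assumption chosen to match the interpretation, and the function name is illustrative.

```python
# Slope estimates quoted in the interpretation above.
COEFFICIENTS = {
    "HS_SCI": 0.08,
    "HS_ENG": 0.04,
    "HS_MATH": 0.25,
    "PARENT_EDUC": 0.28,
    "GENDER": 0.11,  # assumed coding: 1 = female, 0 = male
}

def predicted_gpa_change(changes):
    """Predicted GPA difference for the given changes in predictor values.

    Predictors that are not listed are held constant, so the unknown
    intercept cancels out of the difference.
    """
    return sum(COEFFICIENTS[name] * delta for name, delta in changes.items())

# One extra unit of HS_MATH, everything else unchanged.
print(round(predicted_gpa_change({"HS_MATH": 1}), 2))

# Two extra units of HS_MATH and one extra unit of HS_ENG.
print(round(predicted_gpa_change({"HS_MATH": 2, "HS_ENG": 1}), 2))
```

These are point estimates only. Whether each slope differs reliably from zero is a separate question, addressed by the hypothesis test on the slopes.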
Hypothesis test for testing the significance of slope
Null hypothesis: β = 0
Alternative hypothesis: β ≠ 0
Null hypothesis would only be rejected when the p value is lower than significance level.
Significance level (Alpha) =0.05
It is apparent from the above table that only the HS_MATH and PARENT_EDUC slopes are statistically
significant, while the remaining slopes are statistically insignificant.
6) The ATAR coefficient in the Step 6 model is negative, which is surprising, since a higher ATAR would
be expected to go with a higher GPA. The p value for the ATAR slope in the Step 6 model is 0.91. This is
well above 0.05, so the null hypothesis that the ATAR slope is zero cannot be rejected. Hence, it can be
concluded that adding ATAR to the regression model does not improve the fit of the model.
Task 3
Summary Report
The influence of SES is apparent from the hypothesis test (Figure 1), which shows that the average GPA
of students whose parents hold an undergraduate degree as their highest qualification is higher than that
of students whose parents have a lower qualification. Gender, however, does not appear to influence
average performance, since the average GPA of the two genders is not significantly different (Figure 2).
In the regression analysis, moving from Step 1 to Step 3, the HS_SCI slope coefficient's value and
significance decline progressively. This indicates that the independent variables added at later steps are
more significant than HS_SCI. Used as a single predictor variable, ATAR is significant (p < 0.05).
However, once the other variables are included, ATAR is no longer a significant predictor variable
(Figure 3).
Of the regression models built over the 6 steps in Task 2, the Step 4 model is the best choice because it
has the highest adjusted R² of all the models. For a new and better model, the predictor variables of
choice would be HS_MATH and PARENT_EDUC. Even so, the Step 4 model fits poorly, as only 23.22%
of the variation in GPA is explained by the collective variation of the independent variables (Figure 4).
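The model comparison above relies on adjusted R², which penalises R² for the number of predictors in the model. A minimal sketch of the standard formula follows. The sample size and predictor count are hypothetical, since neither is stated in the text, and treating 0.2322 as an unadjusted R² is also an assumption made for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R squared for a model with an intercept."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Hypothetical sample size and predictor count, for illustration only.
n_students = 200
for k in (2, 5, 8):
    value = adjusted_r_squared(0.2322, n_students, k)
    print(f"{k} predictors: adjusted R squared = {value:.4f}")
```

For a fixed R², the adjusted value falls as predictors are added, so an extra variable raises adjusted R² only when it improves the fit by more than the penalty.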
Even the best model fits poorly, so additional independent variables with a significant influence on GPA
are needed. Possible choices in this regard include student intelligence (measured by IQ), class
attendance and hours of study. Adding these variables could lead to better estimates of GPA, so there is
a strong case for introducing them.
Figure 1: Hypothesis test comparing the GPA of students whose parents hold an undergraduate degree
with that of students whose parents have a secondary or lower qualification.