Data Analysis: Box plot, t-tests, and Regression Analysis
Added on  2023/06/03
AI Summary
This report presents the results of box plot, t-tests, and regression analysis conducted on GPA scores of male and female students. The report includes a summary of the findings, correlation matrix, and regression output.
DATA ANALYSIS STUDENT NAME/ID [Pick the date]
Task 1 Box plot and t-tests

(a) The five-number summary for GPA score and the side-by-side box plot of GPA for female and male students are highlighted below.

The distribution of GPA scores for both male and female students is asymmetric and skewed. This is evident from the presence of outliers at both the upper and lower ends of the data. The outliers indicate that some of the science students have extraordinarily high or low GPA scores. Because the distributions are skewed, the median is a more appropriate measure of central tendency than the mean. Likewise, the interquartile range (IQR) is a more appropriate measure of dispersion than the standard deviation. This is because both the mean and the standard deviation are strongly affected by extreme values (outliers).

(b) Hypothesis test 1
The GPA scores of male students and the GPA scores of female students form two independent samples. Further, the population standard deviation is unknown, and hence a two-sample t-test assuming unequal variances is suitable for the hypothesis test.

The two-tailed p value from the above test is 0.1972, which is greater than the significance level (assumed to be 5%). As a result, there is insufficient evidence to reject the null hypothesis. Hence, the average GPA of male students is not significantly different from the average GPA of female students.

2(a) Hypothesis test

The GPA scores of students grouped by the highest qualification of their parents form two independent samples. Further, the population standard deviation is unknown, and hence a two-sample t-test assuming unequal variances is suitable for the hypothesis test.
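The two-sample t-test with unequal variances chosen for these comparisons (commonly called Welch's t-test) can be sketched in plain Python as follows. This is a minimal illustration rather than the report's actual computation: the function name `welch_t_test` and the GPA values are hypothetical, and the sketch returns only the test statistic and the approximate degrees of freedom.

```python
import math
import statistics


def welch_t_test(sample_a, sample_b):
    """Return the Welch t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 denominator); the two groups are NOT pooled.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each group.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical GPA values, for illustration only (not the report's data).
male_gpa = [2.8, 3.1, 3.4, 2.5, 3.0, 3.6]
female_gpa = [3.0, 3.3, 3.5, 2.9, 3.2, 3.7]

t_stat, df = welch_t_test(male_gpa, female_gpa)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The p value is then read from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also reports the p value; its `alternative` argument selects a one-tailed or two-tailed test.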
The one-tailed p value from the above test is 0.1757, which is greater than the significance level (assumed to be 5%). As a result, there is insufficient evidence to reject the null hypothesis. Hence, the average GPA of students (parent education: post-graduation) is not significantly different from the average GPA of students (parent education: under-graduation).

(b) Hypothesis test

The GPA scores of students grouped by the highest qualification of their parents form two independent samples. Further, the population standard deviation is unknown, and hence a two-sample t-test assuming unequal variances is suitable for the hypothesis test.

The one-tailed p value from the above test is effectively zero, which is less than the significance level (assumed to be 5%). As a result, the null hypothesis is rejected in favour of the alternative hypothesis. Hence, it can be said that the average GPA of students (parent education: under-graduation) is higher than the average GPA of students (parent education: secondary or lower qualification).

Task 2 Regression Analysis

(3) Correlation matrix for the numeric variables

The following key observations have been drawn from the correlation matrix.
Highest correlation coefficient = 0.444 (HS_MATH and GPA)
Second highest correlation coefficient = 0.424 (ATAR and GPA)
Lowest correlation coefficient = 0.304 (HS_ENG and GPA)

(4)(i) HS_SCI acts as a predictor variable for the dependent variable GPA, given that the strength of the association is moderate. Moreover, the regression model is statistically significant because the slope coefficient is significant and cannot be assumed to be zero.
(ii) HS_ENG acts as a predictor variable for the dependent variable GPA, given that the strength of the association is moderate. Moreover, the regression model is statistically significant because the slope coefficient is significant and cannot be assumed to be zero.

(iii) HS_MATH acts as a predictor variable for the dependent variable GPA, given that the strength of the association is moderate. Moreover, the regression model is statistically significant because the slope coefficient is significant and cannot be assumed to be zero.

(iv) ATAR acts as a predictor variable for the dependent variable GPA, given that the strength of the association is moderate. Moreover, the regression model is statistically significant because the slope coefficient is significant and cannot be assumed to be zero.
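The single-predictor checks in (i) to (iv) rest on two quantities: the correlation between the predictor and GPA, and the t statistic for the slope of the simple regression. The sketch below shows how both could be computed from scratch. The function name `simple_regression` and the sample scores are hypothetical and are not taken from the report's data set.

```python
import math


def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares and test the slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares, then the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope              # compare with t(n - 2) under H0: slope = 0
    return {"r": r, "slope": slope, "intercept": intercept, "t_slope": t_slope}


# Hypothetical high-school scores and GPA values, for illustration only.
hs_math = [6, 7, 8, 9, 10, 8, 7]
gpa = [2.4, 2.9, 3.1, 3.6, 3.5, 2.8, 3.0]

result = simple_regression(hs_math, gpa)
print({name: round(value, 3) for name, value in result.items()})
```

With a single predictor, the slope t statistic equals r·sqrt(n − 2)/sqrt(1 − r²), so a significant slope and a significant correlation are the same finding.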
5) Regression model
6) Regression output (for step 5)

Interpretation
• HS_SCI: The slope indicates that a one-unit change in the science score of a student changes GPA by 0.08 units in the same direction.
• HS_ENG: The slope indicates that a one-unit change in the English score of a student changes GPA by 0.04 units in the same direction.
• HS_MATH: The slope indicates that a one-unit change in the Math score of a student changes GPA by 0.25 units in the same direction.
• PARENT EDUC: The slope indicates that a one-unit change in the qualification level of the student's parent changes GPA by 0.28 units in the same direction.
• GENDER: The slope indicates that GPA is higher by 0.11 for a female student than for a male student, holding the other factors constant.

Hypothesis test for testing the significance of slope
Null hypothesis: β = 0
Alternative hypothesis: β ≠ 0
The null hypothesis is rejected only when the p value is lower than the significance level.
Significance level (alpha) = 0.05

It is apparent from the above table that only the HS_MATH and PARENT EDUC slopes are statistically significant, while the remaining slopes are statistically insignificant.

6) The ATAR coefficient in the step 6 model is negative, which is surprising, since a better ATAR performance would be expected to go with a better GPA. The p value for the slope of ATAR in the step 6 model is 0.91. This is well above 0.05, and thereby the ATAR slope can be taken as zero. Hence, it can be concluded that adding ATAR to the regression model does not improve the fit of the model.

Task 3
Summary Report

The influence of socioeconomic status (SES) is apparent from the hypothesis test (Figure 1), which shows that the average GPA of students whose parents' highest qualification is under-graduation is higher than that of students whose parents have a lesser qualification. However, gender does not influence the average performance of the given students, since the average performance of the two genders is not significantly different (Figure 2).

In relation to the regression analysis, as the model moves from the initial step (i.e. Step 1) to Step 3, the numerical value and significance of the HS_SCI slope coefficient progressively decline. This indicates that the other independent variables that are added are more significant than HS_SCI. The use of ATAR as a single predictor variable does indicate significance (p<0.05). However, once the other variables are included, ATAR is no longer a significant predictor variable (Figure 3).

Of the regression models worked out in the 6 steps of Task 2, the best choice seems to be the Step 4 model, owing to the highest adjusted R² value amongst all the models. For construction of a new model which is even superior, the choice of predictor variables would be HS_MATH and PARENT_EDUC. However, even taking Step 4 as the model of choice, the fit is poor, as only 23.22% of the variation in GPA is explained by the collective variation of the independent variables (Figure 4).

Since even the best model is a poor fit, additional independent variables with a significant influence on GPA need to be included. Possible choices in this regard include student intelligence level (measured by IQ), class attendance, and hours of study. The inclusion of these variables could potentially lead to better estimates of GPA, and hence a strong case exists for introducing them.
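The model comparison above relies on adjusted R², which penalises R² for every predictor added. A minimal sketch of the standard formula is given below. The function name `adjusted_r2`, the sample size and the R² values are hypothetical and are not the figures from the report's regression output.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept.

    r2           -- ordinary R-squared of the fitted model
    n_obs        -- number of observations
    n_predictors -- number of independent variables (excluding the intercept)
    """
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical values: adding a sixth predictor that barely raises R-squared.
five_predictors = adjusted_r2(0.2500, n_obs=200, n_predictors=5)
six_predictors = adjusted_r2(0.2501, n_obs=200, n_predictors=6)

print(f"5 predictors: {five_predictors:.4f}")
print(f"6 predictors: {six_predictors:.4f}")
print("extra predictor helps:", six_predictors > five_predictors)
```

In this made-up case, R² rises slightly while adjusted R² falls. The summary describes the same pattern when a predictor with a near-zero slope, such as ATAR, is added to the model.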
Figure 1: Hypothesis testing related to the performance of students with parents having under-graduation as qualification in comparison to secondary and less qualification.
Figure 2: Hypothesis testing to compare performance of the two genders
Figure 3: Regression result indicating insignificance of ATAR variable