Analysis of Dataset: Descriptive Statistics and T-tests


Added on 2023/01/19

AI Summary

This document provides an analysis of a dataset, including descriptive statistics and t-tests. It explores the relationship between various variables such as age, gender, metropolitan background status, study mode, and depression levels. The results indicate no significant differences in most cases.

Assessment Task 1: Analysis of dataset

1. A. The descriptive statistics of age are as follows:
Mean = 20.5 years
Standard deviation = 4.89 years
Minimum = 16 years
Maximum = 59 years
1. B. The frequency and percentage distribution of the categorical age variable is
presented in Table 1.
Table 1: Frequency and percentage distribution of categorical variable of age
Note: Age of the participants was categorised according to the description in the table
2. Descriptive statistics for the demographic variables are provided in Table 2.
Table 2: Descriptive Statistics for Demographic Variables of ACU

Note: Values are number (n) with percentage (%) in parentheses, except for age, which is reported as Mean (SD)
Format Source: Keijzers et al. (2011)
The average age of students was 20.5 years (SD = 4.89 years). Among the 38681
participants, 28232 (73%) were female. The majority of students (N = 20840, 53.9%)
stayed at home during the first year of study. Most were domestic students
(N = 32238, 83.3%). Prior enrolment details revealed that 27223 students (70.4%)
came from metropolitan areas. Almost all students undertook a single degree
(N = 34620, 89.5%). The three faculties with the highest enrolment were Education
(N = 15038, 38.9%), Health Sciences (N = 11729, 30.3%), and Arts and Sciences
(N = 9004, 23.3%). Student enrolment gradually increased from N = 3259 (8.4%) in 2005
to N = 6697 (17.3%) in 2012.
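The percentages quoted above can be re-derived from the counts and the stated total of 38681 participants. The sketch below is only a checking aid; the dictionary labels are illustrative names, not variable names from the dataset.

```python
# Counts quoted in the paragraph above; TOTAL is the stated number of participants.
TOTAL = 38681

counts = {
    "Female": 28232,
    "Stayed at home": 20840,
    "Domestic": 32238,
    "Metropolitan": 27223,
    "Single degree": 34620,
    "Education": 15038,
    "Health Sciences": 11729,
    "Arts and Sciences": 9004,
    "Enrolled 2005": 3259,
    "Enrolled 2012": 6697,
}


def percentage(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percentage(n) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: N = {counts[label]} ({pct}%)")
```

Each computed value agrees with the percentage reported in the text once rounded to one decimal place.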

3. A. T-tests for mean of driver aggression, thrill seeking, and risk acceptance
with respect to gender (Kim, 2015).
Table 3: Descriptive statistics of response variables by gender
a. driver_agg = driver aggression score, risk_accep = risk acceptance behaviour score, thrill = thrill
seeking behaviour score
b. N = number of observations (sample size) for each gender
c. Mean = average score for each gender
d. Std. Deviation = standard deviation of the scores about the mean
e. Std. Error Mean = standard deviation divided by the square root of the sample size (n); it estimates
the standard deviation of the sample mean.
Table 4: T-test statistics for assessing difference with respect to gender
f. F = Levene’s test statistic for equality of variances between the two genders.
g. Sig. = p-value of Levene’s test: the probability of an F statistic at least this large if the
variances of males and females are equal.
h. t = t-test statistic for the difference between two means (males – females):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$$
where $\bar{x}_1 - \bar{x}_2$ is the difference in sample means, $\mu_1 - \mu_2$ is the difference in
population means under the null hypothesis, $s_p$ is the pooled sample standard deviation, and
$n_1$, $n_2$ are the two sample sizes.
i. Sig (2-tailed) = two-tailed p-value: the probability of a t statistic at least this extreme if the
two population means are equal.

j. Mean Difference = difference between the mean scores of males and females.
k. Std. Error Difference = standard error of the difference between the male and female sample
means.
l. 95% confidence interval = interval that, over repeated sampling, would contain the true difference
between the male and female means 95% of the time; the lower and upper limits are reported.
Table 3 indicated that the averages of the RTA factors (aggression, thrill seeking, and
risk acceptance) were almost the same across the two genders. Table 4 indicated
no statistical evidence of any difference in these scores with respect to the
gender of the participants.
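The pooled (equal-variance) t statistic described in note h can be computed from the summary columns of a descriptive table (mean, standard deviation, N). The following is a minimal sketch under that assumption; the function name `pooled_t` and the input numbers are hypothetical and are not taken from Table 3 or Table 4.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t, standard error of the difference, degrees of freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: average of the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Under the null hypothesis the population mean difference is zero.
    t = (mean1 - mean2) / se_diff
    return t, se_diff, df


# Hypothetical summary statistics, NOT values from the report's tables.
t, se_diff, df = pooled_t(mean1=10.0, sd1=2.0, n1=50, mean2=9.0, sd2=2.0, n2=50)
print(f"t = {t:.3f}, SE of difference = {se_diff:.3f}, df = {df}")
```

The returned standard error corresponds to the "Std. Error Difference" column, and the mean difference divided by it gives t.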
3. B. T-tests for mean of driver aggression, thrill seeking, and risk acceptance
with respect to metropolitan background status.
Table 5: Descriptive statistics of response variables by metropolitan background status
a. driver_agg = driver aggression score, risk_accep = risk acceptance behaviour score, thrill = thrill
seeking behaviour score
b. N = number of observations (sample size) for each metropolitan background status
c. Mean = average score for each metropolitan background status
d. Std. Deviation = standard deviation of the scores about the mean
e. Std. Error Mean = standard deviation divided by the square root of the sample size (n); it estimates
the standard deviation of the sample mean.

Table 6: T-test statistics for assessing difference with respect to metropolitan background status
f. F = Levene’s test statistic for equality of variances between the two metropolitan background statuses
g. Sig. = p-value of Levene’s test: the probability of an F statistic at least this large if the
variances of the two metropolitan background statuses are equal.
h. t = t-test statistic for the difference between two means (Metro – Non-metro):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$$
with $s_p$ the pooled sample standard deviation and $n_1$, $n_2$ the two sample sizes.
i. Sig (2-tailed) = two-tailed p-value: the probability of a t statistic at least this extreme if the
two population means are equal.
j. Mean Difference = difference between the mean scores of the two metropolitan background
statuses.
k. Std. Error Difference = standard error of the difference between the sample means of the two
metropolitan background statuses.
l. 95% confidence interval = interval that, over repeated sampling, would contain the true difference
between the two means 95% of the time; the lower and upper limits are reported.
Table 5 indicated that the averages of the RTA factors (aggression, thrill seeking, and
risk acceptance) were almost the same across the two metropolitan background statuses.
Table 6 indicated no statistical evidence of any difference in these scores with
respect to the metropolitan background status of the participants. It was
also noted from Levene’s test that there was no statistical evidence of unequal
variances, as the p-values were all greater than 0.05 (the alpha, or level of
significance) (Nordstokke & Zumbo, 2010).
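Levene's test statistic can be understood as a one-way ANOVA F ratio computed on the absolute deviations of each score from its group centre. The sketch below assumes the mean-centred form and uses made-up scores; `levene_w` is a hypothetical helper, and the p-value (which needs the F distribution) is deliberately left out.

```python
from statistics import fmean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    # Step 1: replace every score by its absolute deviation from its group mean.
    z = []
    for g in groups:
        centre = fmean(g)
        z.append([abs(x - centre) for x in g])

    k = len(z)                                  # number of groups
    n_total = sum(len(zi) for zi in z)          # total sample size
    z_means = [fmean(zi) for zi in z]           # mean deviation per group
    z_grand = fmean([v for zi in z for v in zi])

    # Step 2: one-way ANOVA F ratio on the deviations.
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up scores: the first pair has equal spread, the second pair does not.
w_equal = levene_w([1, 2, 3], [11, 12, 13])
w_unequal = levene_w([1, 2, 3], [0, 10, 20])
print(w_equal, w_unequal)
```

A W close to zero, as in the equal-spread pair, is what a large Levene p-value reflects; the statistic is compared with an F distribution on (k - 1, N - k) degrees of freedom.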

3. C. T-tests for mean of driver aggression, thrill seeking, and risk acceptance with
respect to study mode.
Table 7: Descriptive statistics of response variables by study mode
a. driver_agg = driver aggression score, risk_accep = risk acceptance behaviour score, thrill = thrill
seeking behaviour score
b. N = number of observations (sample size) for each study mode
c. Mean = average score for each study mode
d. Std. Deviation = standard deviation of the scores about the mean
e. Std. Error Mean = standard deviation divided by the square root of the sample size (n); it estimates
the standard deviation of the sample mean.
Table 8: T-test statistics for assessing difference with respect to study mode
f. F = Levene’s test statistic for equality of variances between the two study modes
g. Sig. = p-value of Levene’s test: the probability of an F statistic at least this large if the
variances of the two study modes are equal.

h. t = t-test statistic for the difference between two means (FT – PT):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$$
with $s_p$ the pooled sample standard deviation and $n_1$, $n_2$ the two sample sizes.
i. Sig (2-tailed) = two-tailed p-value: the probability of a t statistic at least this extreme if the
two population means are equal.
j. Mean Difference = difference between the mean scores of the two study modes.
k. Std. Error Difference = standard error of the difference between the sample means of the two
study modes.
l. 95% confidence interval = interval that, over repeated sampling, would contain the true difference
between the two means 95% of the time; the lower and upper limits are reported.
Table 7 indicated that the averages of the RTA factors (aggression, thrill seeking, and
risk acceptance) were almost identical across the two study modes. Table 8 indicated
no statistical evidence of any difference in aggression and thrill scores between
the two study modes. However, part-time (PT) students were significantly more
risk accepting than full-time (FT) students.
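The link between the confidence interval in note l and a significant result can be illustrated with a short sketch: a difference is significant at the 5% level when its 95% interval excludes zero. The code assumes large groups, so it substitutes the normal critical value for the t critical value; the inputs are hypothetical and are not values from Table 8.

```python
from statistics import NormalDist


def mean_difference_ci(mean_diff, se_diff, level=0.95):
    """Approximate confidence interval for a difference between two means.

    Uses the normal critical value, which is a reasonable stand-in for the
    t critical value only when both groups are large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_diff - z * se_diff, mean_diff + z * se_diff


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical mean differences and standard errors.
significant = mean_difference_ci(mean_diff=-1.0, se_diff=0.4)
not_significant = mean_difference_ci(mean_diff=-0.5, se_diff=0.4)

print(significant, excludes_zero(significant))
print(not_significant, excludes_zero(not_significant))
```

A negative interval for an (FT – PT) difference would correspond to higher scores among part-time students.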
3. D. T-tests for mean of driver aggression, thrill seeking, and risk acceptance
with respect to RTA in past 12 months.
Table 9: Descriptive statistics of response variables by RTA in past 12 months
a. driver_agg = driver aggression score, risk_accep = risk acceptance behaviour score, thrill = thrill
seeking behaviour score
b. N = number of observations (sample size) for each RTA status in past 12 months
c. Mean = average score for each RTA status in past 12 months
d. Std. Deviation = standard deviation of the scores about the mean
e. Std. Error Mean = standard deviation divided by the square root of the sample size (n); it estimates
the standard deviation of the sample mean.

Table 10: T-test statistics for assessing difference with respect to RTA in past 12 months
f. F = Levene’s test statistic for equality of variances between the two RTA statuses in past 12 months
g. Sig. = p-value of Levene’s test: the probability of an F statistic at least this large if the
variances of the two RTA statuses are equal.
h. t = t-test statistic for the difference between two means (No RTA – One or more RTA):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$$
with $s_p$ the pooled sample standard deviation and $n_1$, $n_2$ the two sample sizes.
i. Sig (2-tailed) = two-tailed p-value: the probability of a t statistic at least this extreme if the
two population means are equal.
j. Mean Difference = difference between the mean scores of the two RTA statuses in past 12 months.
k. Std. Error Difference = standard error of the difference between the sample means of the two RTA
statuses in past 12 months.
l. 95% confidence interval = interval that, over repeated sampling, would contain the true difference
between the two means 95% of the time; the lower and upper limits are reported.
Table 9 indicated that the averages of the RTA factors (aggression, thrill seeking, and
risk acceptance) were markedly different across the two RTA statuses. Table 10
indicated strong statistical evidence of a difference in these scores
with respect to RTA in the past 12 months. Students with no RTA in the
past 12 months were significantly more aggressive, thrill seeking, and risk accepting
than students with one or more RTAs in the past 12 months.

4. A. Depression is a categorical variable, and the difference between the non-depressed
and depressed groups according to other nominal factors was assessed statistically
using cross-tabulation and chi-square statistics.
Table 11 presents the number of males and females who were either depressed or
not depressed. Table 12 presents the summary of inferential statistics indicating the
level of association between depression and gender. The null hypothesis of
no association between depression and gender was tested with Pearson’s
Chi-Square test. The asymptotic significance was p = 0.597, which is greater than
the 0.05 level of significance of the test. Hence, there was not enough evidence to
reject the null hypothesis, and it was concluded that there was no statistically
significant association between depression and gender.
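Pearson's chi-square test for a 2x2 cross-tabulation compares each observed count with the count expected under no association (row total times column total, divided by the grand total). The sketch below is a minimal illustration with made-up counts, not the counts from the cross-tabulation in this report; `chi_square_2x2` is a hypothetical helper, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # With df = (2-1)*(2-1) = 1, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = depressed / not depressed, columns = two groups.
chi2, p_value = chi_square_2x2([[10, 20], [20, 10]])
chi2_null, p_null = chi_square_2x2([[10, 10], [10, 10]])
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

When the observed counts match the expected counts exactly, the statistic is 0 and the p-value is 1, the extreme case of "no evidence against the null hypothesis".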
Table 11: Cross tabulation of depression levels and gender
a. Depression = student-reported depression in first year, with two states: depressed or not
depressed.
b. Gender = male or female, the two genders under consideration

Table 12: Degree of association of depression with gender
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 1070.00.
b. Computed only for a 2x2 table
c. df = degrees of freedom for the test = (rows – 1)*(columns – 1) = (2-1)*(2-1) = 1
4. B. Table 13 presents the number of metropolitan or non-metropolitan students
who were either depressed or not depressed. Table 14 presents the summary of
inferential statistics indicating the level of association between depression and
metropolitan status. The null hypothesis of no association between depression
and the metropolitan status of students was tested with Pearson’s Chi-Square
test. The asymptotic significance was p = 0.736, which is greater than the 0.05 level
of significance of the test. Hence, there was not enough evidence to reject the null
hypothesis, and it was concluded that there was no statistically significant
association between depression and the metropolitan status of students before
enrolment.

Table 13: Cross tabulation of depression levels and metropolitan status prior to admission
a. Depression = student-reported depression in first year, with two states: depressed or not
depressed.
b. Metro = metropolitan status of students prior to admission (either stayed in a metropolitan area or not)
Table 14: Degree of association of depression with metropolitan status prior to admission
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 1070.00.
b. Computed only for a 2x2 table
c. df = degrees of freedom for the test = (rows – 1)*(columns – 1) = (2-1)*(2-1) = 1
4. C. Table 15 presents the number of students who were either depressed or not
depressed with respect to part-time or full-time study modes. Table 16 presents the summary
of inferential statistics indicating the level of association between depression and study
mode. The null hypothesis of no association between depression and the study mode of
students was tested with Pearson’s Chi-Square test. The asymptotic significance was
p = 0.080, which is greater than the 0.05 level of significance of the test. Hence, there
was not enough evidence to reject the null hypothesis, and it was concluded that there
was no statistically significant association between depression and study mode at the 5%
level of significance. At the 10% level of significance, however, it was possible to
conclude that depression of students depended on study mode (p = 0.080 < 0.10).
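The decision rule applied throughout section 4 is simply "reject the null hypothesis when p is below alpha". As a small illustration, the sketch below applies that rule to the four asymptotic significance values reported in sections 4.A to 4.D at both the 5% and 10% levels; the dictionary keys are descriptive labels, not dataset variable names.

```python
# Asymptotic significance values reported for the four chi-square tests.
p_values = {
    "gender": 0.597,
    "metropolitan status": 0.736,
    "study mode": 0.080,
    "fee status": 0.956,
}


def rejects_null(p_value, alpha):
    """Reject the null hypothesis of no association when p < alpha."""
    return p_value < alpha


decisions = {
    factor: {alpha: rejects_null(p, alpha) for alpha in (0.05, 0.10)}
    for factor, p in p_values.items()
}

for factor, by_alpha in decisions.items():
    print(factor, by_alpha)
```

Only study mode changes its outcome between the two levels, which is why it is the one factor singled out at the 10% level.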
Table 15: Cross tabulation of depression levels and study modes
a. Depression = student-reported depression in first year, with two states: depressed or not
depressed.
b. Study mode = full-time or part-time study mode of students
Table 16: Degree of association of depression with study modes of students
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 1070.00.
b. Computed only for a 2x2 table
c. df = degrees of freedom for the test = (rows – 1)*(columns – 1) = (2-1)*(2-1) = 1

4. D. Table 17 presents the number of students who were either depressed or not
depressed with respect to domestic or international fee status. Table 18 presents the
summary of inferential statistics indicating the level of association between depression
and fee status. The null hypothesis of no association between depression and the fee
status of students was tested with Pearson’s Chi-Square test. The asymptotic
significance was p = 0.956, which is greater than the 0.05 level of significance of the
test. Hence, there was not enough evidence to reject the null hypothesis, and it was
concluded that there was no statistically significant association between depression
and the fee status of students at the 5% level of significance.
Table 17: Cross tabulation of depression levels and fee status of students
a. Depression = student-reported depression in first year, with two states: depressed or not
depressed.
b. Fee_Status = domestic or international students, who have separate fee statuses

Table 18: Degree of association of depression with fee status of students
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 1070.00.
b. Computed only for a 2x2 table
c. df = degrees of freedom for the test = (rows – 1)*(columns – 1) = (2-1)*(2-1) = 1
5. A. Binary logistic regression with “RTA_one_crash” as the dependent variable was
performed, and the model predicted the probability of membership of the “One RTA or more”
group (Knol, Le Cessie, Algra, Vandenbroucke, & Groenwold, 2012).
Table 19: Odds ratios, significance values, and confidence intervals for each predictor variable
a. Variable(s) entered on step 1: Age_Cat, GENDER, LIVING_ARRANGE, FEE_STATUS, dist_driving,
driver_agg, thrill, risk_accep.
b. The last category of every categorical variable was considered as the reference category.
c. Odds ratios (exponentials of the betas) are provided with 95% confidence intervals

5. B. Living arrangement of students during the first year of study was a
statistically insignificant predictor of one or more accidents. The reference
category was living independently. The odds ratios of one or more accidents for
staying at home and for staying in a college room, compared with living independently,
were 0.91 and 0.965 respectively. Both values suggest fewer accidents than for living
independently, but neither ratio was statistically significant, so the claim could
not be established (Sperandei, 2014).
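An odds ratio is judged insignificant at the 5% level when its 95% confidence interval contains 1. The sketch below builds a Wald interval by exponentiating beta plus or minus the normal critical value times the standard error. The standard error of 0.1 is a hypothetical value chosen for illustration, since the regression table is not reproduced here; only the odds ratio 0.91 comes from the text.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio with a Wald confidence interval from a logistic coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# beta is the log of the reported odds ratio 0.91; se = 0.1 is HYPOTHETICAL.
odds_ratio, lower, upper = odds_ratio_ci(beta=math.log(0.91), se=0.1)
contains_one = lower < 1 < upper

print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("Interval contains 1:", contains_one)
```

With these assumed inputs the interval straddles 1, which is the pattern an insignificant predictor shows.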
5. C. Age was a significant predictor of one or more accidents. The base category
was “students aged 26 or more at time of enrolment”. Odds ratios for the
other three age categories were greater than one. Hence, it can be concluded that
the odds of one or more accidents were almost 3 times higher for students aged 18
years at time of enrolment. Interestingly, the odds ratio for students aged 22 to 25 at
time of enrolment was not statistically significant, indicating little difference
in the occurrence of one or more accidents between students aged 22-25 and those
aged 26 or more (p = 0.698). The odds of one or more crashes for males were almost
twice those for females. Domestic students had significantly lower odds of a crash
than international students (odds ratio = 0.584, p < 0.05). Driving distance per week
was not a statistically significant predictor, and no significant difference between
the two levels of driving distance was noted for predicting one or more crashes
(p = 0.26).
Aggression, thrill seeking, and risk acceptance each had odds ratios of almost 2 for
one or more crashes. Driver aggression in particular had an odds ratio of almost 2,
so an increase in driver aggression score would roughly double the odds of an accident.
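An odds ratio of about 2 doubles the odds, which is not the same as doubling the probability; how much the probability changes depends on the baseline. The sketch below makes that conversion explicit. The baseline probabilities are hypothetical, since the report does not state the crash rate of the reference group.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained when the baseline odds are multiplied by odds_ratio."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# HYPOTHETICAL baseline crash probabilities for the reference group.
for baseline in (0.10, 0.50):
    new_probability = probability_after_odds_ratio(baseline, odds_ratio=2)
    print(f"baseline {baseline:.2f} -> {new_probability:.3f} "
          f"(x{new_probability / baseline:.2f})")
```

For a rare outcome the probability nearly doubles, but at a baseline of 0.50 it rises only to about 0.67, which is the overestimation issue discussed by Knol et al. (2012).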

6. A. Binary logistic regression with “OB” as the dependent variable was performed, and
the model predicted the probability of membership of the “Obese” group (Cokluk, 2010).
Table 20: Odds ratios, significance values, and confidence intervals for each predictor variable
a. Variable(s) entered on step 1: Age_Cat, GENDER, LIVING_ARRANGE, BL_owob, depression, edu_par,
owob_par.
b. The last category of every categorical variable was considered as the reference category.
c. Odds ratios (exponentials of the betas) are provided with 95% confidence intervals
6. B. Living arrangement of students during the first year of study was a
statistically significant predictor of obesity. The reference category was living
independently. The odds ratios of obesity for staying at home and for staying in a
college room, compared with living independently, were 0.881 and 1.078 respectively,
indicating lower odds of obesity for staying at home and higher odds for staying in
a college room. Interestingly, staying at home was a significant predictor relative
to living independently, whereas no statistically significant difference was noted
between students staying in a college room and those living independently.

6. C. Age was a significant predictor of obesity. The base category
was “students aged 26 or more at time of enrolment”. Odds ratios for the other three age
categories were greater than one. Hence, it can be concluded that the odds of
obesity were almost 2 times higher for students aged 18 years at time of enrolment.
Interestingly, the odds ratio for students aged 22 to 25 at time of enrolment was not
statistically significant, indicating little difference in obesity between students
aged 22-25 and those aged 26 or more (p = 0.238). The odds of obesity for males were
almost 1.5 times those for females. Normal or underweight students had almost the
same odds of obesity as overweight students, though this comparison was statistically
insignificant (p = 0.311). Students who were not depressed had lower odds of obesity
than depressed students (odds ratio = 0.068, p < 0.05). Students with no obese parents
had lower odds of obesity (Szumilas, 2010). Number of parents with university education was
a statistically significant predictor (p < 0.05), and was noted to decrease obesity by 2.572, or
with an odds ratio of 1.31 (1/0.76) in favour of a decreasing likelihood of later obesity.
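Two identities sit behind the odds-ratio figures above: an odds ratio is the exponential of the logistic coefficient, and switching the reference category replaces the odds ratio by its reciprocal. The sketch below illustrates both. The value 0.76 is an assumption, chosen only because its reciprocal is about 1.31; the regression table itself is not reproduced in this document.

```python
import math


def odds_ratio_from_beta(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def flip_reference(odds_ratio):
    """Odds ratio for the same comparison with the reference category swapped."""
    return 1 / odds_ratio


# ASSUMED odds ratio below 1, i.e. lower odds than the reference category.
assumed_or = 0.76
flipped = flip_reference(assumed_or)
print(f"1 / {assumed_or} = {flipped:.3f}")

# Flipping the reference is the same as changing the sign of beta.
beta = math.log(assumed_or)
print(math.isclose(odds_ratio_from_beta(-beta), flipped))
```

Reporting the reciprocal is a presentational choice: an odds ratio below 1 for an outcome reads as an odds ratio above 1 against it.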

References
Cokluk, O. (2010). Logistic regression: Concept and application. Educational Sciences:
Theory and Practice, 10(3), 1397-1407.
Kim, T. K. (2015). T test as a parametric statistic. Korean Journal of Anesthesiology, 68(6),
540.
Knol, M. J., Le Cessie, S., Algra, A., Vandenbroucke, J. P., & Groenwold, R. H. (2012).
Overestimation of risk ratios by odds ratios in trials and cohort studies:
Alternatives to logistic regression. CMAJ, 184(8), 895-899.
Nordstokke, D. W., & Zumbo, B. D. (2010). A new nonparametric Levene test for equal
variances. Psicologica: International Journal of Methodology and Experimental
Psychology, 31(2), 401-430.
Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica,
24(1), 12-18.
Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child
and Adolescent Psychiatry, 19(3), 227.

Appendix
Logistic Regression Model for Question 5

Step number: 1
Observed Groups and Predicted Probabilities
[Classification plot: frequency (vertical axis, 0 to 20000) against predicted probability
(horizontal axis, 0 to 1), with cases plotted as the symbols N and O]
Predicted Probability is of Membership for One RTA or more
The Cut Value is .50
Symbols: N - No RTAs
O - One RTA or more
Each Symbol Represents 1250 Cases.

Logistic Regression Model for Question 6

Step number: 1
Observed Groups and Predicted Probabilities
[Classification plot: frequency (vertical axis, 0 to 16000) against predicted probability
(horizontal axis, 0 to 1), with cases plotted as the symbols N and O]
Predicted Probability is of Membership for Obese
The Cut Value is .50
Symbols: N - Not obese
O - Obese
Each Symbol Represents 1000 Cases.