# STAT 193: Statistics in Practice Project Assignment

Added on 2023/06/04




STAT 193 Statistics in Practice

Project Assignment

Question 1: The dataset

- Gender is a nominal, categorical variable: its two categories, Male and Female, have no natural order.
- Age is a continuous, numerical variable. The minimum value is 15 years and the maximum value is 45 years.
- Ethnicity is a nominal, categorical variable. Its possible values are European, Pacific, Maori, or Other.
- Marital is a nominal, categorical variable. Its possible values are Married, Never, Previously, or Other.
- Qualification is a nominal, categorical variable. Its possible values are Degree, School, Vocational, or None.
- PostSchool is a nominal, categorical variable: its two categories, Yes and No, have no natural order.
- Hours is a continuous, numerical variable. The minimum value is 2 hours and the maximum value is 70 hours.
- Income is a discrete, numerical variable. The minimum value is 11 and the maximum value is 1789.

Question 2: Weekly Income

(a) Histogram:


The peak of the histogram lies to the left and its long tail extends to the right, so the distribution of weekly income is right-skewed (positively skewed). Most New Zealanders aged 15-45 in the sample earn a low weekly income, and very few earn a high weekly income. There appear to be possible outliers at the far right of the distribution.

(b) The point estimate of the mean weekly income of the population of New Zealanders aged 15-45 is the sample mean, x̄ = 547.04. The value was obtained in Excel using the function AVERAGE().

The sample standard deviation, s = 337.57

The interval estimate of the mean weekly income of the population of New Zealanders aged 15-45, at the 99% confidence level, is x̄ ± 61.48 = [485.56, 608.52].

That is, we are 99% confident that μ lies between 485.56 and 608.52.
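The margin of error can be reproduced outside Excel. The following is a minimal sketch, assuming the sample size is n = 200 (the figure quoted in Question 4(d)) and that a z-based interval is intended; the function name `z_confidence_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence):
    """Return (margin, lower, upper) for a z-based confidence interval."""
    alpha = 1.0 - confidence
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_crit * sd / sqrt(n)
    return margin, mean - margin, mean + margin


# Sample mean and standard deviation from part (b); n = 200 is assumed.
margin, lower, upper = z_confidence_interval(547.04, 337.57, 200, 0.99)
print(f"margin = {margin:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

The margin here is the same quantity that CONFIDENCE.NORM returns, so the interval is simply the sample mean plus or minus that value.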


Since the sample size is greater than 30, the distribution of the sample mean was approximated with a normal distribution. In Excel, we used the function CONFIDENCE.NORM(alpha, standard deviation, sample size), with alpha = 0.01, to obtain the margin of error.

(c) The sampling distribution of the sample mean is approximately normal, that is, X̄ ~ N(μ, σ²/n), with mean μ and variance σ²/n.

The sample mean in this case can be treated as normally distributed because the sample size is large (n > 30). Moreover, the normal distribution approximates many natural phenomena well. In a nutshell, the sample mean is calculated from independent, identically distributed random variables with a finite variance. Accordingly, by the central limit theorem, the sample mean is approximately normally distributed regardless of the distribution of the population.
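The central limit theorem argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential (a convenient right-skewed choice, not the actual income distribution), its mean is set to 547.04, and the names `simulate_sample_means`, `centre`, `spread`, and `share_within` are hypothetical.

```python
import random
from math import sqrt
from statistics import stdev


def simulate_sample_means(pop_mean, n, replicates, seed=193):
    """Draw `replicates` samples of size n from an exponential population
    (assumed, right-skewed) and return the mean of each sample."""
    rng = random.Random(seed)
    rate = 1.0 / pop_mean
    return [
        sum(rng.expovariate(rate) for _ in range(n)) / n
        for _ in range(replicates)
    ]


means = simulate_sample_means(pop_mean=547.04, n=200, replicates=2000)

centre = sum(means) / len(means)      # should sit near the population mean
spread = stdev(means)                 # should sit near sigma / sqrt(n)
theoretical_se = 547.04 / sqrt(200)   # for an exponential, sigma equals the mean

# Share of sample means within 1.96 standard errors of the population mean;
# under a normal approximation this should be close to 95%.
share_within = sum(
    abs(m - 547.04) <= 1.96 * theoretical_se for m in means
) / len(means)

print(f"centre = {centre:.2f}, spread = {spread:.2f}, "
      f"theoretical SE = {theoretical_se:.2f}, share within = {share_within:.3f}")
```

Even though each individual draw is strongly skewed, the simulated sample means cluster symmetrically around the population mean with a spread close to σ/√n.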

(d) Hypothesis testing

Let μ be the mean weekly income of New Zealanders aged 15-45.

We formulate the hypothesis test as:

Ho: μ = $986

Ha: μ ≠ $986

This is a two-tailed test. We use the z-test to calculate the test statistic.

The significance level is α = 0.05. Hence, the critical values are z = ±1.96 (in Excel, “=NORM.S.INV(0.025)” returns the lower critical value, -1.96).

We calculate the test statistic as:


z = (x̄ - μ0) / (s / √n) = (547.04 - 986) / (337.57 / √200) = -18.39

p-value = “=2*NORM.S.DIST(-18.39,TRUE)” ≈ 0.0000

The test statistic -18.39 < -1.96. Moreover, the p-value ≈ 0.0000 < 0.05. Hence, we reject Ho.

There is sufficient evidence, at the 5% significance level, to conclude that the mean weekly income of the New Zealand population aged 15-45 differs from that of the Australian population.
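The test statistic and two-sided p-value can be checked with a short script. This is a sketch, assuming n = 200 (from Question 4(d)) and using the sample standard deviation in place of σ; `one_sample_z_test` is an illustrative name. The two-sided p-value is twice the normal tail area beyond |z|, which equals erfc(|z|/√2).

```python
from math import erfc, sqrt


def one_sample_z_test(sample_mean, mu0, sd, n):
    """Return the z statistic and the two-sided p-value."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    # 2 * P(Z > |z|) written with erfc so tiny tail areas do not round to zero.
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return z, p_two_sided


z, p = one_sample_z_test(sample_mean=547.04, mu0=986.0, sd=337.57, n=200)
reject = abs(z) > 1.96  # two-tailed test at alpha = 0.05

print(f"z = {z:.2f}, p-value = {p:.3g}, reject Ho: {reject}")
```

A p-value can never exceed 1, so a result such as 2.00 signals that the wrong tail was doubled.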

(e) Given $667.00, the required probability is the upper-tail area beyond z = 0.13:

P(Z > 0.13) = 0.4483 “=1-NORM.S.DIST(0.13,TRUE)”
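The Excel expression above is an upper-tail area of the standard normal distribution. A minimal equivalent, with the helper name `upper_tail` chosen for illustration:

```python
from statistics import NormalDist


def upper_tail(z):
    """P(Z > z) for a standard normal Z, i.e. 1 - NORM.S.DIST(z, TRUE)."""
    return 1.0 - NormalDist().cdf(z)


prob = upper_tail(0.13)
print(round(prob, 4))
```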

(f) X_NZ = $850.00; μ_NZ = $677.00; σ_NZ = $237.50

z_NZ = (850.00 - 677.00) / 237.50 = 0.7284

X_AUS = $950.00; μ_AUS = $986.00; σ_AUS = $245.70

z_AUS = (950.00 - 986.00) / 245.70 = -0.1465

Based on the standardized scores, the New Zealander has a higher relative standing in their own population than the Australian has in theirs.
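The comparison rests on standardized scores, z = (x - μ) / σ, computed separately within each population. A brief sketch using the figures quoted in part (f); the function name `z_score` is illustrative:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the population mean."""
    return (x - mu) / sigma


z_nz = z_score(850.00, 677.00, 237.50)   # New Zealander
z_aus = z_score(950.00, 986.00, 245.70)  # Australian

print(f"z_NZ = {z_nz:.4f}, z_AUS = {z_aus:.4f}")
print("Higher relative standing:", "New Zealander" if z_nz > z_aus else "Australian")
```

The Australian earns more in absolute terms but sits below their population mean, which is why the z-score comes out negative.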


Question 3: Marital Status and ethnicity

(a) The graph suggests that there is an association between Marital and Ethnicity. The number of individuals with a given marital status varies with ethnicity. For instance, more Pacific people than Maori people are married. Similarly, fewer Pacific people than Maori people have never married.

(b) Null hypothesis, Ho: Marital status is not associated with Ethnicity.

Alternative hypothesis, Ha: Marital status is associated with Ethnicity.

(c) The number of Pacific people who are married = 0

Total married = 64

(d) i. Percentage of people who have never married that are Maori = 16/90 = 17.78%

ii. Percentage of Maori people who have never married = 16/21 = 76.19%

(e) Test statistic value = χ² = 21.69

Degrees of freedom = 9

p-value = 0.0099 ≈ 0.01

At the α = 0.05 level of significance, the p-value is < 0.05, so the null hypothesis is rejected. There is sufficient evidence to support the conclusion that marital status is associated with ethnicity. In other words, the distribution of marital status differs across ethnic groups.
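The degrees of freedom and p-value can be cross-checked without a statistics package. The sketch below assumes a 4 × 4 contingency table (four Marital categories by four Ethnicity categories, as listed in Question 1) and uses a standard closed-form series for the chi-square upper tail that holds only for odd degrees of freedom. The function names are illustrative.

```python
from math import erfc, exp, pi, sqrt


def chi2_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_odd_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid for odd df only."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    root = sqrt(x)
    normal_tail = 0.5 * erfc(root / sqrt(2.0))   # P(Z > sqrt(x))
    normal_pdf = exp(-x / 2.0) / sqrt(2.0 * pi)  # standard normal density at sqrt(x)
    # Series: sum over r of root**(2r - 1) / (1 * 3 * ... * (2r - 1)).
    series = 0.0
    term = root
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return 2.0 * normal_tail + 2.0 * normal_pdf * series


df = chi2_df(4, 4)
p_value = chi2_upper_tail_odd_df(21.69, df)
print(f"df = {df}, p-value = {p_value:.4f}, reject Ho at 0.05: {p_value < 0.05}")
```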


(f) The conclusion in part (e) was not surprising, given that it agrees with my answer in part (a). Moreover, examining the pattern of counts in the chart, more Maori people than Pacific people had never married. The counts for the other marital statuses also differed among the ethnic groups. Therefore, as the conclusion indicated, the variable Marital is associated with the variable Ethnicity.

Question 4: Highest Qualification and Income

(a) The side-by-side boxplots of Income by Qualification suggest that the mean weekly income differs among qualification levels. The boxes sit at different heights, which indicates that the median weekly income differs across qualification levels. For instance, the median weekly income for people with a degree is higher than the median weekly income for people without any qualification. Similarly, the median weekly income for the “None” qualification is higher than the median weekly income of people with the “School” qualification.

(b) Null hypothesis, Ho: μ1 = μ2 = μ3 = μ4 (the mean weekly income is equal for all qualification levels)

Alternative hypothesis, Ha: not all of μ1, μ2, μ3, μ4 are equal (the mean weekly income differs for at least one qualification level)

(c) The results of the ANOVA test are such that:

The test statistic = F = 42.051

Degrees of freedom = 3 and 196; total = 199


p-value = 0.0000002 < 0.001

At the 5% level of significance, we reject the null hypothesis, Ho, given that the p-value ≈ 0.0000 < 0.05. There is sufficient statistical evidence to conclude that the mean weekly income is not equal for all qualification levels.

(d) The degrees of freedom were calculated as follows:

Qualification levels = m = 4 => df (Factor) = m - 1 = 4 - 1 = 3

Sample size = n = 200 => df (Error) = n - m = 200 - 4 = 196

df (Total) = n - 1 = 200 - 1 = 199 = (m - 1) + (n - m) = 3 + 196 = 199
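The degrees-of-freedom bookkeeping, and how it feeds into the F statistic, can be sketched as follows. The tiny data set is entirely hypothetical (it is not the assignment's income data) and is chosen so the arithmetic is easy to follow by hand. `anova_df` and `one_way_anova` are illustrative names.

```python
def anova_df(m, n):
    """Return (df_factor, df_error, df_total) for m groups and n observations."""
    return m - 1, n - m, n - 1


def one_way_anova(groups):
    """One-way ANOVA on a dict of group name -> list of observations.

    Returns (F, df_factor, df_error, df_total).
    """
    all_values = [v for values in groups.values() for v in values]
    n, m = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}
    # Between-group (factor) and within-group (error) sums of squares.
    ss_factor = sum(
        len(vals) * (group_means[name] - grand_mean) ** 2
        for name, vals in groups.items()
    )
    ss_error = sum(
        (v - group_means[name]) ** 2
        for name, vals in groups.items()
        for v in vals
    )
    df_factor, df_error, df_total = anova_df(m, n)
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return f_stat, df_factor, df_error, df_total


# Degrees of freedom for the assignment: 4 qualification levels, n = 200.
print(anova_df(4, 200))

# Hypothetical toy data, for illustration only.
toy = {
    "Degree": [5, 6, 7],
    "School": [2, 3, 4],
    "Vocational": [4, 5, 6],
    "None": [1, 2, 3],
}
f_stat, df_factor, df_error, df_total = one_way_anova(toy)
print(f_stat, df_factor, df_error, df_total)
```

The factor and error degrees of freedom always add up to the total, which is a quick consistency check on any ANOVA table.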

(e) The results indicate that there is sufficient evidence of a difference in mean income at the 5% significance level for the following pairs:

- Degree & None
- Degree & School
- None & Vocational

This is because the p-value for the difference in mean income for each of these pairs was reported as 0.0000, which is less than the level of significance (0.05). Therefore, we conclude that there is a significant difference in mean income between these pairs. On the other hand, the p-value for the pair None & School was 0.85 > 0.05, so we fail to reject Ho and conclude that there is no significant difference in mean income between this pair.
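The decision rule applied to each pair is the same: reject Ho when the p-value is below α, otherwise fail to reject. A minimal sketch using only the pairwise p-values quoted above (0.0000 is the rounded value as reported); the names `decide` and `reported_p_values` are illustrative.

```python
ALPHA = 0.05

# Pairwise p-values as reported in the answer; other pairs are not listed there.
reported_p_values = {
    ("Degree", "None"): 0.0000,
    ("Degree", "School"): 0.0000,
    ("None", "Vocational"): 0.0000,
    ("None", "School"): 0.85,
}


def decide(p_value, alpha=ALPHA):
    """Apply the usual significance-test decision rule."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


decisions = {pair: decide(p) for pair, p in reported_p_values.items()}
for (a, b), decision in decisions.items():
    print(f"{a} & {b}: {decision}")
```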

(f) For the ANOVA test to be valid, the following assumptions must be met:


- The population from which each group is drawn is normally distributed
- There is a common variance for all populations
- The samples are drawn independently of each other, and the observations in these samples are randomly selected

(g) The residuals are fairly symmetrically distributed, with most points clustering toward the middle of the plot. The plot does not show any sign of heteroscedasticity, nonlinearity, skewness, or outliers. Therefore, it is reasonable to assume that the residuals are approximately normally distributed and that a linear model provides a decent fit for the data.

[Figure: Income Residual Plot. Horizontal axis: Income (0 to 2000); vertical axis: Residuals (-2 to 2.5).]

(h) A residual is the difference between the observed value of the dependent variable and the predicted value of the same variable.

That is, Residual = Observed value - Predicted value
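In a one-way ANOVA the predicted value for an observation is the mean of its group, so each residual is the observation minus its group mean. A small sketch with hypothetical data (not the assignment's income values); `group_mean_residuals` is an illustrative name.

```python
def group_mean_residuals(groups):
    """Return residuals (observed - group mean) for each group."""
    residuals = {}
    for name, values in groups.items():
        fitted = sum(values) / len(values)  # predicted value for this group
        residuals[name] = [v - fitted for v in values]
    return residuals


# Hypothetical toy data, for illustration only.
toy = {"Degree": [5, 6, 7], "None": [1, 2, 3]}
residuals = group_mean_residuals(toy)
print(residuals)
```

By construction the residuals within each group sum to zero, which is why a residual plot is centred on the horizontal axis.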
