Statistical Modelling Assignment PDF
Added on 2021/05/30
STATISTICAL MODELLING
STUDENT ID:
[Pick the date]
Section 1: Introduction
a) Although the representation of women in the total workforce has improved, one key
concern remains: the difference in salary levels between the two genders. This is
referred to as the gender gap and is supported by evidence in the Australian context,
such as a recent report from the WGEA (Workplace Gender Equality Agency), which puts
the salary levels of women at 15% lower than those of men. A further concern is that
this gap is found not only in occupations dominated by men but also in professions
where women are in the majority. The gap persists despite various laws introduced by
the government to prevent it (Livsey, 2017). Against this background, the provided
dataset is analysed to assess whether a gender gap exists in Australia.
b) Dataset 1 contains data for 1000 taxpayers and covers gender, salary level,
occupation and gift-related deductions. It cannot be categorised as primary data, since
it was collected by the ATO rather than by the university, which obtained it from the
ATO (Flick, 2015). The gender variable is categorical, since it takes two labels, male
and female. Occupation is recorded using codes that have no natural order, so it is a
nominal categorical variable. Salary/wages is a quantitative variable measured on a
ratio scale. The gift-related deduction is likewise a quantitative variable measured on
a ratio scale (Hair et al., 2015). Dataset 1 is unique and the initial five cases are
presented as follows.
c) Dataset 2 was collected through convenience sampling: I contacted people I knew and
ensured a fair representation of both genders so that salary levels could be compared.
Although this dataset is primary, since I collected it myself, it has potential
shortcomings that would affect the reliability of the results obtained (Hillier, 2016).
The first drawback is that the
given sample may be biased, since random sampling was not used. Also, some of the data
may be wrong, particularly the salaries, which people may have overstated because they
knew they were reporting them to me and I knew them personally. In this primary
dataset, only two variables are of interest for the research question. One is gender,
recorded using male and female labels, which is categorical. The other, salary, is
quantitative, since it is recorded as numerical values (Eriksson and Kovalainen, 2015).
The sample size of dataset 2 is 34.
Section 2: Descriptive Statistics (Using Dataset 1)
a) The relationship between occupation and gender is represented graphically as
follows.
The key observation from the graph above is that the representation of the two genders
differs markedly across professions. A case in point is code 7, which represents
drivers and machine operators: the proportion of female workers in this occupation is
very low. By contrast, the occupations with codes 4 and 5 have a disproportionately
high representation of women; one of these occupations is community services while the
other comprises clerical and administrative workers. Community
workers require a high degree of empathy, and hence the occupation is dominated by
women. Similarly, administrative jobs do not require any travel and are limited to
desk work, and are therefore preferred by women.
b) The relationship between salary/wage and gender is shown graphically below.
The gender gap is captured by the bar chart above. For the two lowest salary levels,
$0-$25,000 and $25,000-$50,000, the proportion of women is higher than that of men,
which shows that a disproportionate number of women have a low salary. For salary
levels exceeding $50,000 per year, men have a higher representation than women.
Further, the female proportion is negatively related to salary level: as higher salary
brackets are considered, the proportion of women continues to decline. This is a
disconcerting observation. A potential explanation is that the proportion of men in
high-paying jobs tends to be higher, resulting in higher average salaries for men.
However, an issue with this argument is that various studies show that the gender gap
also exists in occupations where female representation is higher.
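The comparison by salary bracket can be sketched in a few lines of Python. This is only an illustration: the records are hypothetical and are not taken from dataset 1, the function name `bracket_shares` is invented here, and the bracket boundaries of $25,000 and $50,000 are assumed from the two levels named in the text.

```python
from bisect import bisect_right
from collections import defaultdict

def bracket_shares(records, edges):
    """Return, for each gender, the share of its members in each salary bracket.

    records: iterable of (gender, salary) pairs.
    edges:   bracket boundaries in ascending order; a salary equal to a
             boundary falls in the higher bracket.
    """
    counts = defaultdict(lambda: [0] * (len(edges) + 1))
    for gender, salary in records:
        counts[gender][bisect_right(edges, salary)] += 1
    # Divide each count by the gender's total so the shares sum to 1 per gender.
    return {g: [c / sum(row) for c in row] for g, row in counts.items()}

# Hypothetical records for illustration only (not taken from dataset 1).
sample = [("F", 18000), ("F", 32000), ("F", 21000),
          ("M", 64000), ("M", 27000), ("M", 90000)]
shares = bracket_shares(sample, edges=[25000, 50000])
for gender, row in sorted(shares.items()):
    print(gender, [round(s, 2) for s in row])
```

Computing the shares within each gender, rather than across the whole sample, keeps the comparison fair when the sample holds unequal numbers of men and women.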
c) The numerical summary of salary levels by gender is as follows.
The striking finding from the numerical summary above is that almost 50% of the women
in the sample have an annual salary below $25,000. The representation of women in a
given income group appears inversely related to the income level: the proportion of
women keeps falling as the salary bracket rises. This suggests the presence of a glass
ceiling, with lower-paying jobs taken up by women while the higher-paying and more
authoritative jobs are held by men. To an extent this pattern can be explained by the
occupational distribution of the two genders; however, it would be imprudent to
attribute the entire salary gap to occupational distribution alone.
d) The relationship between the gift deduction amount and income level is shown in the
scatter diagram below.
The scatterplot above indicates that there is no meaningful linear relationship between
income and donation deduction. Further evidence is provided by the R² value, which is
close to zero and hence indicates that income explains almost none of the variation in
donation deduction (Flick, 2015). This result is expected, considering that donations
are likely to be driven less by the income earned than by personal factors such as the
urge to donate and how closely an individual is attached to the cause for which the
donation is made.
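For a straight-line fit with one predictor, R² equals the square of the Pearson correlation between the two variables. A minimal sketch of that calculation follows; the income and deduction values are hypothetical, and `r_squared` is a name introduced here for illustration.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation, which equals R² for a simple linear fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    syy = sum((b - my) ** 2 for b in y)                   # variation of y
    return (sxy * sxy) / (sxx * syy)

# Hypothetical values for illustration only (not taken from dataset 1).
income = [30000, 45000, 60000, 75000, 90000, 120000]
deduction = [200, 0, 350, 50, 400, 100]
print(round(r_squared(income, deduction), 3))
```

A value near 0 means a straight line through the points explains almost none of the spread in deductions, which is the pattern the scatter diagram is described as showing.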
Section 3: Inferential Statistics
a) The sample data contain the salary levels for the different occupations. The median
salary for each occupational code has been calculated, which indicates that the
highest-paying occupations by median salary are codes 2, 1, 7 and 3, in decreasing
order.
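The ranking step can be sketched as a group-by-median calculation. The records below are hypothetical and the helper name `rank_by_median` is invented for this illustration; the source does not show how the medians were computed.

```python
from collections import defaultdict
from statistics import median

def rank_by_median(records):
    """Group salaries by occupation code and rank codes by median salary."""
    groups = defaultdict(list)
    for code, salary in records:
        groups[code].append(salary)
    medians = {code: median(values) for code, values in groups.items()}
    # Highest median first, matching the "decreasing order" used in the text.
    ranking = sorted(medians, key=medians.get, reverse=True)
    return medians, ranking

# Hypothetical (occupation code, salary) pairs, for illustration only.
sample = [(1, 72000), (1, 65000), (2, 88000), (2, 91000), (2, 60000),
          (3, 48000), (3, 52000), (7, 58000), (7, 61000)]
medians, ranking = rank_by_median(sample)
print(ranking)
```

The median is used rather than the mean because a few very high salaries would otherwise pull an occupation up the ranking.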
To estimate the gender proportions in the occupations identified above, a 95%
confidence interval is computed from the participation of both genders in the sample
data. The confidence intervals for these occupations are determined as follows.
The result above suggests, with 95% confidence, that female representation in
occupation code 1 lies between 0.3123 and 0.5272 (Hillier, 2016).
The result above suggests, with 95% confidence, that female representation in
occupation code 3 lies between 0.0539 and 0.1879 (Hair et al., 2015).
The result above suggests, with 95% confidence, that female representation in
occupation code 2 lies between 0.4425 and 0.5928 (Flick, 2015).
The result above suggests, with 95% confidence, that female representation in
occupation code 7 lies between 0.0000 and 0.1258 (Eriksson and Kovalainen, 2015).
The confidence intervals above highlight the low representation of women in some of
these occupations with high median salaries. Women are fairly represented in only two
of the occupations (codes 1 and 2), while the other two (codes 3 and 7) are heavily
male-dominated.
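The intervals above can be sketched with the normal-approximation (Wald) formula for a proportion. That choice is an assumption, since the source does not name the method behind its output. The counts used below (34 women among 81 workers) are hypothetical and not stated in the source; they were picked only because they give an interval close to the one reported for code 1.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Wald confidence interval for a proportion, clipped to the range [0, 1]."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical counts, for illustration only (not stated in the source).
low, high = proportion_ci(successes=34, n=81)
print(f"{low:.4f} to {high:.4f}")
```

Clipping matters for small proportions: without it the lower bound can fall below zero, which is not a valid proportion.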
b) The hypotheses for this test are listed below, where p is the proportion of males
in occupation code 7.
Null Hypothesis (H0): p ≤ 0.8, i.e. the representation of males in occupation code 7
is not greater than 80%.
Alternative Hypothesis (H1): p > 0.8, i.e. the representation of males in occupation
code 7 is greater than 80%.
The relevant test statistic for this hypothesis test is z, and a right-tailed test is
conducted. This has been completed in Excel and the relevant output is pasted below.
The computations above show that the p-value is 0.0057. The significance level for the
test is 5%. Since the p-value is below the significance level, the statistical evidence
leads to rejection of the null hypothesis in favour of the alternative hypothesis
(Hillier, 2016). It is therefore appropriate to conclude that male representation among
machine operators and drivers is higher than 80%.
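A right-tailed one-sample z test for a proportion can be sketched as follows. The counts are hypothetical, since the source gives only the resulting p-value, and the function name is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_proportion_z_test(successes, n, p0):
    """z statistic and right-tail p-value for H0: p <= p0 against H1: p > p0."""
    p_hat = successes / n
    # The standard error uses p0, the boundary value assumed under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts, for illustration only: 90 males among 100 workers.
z, p_value = right_tailed_proportion_z_test(successes=90, n=100, p0=0.8)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0 at the 5% significance level")
```

Only the right tail is used because the alternative hypothesis concerns proportions greater than 80%.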
c) The hypotheses for this test are listed below.
Null Hypothesis (H0): μfemale = μmale, which implies that average salary levels do not
differ between the two genders.
Alternative Hypothesis (H1): μfemale ≠ μmale, which implies that average salary levels
differ between the two genders.
Since the population standard deviation is not known in this case, the relevant test
statistic is t (Flick, 2015). The hypothesis test has been completed in Excel and the
output is as follows.
It is apparent from the alternative hypothesis that the test is two-tailed, so the
p-value used is the two-tailed p-value, which is 0.00 to two decimal places. The
significance level for the test is 5%. Since the p-value is below the significance
level, the statistical evidence leads to rejection of the null hypothesis in favour of
the alternative hypothesis (Eriksson and Kovalainen, 2015). It is therefore appropriate
to conclude, based on the given sample data, that a gender gap exists.
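The t statistic for comparing two group means can be sketched as below. The source does not say which variant of the two-sample t test was run in Excel, so the unequal-variance (Welch) form used here is an assumption, and the salaries are hypothetical. The p-value itself requires the t distribution, which is not part of the Python standard library, so the sketch stops at the statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical annual salaries, for illustration only.
female = [42000, 38000, 51000, 29000, 46000, 33000]
male = [58000, 49000, 72000, 61000, 44000, 67000]
t, df = welch_t(female, male)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The two-tailed p-value is then read from a t table, or computed with a statistics package, using this statistic and these degrees of freedom.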
d) The objective is to perform a hypothesis test on dataset 2 to check whether a gender
gap exists.
Since the population standard deviation is not known in this case, the relevant test
statistic is t (Hair et al., 2015). The hypothesis test has been completed in Excel and
the output is as follows.
The relevant p-value is 0.1864, which is higher than the assumed significance level of
5%. Thus, the available evidence does not warrant rejection of the null hypothesis.
Hence, the conclusion is that dataset 2 does not provide significant evidence of a
gender gap (Hastie, Tibshirani and Friedman, 2011).
Section 4: Conclusion
a) The analysis above highlights the existence of a gender gap in salary between the
two genders. It is also noted that men in general have a higher representation in the
jobs with higher median salaries. The relationship between gift/donation deduction and
salary level was not found to be significant, since donations depend more on the
individual concerned than on salary. In addition, the gender representation in the
various occupations was found to differ considerably between the two genders.
b) An area for further research would be to test for a gender gap within occupations
that are dominated by women, as this would provide stronger evidence of the gap's
existence. This is important because, although this study points to a gender gap, the
gap could be attributed to the difference in the occupational representation of the two
genders.
References
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed.
London: Sage Publications.
Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research
project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P. and Page, M. J. (2015) Essentials
of business research methods. 2nd ed. New York: Routledge.
Hastie, T., Tibshirani, R. and Friedman, J. (2011) The Elements of Statistical Learning. 4th
ed. New York: Springer Publications.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill
Publications.
Livsey, A. (2017) Australia's gender pay gap: why do women still earn less than men?
[online] Available at:
https://www.theguardian.com/australia-news/datablog/2017/oct/18/australia-gender-pay-gap-why-do-women-still-earn-less-than-men
[Accessed 12 May 2018]