Tax Agent Preference Analysis


Added on  2020/05/16

AI Summary
This assignment analyzes two datasets to explore factors influencing people's preference for using a tax agent versus self-preparation. It examines associations between income, age, lodgment method, and total deductions. The analysis includes calculating proportions, confidence intervals, and correlation coefficients to understand the relationship between variables. The findings suggest an association between age group and lodgment method, as well as a weak positive correlation between total income and deduction amounts for both groups. The assignment highlights potential data issues in Dataset 1, suggesting the need for further data collection.

Section 1:
a.
The provided dataset is a sample of individual tax returns lodged with the Australian Taxation
Office (ATO) for the 2013-2014 financial year, and it records the lodgment method of each return.
The lodgment method is either a tax agent or self-preparation: each taxpayer lodges the return for
the financial year through a tax agent or prepares it personally. The assignment analyses the
lodgment method by gender and by age group, using the information from individual tax returns
lodged after the end of the financial year.
b.
There are two types of data that can be used for analysis. The first is primary data, which is
collected directly from the customers of the organization by means of a questionnaire. The second
is secondary data, which is collected from the official website of the organization or from other
official websites related to it (Goodwin, 2012, p.130). Dataset 1 is a subset of Australian
Taxation Office data collected from the official ATO website. The data was collected by a
government site and not by the specific user, so it is secondary data. Data can be categorized as
qualitative or quantitative. Qualitative data can be further categorized into binary, nominal or
ordinal levels of measurement, and quantitative data into interval or ratio levels of measurement
(Morgan, 2013, p.9). The variable gender has two categories (0=male and 1=female), so it is a
nominal-level variable. It indicates the gender of a taxpayer.
The variable age_range contains integer values, so it is a quantitative variable. It indicates the
age range of a taxpayer.
The variable lodgment_method has two categories (A=Tax agent and B=Self prepare), so it is a
nominal-level variable. It indicates the lodgment category of a taxpayer.
The variable tot_inc_amt contains integer values, so it is a quantitative variable. It indicates the
total income amount on which a taxpayer pays tax.
The variable tot_ded_amt contains integer values, so it is a quantitative variable. It indicates the
total deduction amount of a taxpayer, that is, the total amount deducted from the actual income
amount for tax purposes.
The first five cases of dataset 1 are shown below:
Gender   age_range   Lodgment_method   Tot_inc_amt   Tot_ded_amt
0        5           A                 49612         8184
0        6           A                 131313        7686
1        6           S                 53320         1201
0        9           S                 56748         95
1        6           A                 84863         2016
c.
Statistical methods are basically a process of collecting, summarizing, analysing and interpreting
data. A questionnaire is prepared on the basis of the factors that are important to the study of
the organization. The study requires a specific plan and a design structure to get answers from
the respondents. The questionnaire contains open-ended and closed-ended questions, and questions
measured at the nominal, ordinal, interval and ratio levels. The analysis of the data collected
from the questionnaire indicates the strengths, weaknesses, opportunities and threats of the
factors of the study. The statistical data gives a summary of the analysis, which contains the
graphical representation of each factor,

numerical summary of each factor and the final principal components of the study. (Brace,
2008, p.45).
The procedure of data collection and analysis is as follows:
a. Topic selection: The scope should not be too wide and the level of measurement should be accurate.
b. Determination of hypothesis: This states the objective of the study.
c. Sampling method: Select an appropriate sampling method for the study.
d. Data collection: Data should be collected through direct interviews or from the data of other
similar companies.
e. Data handling: Code the responses and assign them to levels of measurement.
f. Statistical analysis: Choose the appropriate statistical model for the analysis.
g. Gathering of results: Present the data graphically and numerically.
h. Conclusions: Determine the findings related to the hypothesis of the study (Bethlehem,
2009, p.1).
For dataset 2, I collected data on 503 individuals by using a survey. The sampling process
involved calculating a sample size representative of the population, and the sample size for this
survey was set at 503. A sample this large is expected to be representative of the population and
to give unbiased results, and the method of data collection is also intended to be unbiased. It is
primary data, collected directly from the respondents by questionnaire.
The survey includes the following variables:
1. Gender: It indicates the gender of an individual (0=male and 1=female). It is a nominal-level
variable.
2. Age: It indicates the age of the respondent, which contains integer values.
3. Lodgment method: It indicates the preferred method as “A=Tax agent” or “B=Self
prepare”.
4. Total income amount: It indicates the total income of a respondent in a financial year.
5. Total deduction amount: It indicates the total amount deducted from the actual income amount
of a taxpayer for tax purposes in a financial year.
Section 2:
a.
The variable lodgment method is a qualitative variable with two categories, “A=Tax
agent” and “B=Self prepare”. So, a pie chart is suitable for lodgment method. The
percentage of frequencies for each lodgment method is shown below:
[Pie chart of lodgment method in dataset 1: Agent 74%, Self Prepared 26%]
So, the number of people who hire a tax agent to lodge their tax return is 740 and the number of
people who lodge by self-preparation is 260.
b.
The sample size is 1000 and 740 people hired a tax agent. The sample proportion \hat{p} is
a point estimate of the population proportion p. So, the point estimate of the proportion is:
\[ \hat{p} = \frac{x}{n} = \frac{740}{1000} = 0.74 \]
Now use the Z-statistic to calculate the 95% confidence interval. The formula for the 95%
confidence interval is shown below:
\[ \hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
Here, \hat{p} is the sample proportion, n is the sample size and Z is the critical value at a
specified level of significance. The critical value at the 5% level of significance is 1.96. So,
the confidence interval is calculated as:
\[ 0.74 \pm 1.96 \sqrt{\frac{0.74 (1 - 0.74)}{1000}} \]
Hence, the 95% confidence interval of the proportion of taxpayers who lodge the tax return
by using an agent is (0.712, 0.767).
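The interval calculation above can be sketched in a few lines of Python. This is a minimal illustration of the normal-approximation (Wald) interval, not the worksheet used in the assignment; the function name `proportion_ci` is an assumption made for this example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (p_hat, lower, upper) for a normal-approximation interval.

    z = 1.96 is the critical value for a 95% confidence level.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Dataset 1: 740 of 1000 taxpayers lodged through a tax agent.
p_hat, lower, upper = proportion_ci(740, 1000)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The same function can be reused for any other sample by changing the two counts, for example `proportion_ci(245, 503)` for a sample of 503 with 245 agent lodgments.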
c.
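The lower limit of the confidence interval is 0.712 and the upper limit is 0.767, so we can be 95%
confident that the population proportion lies in this range. The interval is centred on the sample
proportion (0.74) and is narrow, so it can be said that the sample of 1000 people is
representative of the population, assuming it was drawn at random.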
Section 3:
a.
The variable lodgment method is a qualitative variable with two categories, “A=Tax
agent” and “B=Self prepare”. So, a pie chart is suitable for lodgment method. The
percentage of frequencies for each lodgment method is shown below:
[Pie chart of lodgment method in dataset 2: Agent 48.71%, Self Prepared 51.29%]
So, the number of people who hire a tax agent to lodge their tax return is 245 and the number of
people who lodge by self-preparation is 258.
b.
The sample size is 503 and 245 people hired a tax agent. The sample proportion \hat{p} is a
point estimate of the population proportion p. So, the point estimate of the proportion is:
\[ \hat{p} = \frac{x}{n} = \frac{245}{503} \approx 0.487 \]
Now use the Z-statistic to calculate the 95% confidence interval. The formula for the 95%
confidence interval is shown below:
\[ \hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
Here, \hat{p} is the sample proportion, n is the sample size and Z is the critical value at a
specified level of significance. The critical value at the 5% level of significance is 1.96. So,
the confidence interval is calculated as:
\[ 0.487 \pm 1.96 \sqrt{\frac{0.487 (1 - 0.487)}{503}} \]
Hence, the 95% confidence interval of the proportion of taxpayers who lodge the tax return
by using an agent is (0.443, 0.530).
c.
The lower limit of the confidence interval is 0.443 and the upper limit is 0.530, so we can be 95%
confident that the population proportion lies in this range. The interval is centred on the sample
proportion (48.7%), so it can be said that the sample of 503 people is representative of the
population, assuming it was drawn at random.
Thus, the proportion of people who prefer to hire a tax agent is greater in dataset 1 than in
dataset 2. Dataset 2 indicates that almost equal numbers of people prefer a tax agent and
self-preparation, while dataset 1 indicates that most people prefer to hire a tax agent rather
than self-prepare.
Section 4:

a.
Age group is treated as a quantitative variable and lodgment method is a qualitative variable. The
frequency of each lodgment method for each age group, obtained by using Excel, and the
corresponding histogram are shown below:
Count of Lodgment_method
Age group     Agent   Self Prepared   Grand Total
0             41      16              57
1             34      15              49
2             57      11              68
3             78      16              94
4             85      22              107
5             86      15              101
6             75      17              92
7             82      27              109
8             74      30              104
9             61      38              99
10            51      41              92
11            16      12              28
Grand Total   740     260             1000
The histogram for row percentages is shown below:
[Stacked bar chart: row percentages of Agent and Self Prepared for age groups 0 to 11]
So, age group 5 has the largest number of taxpayers who use a tax agent (86 of 101).
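The row percentages behind the chart can be recomputed directly from the counts in the pivot table. The following is a minimal sketch; the dictionary layout and the names `counts` and `row_percentages` are assumptions made for illustration, not part of the original Excel worksheet.

```python
# Counts copied from the pivot table: age group -> (agent, self_prepared).
counts = {
    0: (41, 16), 1: (34, 15), 2: (57, 11), 3: (78, 16),
    4: (85, 22), 5: (86, 15), 6: (75, 17), 7: (82, 27),
    8: (74, 30), 9: (61, 38), 10: (51, 41), 11: (16, 12),
}

def row_percentages(table):
    """Map each age group to (agent %, self-prepared %) of its row total."""
    result = {}
    for group, (agent, self_prepared) in table.items():
        total = agent + self_prepared
        result[group] = (100 * agent / total, 100 * self_prepared / total)
    return result

shares = row_percentages(counts)
for group, (agent_pct, self_pct) in shares.items():
    print(f"age group {group:>2}: agent {agent_pct:5.1f}%, self prepared {self_pct:5.1f}%")

# Age group with the highest share of agent lodgments.
top_group = max(shares, key=lambda g: shares[g][0])
print("highest agent share: age group", top_group)
```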
b.
The chi-square test is applied to test the association between categorical variables. The
analysis of age group against lodgment method is done in the dataset 1 Excel worksheet.
The formula for the test statistic is given below:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Here, E is the expected frequency and O is the observed frequency. The chi-square test can be
used if every expected frequency is greater than or equal to 5. The formula to calculate the
expected frequencies is shown below:
\[ E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]
The calculated expected frequencies for all age groups by lodgment method are shown below:
Expected count of Lodgment_method
Age group     Agent   Self Prepared   Grand Total
0             42.18   14.82           57
1             36.26   12.74           49
2             50.32   17.68           68
3             69.56   24.44           94
4             79.18   27.82           107
5             74.74   26.26           101
6             68.08   23.92           92
7             80.66   28.34           109
8             76.96   27.04           104
9             73.26   25.74           99
10            68.08   23.92           92
11            20.72   7.28            28
Grand Total   740     260             1000
All of the expected frequencies are greater than 5, so the chi-square test for association can be
used for the analysis. Consider the null and the alternate hypotheses shown below:
Null hypothesis: There is no association between age group and lodgment method.
Alternate hypothesis: There is an association between age group and lodgment method.
The chi-square statistic calculations are shown below:
Chi-square contributions, (O - E)^2 / E
Age group     Agent    Self Prepared   Total
0             0.033    0.094           0.127
1             0.141    0.401           0.542
2             0.887    2.524           3.411
3             1.024    2.915           3.939
4             0.428    1.218           1.645
5             1.696    4.828           6.525
6             0.703    2.002           2.705
7             0.022    0.063           0.086
8             0.114    0.324           0.438
9             2.052    5.839           7.891
10            4.285    12.196          16.481
11            1.075    3.060           4.135
Grand Total   12.460   35.464          47.924
The degrees of freedom for the test are:
\[ df = (r - 1)(c - 1) = (12 - 1)(2 - 1) = 11 \]
The p-value for the chi-square test is less than 0.0005.
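The expected frequencies, the test statistic and the degrees of freedom can be reproduced with a short script. This is a sketch in plain Python rather than the Excel worksheet used for the assignment; the function name `chi_square_independence` is an assumption, and the critical value 19.675 is the standard chi-square table value for 11 degrees of freedom at the 5% level, used here in place of an exact p-value.

```python
# Observed counts by age group 0..11: (agent, self_prepared).
observed = [
    (41, 16), (34, 15), (57, 11), (78, 16), (85, 22), (86, 15),
    (75, 17), (82, 27), (74, 30), (61, 38), (51, 41), (16, 12),
]

def chi_square_independence(rows):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            # Expected count = row total * column total / grand total.
            expected = row_total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df

statistic, df = chi_square_independence(observed)

# Table value for df = 11 and a 5% significance level (assumed from a
# standard chi-square table, not computed here).
CRITICAL_5PCT_DF11 = 19.675
reject_null = statistic > CRITICAL_5PCT_DF11
print(f"chi-square = {statistic:.3f}, df = {df}, reject null: {reject_null}")
```

Comparing the statistic with the critical value leads to the same decision as comparing the p-value with 0.05, so this avoids the need for a statistics library.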
c.
According to the results obtained, the value of the chi-square test statistic is 47.92 and the
p-value of the test is less than the level of significance, 0.05. Thus, the null hypothesis is
rejected. Hence, it can be concluded that there is an association between age group and lodgment
method.
Section 5:
a.
Total income is a quantitative variable and lodgment method is a qualitative variable. The
boxplot obtained by using StatKey is shown below:
The above boxplot shows outliers in total income for both lodgment methods, agent and
self-prepared.
The obtained dot plot is shown below:

So, most people who hire a tax agent have a total income between 0 and 50000.
The obtained summary statistics are shown below:
Statistics           A           S           Overall
Sample Size          740         260         1000
Mean                 60601.249   43878.846   56253.424
Standard Deviation   70226.303   42013.481   64495.602
Minimum              -7752       0           -7752
Q1                   25320.50    18216.00    23017.50
Median               46077.50    37318.00    44113.50
Q3                   73555.00    57724.50    70593.00
Maximum              1052414     352377      1052414
The average total income of people who prefer a tax agent is 60601.25 and the maximum is 1052414.
The average total income of people who prefer to self-prepare is 43878.85 and the maximum is
352377.
b.
The distribution of income for each lodgment method is positively skewed: most incomes lie on
the left side and the mean is larger than the median in both groups. So, it can be said that the
total income data is skewed and not normally distributed. The boxplot also shows outliers in the
dataset, which is consistent with a non-normal income distribution.
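The skewness and outlier claims can be checked against the summary statistics alone. The sketch below assumes the common 1.5 × IQR rule for boxplot outliers (the rule is an assumption here; the text does not state which rule the boxplot used) and treats a mean above the median as a hint of positive skew. The names `summary` and `iqr_fences` are illustrative.

```python
# Values copied from the summary statistics table (A = agent, S = self prepared).
summary = {
    "A": {"min": -7752, "q1": 25320.50, "median": 46077.50,
          "q3": 73555.00, "max": 1052414, "mean": 60601.249},
    "S": {"min": 0, "q1": 18216.00, "median": 37318.00,
          "q3": 57724.50, "max": 352377, "mean": 43878.846},
}

def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

checks = {}
for label, s in summary.items():
    lower_fence, upper_fence = iqr_fences(s["q1"], s["q3"])
    checks[label] = {
        "high_outliers": s["max"] > upper_fence,  # maximum beyond upper fence
        "low_outliers": s["min"] < lower_fence,   # minimum beyond lower fence
        "mean_above_median": s["mean"] > s["median"],
    }
    print(label, (lower_fence, upper_fence), checks[label])
```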
Section 6:
A scatterplot is a visual representation of the relationship between two quantitative variables.
It indicates the strength of the relationship between the variables, or how they are associated.
One variable can be considered the explanatory variable and the other the response variable. A
positive trend in the scatterplot indicates a positive association between the variables: as the
value of one variable increases, the corresponding value of the other variable also increases
(Rubin, 2009, p.209). A negative trend in the scatterplot indicates a negative association between
the variables: as the value of one variable increases, the corresponding value of the other
variable decreases.
No trend in the scatterplot indicates no association between the variables. Correlation is a
measure of the relationship between two variables. It measures the strength of the relationship
between two or more normally distributed interval or ratio level variables. The coefficient of
correlation is denoted by r, and its value lies between +1 and −1 inclusive, where 1 is total
positive correlation, 0 is no correlation, and −1 is total negative correlation.
In general, if r lies between 0 and 0.19, the relationship between the two variables is very weak.
If r lies between 0.20 and 0.39, the relationship is weak. If r lies between 0.40 and 0.59, the
relationship is moderate. If r lies between 0.60 and 0.79, the relationship is strong. And, if r
lies between 0.80 and 1.00, the relationship is very strong (Israel, 2009, p.111). The scatterplot
of total income amount against total deduction amount for people who hire a tax agent and people
who self-prepare is shown below:
[Scatterplot: Total deduction Amount (vertical axis) against Total Income Amount (horizontal axis), with separate series for Agent and Self Prepared]
Thus, for people who hire a tax agent, as the value of the variable on the horizontal axis
increases, the corresponding value of the variable on the vertical axis increases weakly.
And, for people who self-prepare, as the value of the variable on the horizontal axis increases,
the corresponding value of the variable on the vertical axis also increases weakly. Thus, there
is a weak positive association between the variables in both groups.
The value of the correlation coefficient for the relationship between total income amount and
total deduction amount for people who hire a tax agent is 0.385. And, the value of the correlation
coefficient for the relationship between total income amount and total deduction amount for
people who self-prepare is 0.396.
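For reference, the Pearson correlation coefficient and the strength bands described above can be sketched as follows. The five data points are only the preview rows of dataset 1 (both lodgment methods mixed) and are used purely to show the calculation, so the resulting value is not the coefficient reported for either group. The function names, and the choice to treat each band as running up to the start of the next one, are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength_label(r):
    """Classify |r| using the bands quoted in the text."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

# Illustration only: the five preview rows of dataset 1.
incomes = [49612, 131313, 53320, 56748, 84863]
deductions = [8184, 7686, 1201, 95, 2016]
r_demo = pearson_r(incomes, deductions)
print(f"demo r = {r_demo:.3f} ({strength_label(r_demo)})")

# Coefficients reported in the text for the two groups.
for group, r in {"agent": 0.385, "self prepared": 0.396}.items():
    print(group, r, strength_label(r))
```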
b.
The value of the correlation coefficient is 0.385 for the association between total income
amount and total deduction amount for people who hire a tax agent. And, the value of the
correlation coefficient is 0.396 for the association between total income amount and total
deduction amount for people who self-prepare.
Thus, there is a weak positive association between total income amount and total deduction
amount for people who hire a tax agent, and there is a weak positive association between total
income amount and total deduction amount for people who self-prepare.
Section 7:
a.
The number of people who hire a tax agent to lodge their tax return is 740 and the number of
people who lodge by self-preparation is 260. The 95% confidence interval of the proportion of
taxpayers who lodge the tax return by using a tax agent is (0.712, 0.767), so it can be said that
the sample of 1000 people is representative of the population.
The number of people who hire a tax agent to lodge their tax return is 245 and the number of
people who lodge by self-preparation is 258. The 95% confidence interval of the proportion of
taxpayers who lodge the tax return by using an agent is (0.443, 0.530), so it can be said that the
sample of 503 people is representative of the population.
There is an association between age group and lodgment method.
Most people who hire a tax agent have a total income between 0 and 50000, and most people who
self-prepare also have a total income between 0 and 50000. The average total income of people
who prefer a tax agent is 60601.25 and the maximum is 1052414. The average total income of people
who prefer to self-prepare is 43878.85 and the maximum total income for an individual is 352377.
The value of the correlation coefficient is 0.385 for the association between total income
amount and total deduction amount for people who hire a tax agent. And, the value of the
correlation coefficient is 0.396 for the association between total income amount and total
deduction amount for people who self-prepare.
Thus, there is a weak positive association between total income amount and total deduction
amount for people who hire a tax agent, and there is a weak positive association between total
income amount and total deduction amount for people who self-prepare.
b.
The proportion of people who prefer to hire a tax agent is greater in dataset 1 than in dataset 2.
The distribution of income for each lodgment method is positively skewed, as most incomes lie on
the left side. So, it can be said that the total income data in dataset 1 is skewed and not
normally distributed, and the data may lead to wrong findings. Thus, the researcher should
collect the data again before doing further analysis.
References:
Goodwin, S. (2012) SAGE Secondary Data Analysis. India: SAGE Publications Pvt. Ltd.
Morgan, D. (2013) Integrating Qualitative and Quantitative Methods: A Pragmatic Approach.
India: SAGE Publications Pvt. Ltd.

Bethlehem, J. (2010) Applied Survey Methods. United States of America: John Wiley & Sons,
Inc.
Brace, I. (2008) Questionnaire Design: How to Plan, Structure and Write Survey Material for
Effective Market Research. Second edition. USA: Kogan Page Publishers.
Rubin, A. (2009) Statistics for Evidence-based Practice and Evaluation. Second edition. Canada:
Cengage Learning.
Israel, D. (2009) Data Analysis in Business Research. India: SAGE Publications Pvt. Ltd.