BUS5SBF - Analyzing Household Survey Data: Statistics for Business

Added on  2020/02/24

AI Summary
This assignment analyzes a household survey dataset, applying various statistical techniques to extract meaningful insights. Task 1 focuses on sampling methods, descriptive statistics (including measures of central tendency and dispersion), and assessing the normality of expense distributions, using the coefficient of variation to compare variability across different expense categories and identifying outliers. Task 2 involves creating frequency distributions and histograms to analyze utility expenses, calculating probabilities based on expense ranges, and interpreting the shape of the distribution. Task 3 delves into interpreting percentile values (top and bottom 10% of ataxInc), calculating and interpreting the mean of the 'own house' variable, calculating probabilities related to family size, and assessing the correlation between post-tax income and total expenses, supported by a scatter plot analysis. Finally, Task 4 examines the relationship between gender and highest degree using contingency tables, calculating probabilities based on these tables, and determining the independence of events related to education and gender.
STATISTICS FOR BUSINESS AND FINANCE (BUS5SBF)
ANALYSING HOUSEHOLD SURVEY
[Pick the date]
Student id
Task 1
A. The task requires selecting a sample of 200 observations from the given population of data.
A sample is generally used to draw inferences about population characteristics while keeping
within time and cost constraints. Sampling can be performed through various techniques. One of
the most popular probabilistic sampling techniques is simple random sampling, whose essential
feature is that every element of the population has the same likelihood of being part of the
sample. Although this method is significantly superior to non-probabilistic techniques, it can
still produce an unrepresentative sample, because the various population attributes may not
appear in the sample in the same proportions as in the underlying population.
Therefore, an alternative technique which could be more apt in the given scenario is stratified
random sampling. This method ensures that the sample obtained is representative of the
underlying population with respect to the chosen strata, but it is more time-consuming and
costly than simple random sampling.
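The difference between the two sampling techniques can be illustrated with a minimal Python sketch. The population below is hypothetical (1,000 households stratified by an assumed `own_house` flag), and the function names are illustrative only; the assignment itself drew its sample in Excel.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=0):
    """Draw about n items, allocating draws to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 1,000 households, 700 of which own their house.
population = [{"id": i, "own_house": 1 if i < 700 else 0} for i in range(1000)]

srs = simple_random_sample(population, 200)
strat = stratified_sample(population, lambda h: h["own_house"], 200)

# In the stratified sample the owner share matches the population share (70%).
owners_in_strat = sum(h["own_house"] for h in strat)
owners_in_srs = sum(h["own_house"] for h in srs)
print(owners_in_strat, owners_in_srs)
```

In the stratified sample the number of owners is fixed by the allocation, whereas in the simple random sample it varies from draw to draw, which is the representativeness issue described above.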
B. Using Excel, the descriptive statistics of the various expense variables are illustrated as
follows.
C. For measuring the variability or dispersion of different variables, absolute measures of
dispersion such as the standard deviation are not suitable, because they capture variation
without accounting for the differences in the means of the variables. It is therefore
preferable to use a relative measure of dispersion, which removes the effect of the mean and
allows a simple comparison of variation across the given variables. Thus, a suitable
measure for capturing the variability in the various expense variables is the coefficient of
variation (CoV). Mathematically, it is computed by dividing the standard deviation of a
variable by its mean.
Taking into account the descriptive statistics already calculated for the variables in
question, the following table highlights the coefficient of variation for the various expenses.
Based on the above values of CoV, the highest variation is observed for alcohol expenses,
while the lowest variation is observed for fuel-related expenses.
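As a minimal sketch of the calculation, the coefficient of variation can be computed as follows. The expense figures are hypothetical and serve only to show the arithmetic; the sample standard deviation is used on the assumption that it matches the Excel descriptive statistics.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a unit-free ratio)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical expense figures, not taken from the survey data.
example_expenses = [2, 4, 4, 4, 5, 5, 7, 9]
cov = coefficient_of_variation(example_expenses)
print(f"CoV = {cov:.4f}")
```

Because the ratio is unit-free, variables with very different means can be compared directly.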
D. Based on the measures of central tendency and dispersion obtained in the summary
statistics, observations can be made about the shape and nature of the underlying
distribution. For the distribution to be normal, the following conditions need to be met.
Coincidence of mean, mode and median
Skew value of 0
Kurtosis value of 3
The summary statistics for the various expense variables under consideration show that none
of the conditions highlighted above is fulfilled, and hence the concerned distributions would
be classified as non-normal. Further, with regard to shape, the variables show a high positive
skew, which indicates the presence of a right tail. This points to extreme expense values in
each category that are not typical of the rest of the sample. Such values would be referred
to as outliers.
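The skew and kurtosis conditions can be checked with a short moment-based sketch. The data below are hypothetical, and the formulas are the plain population-moment versions, for which a normal distribution has kurtosis 3. Note that Excel's descriptive statistics report excess kurtosis with a small-sample adjustment, for which the normal benchmark is 0 rather than 3.

```python
def central_moment(values, k):
    """k-th central moment, dividing by the number of observations."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment coefficient of kurtosis (not excess); 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical data: one symmetric series and one with a single extreme value.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The single large value in the second series produces a positive skew, which is the right-tail pattern described for the expense variables.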
Task 2
A. The requisite frequency distribution table based on the sample data highlighting the utility
expenses is indicated as follows.
B. For computing the probabilities required, it is imperative to consider the table highlighted in
part A.
1) Not exceeding $ 1,200 per year
Relevant households for whom the annual utility expense does not cross $ 1,200 =
20+52+51 = 123
Relevant sample size of households = 200
Probability computation = 123/200 = 0.615 or 61.5% of all households
2) Exceeding $ 1,200 per year but not exceeding $ 2,400 per year
Relevant households for whom the annual utility expense exceeds $ 1,200 but does not
cross $ 2,400 = 38+17+9 = 64
Relevant sample size of households = 200
Probability computation = 64/200 = 0.32 or 32% of all households
3) Exceeding $ 2,400 per year
Relevant households for whom the annual utility expense exceeds $ 2,400 = 7+4+2 = 13
Relevant sample size of households = 200
Probability computation = 13/200 = 0.065 or 6.5% of all households
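The three probabilities can be reproduced from the class frequencies quoted in the computations above. The sketch below assumes the nine frequencies are listed in ascending order of expense and that each range spans three consecutive classes, as the sums in the text imply.

```python
# Class frequencies as quoted in the three computations above (ascending expense classes).
counts = [20, 52, 51, 38, 17, 9, 7, 4, 2]
total = sum(counts)  # should equal the sample size of 200

p_low = sum(counts[:3]) / total    # not exceeding $1,200
p_mid = sum(counts[3:6]) / total   # above $1,200, up to $2,400
p_high = sum(counts[6:]) / total   # exceeding $2,400

print(total, p_low, p_mid, p_high)
```

The three ranges are mutually exclusive and cover every household, so the probabilities sum to 1.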
C. The histogram capturing the annual utilities expenses and based on the frequency distribution
table above is drawn below.
Visual observation of the histogram clearly shows a long tail at the right end, because
annual expenditure on utilities is exceptionally high in certain households. These values are
outliers, and the tail indicates that the distribution is skewed. Since a normal distribution
has no skew, the annual utilities expense is not normally distributed.
Task 3
A. It is apparent from the table highlighted below that the top 10% value of ataxInc is
$107,774 and the bottom 10% value of ataxInc is $21,250.
Interpretation
These two values provide an estimate of the financial status of the society. The gap between
the top 10% and bottom 10% values shows that significant inequality exists in society in terms
of economic position.
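A minimal sketch of how such cut-off values can be obtained is given below. The income list is hypothetical, and the "inclusive" method is an assumption chosen because it interpolates between the smallest and largest observations in the same way as Excel's PERCENTILE.INC; the assignment does not state which function was used.

```python
import statistics


def bottom_and_top_decile(values):
    """Return the 10th and 90th percentile cut-offs of the data."""
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical post-tax incomes, not the survey values.
incomes = [10_000 * k for k in range(1, 12)]
p10, p90 = bottom_and_top_decile(incomes)
print(p10, p90)
```

Households at or below the first value fall in the bottom 10%, and households at or above the second fall in the top 10%.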
B. The mean of own house variable is given below:
Interpretation
The mean of the variable “own house” shows the percentage of houses that are used by their
owners as a dwelling. The mean comes out to be 74%, which means that 74% of the houses in the
sample are occupied by their owners.
C. Family size is the sum of two variables, i.e.
Family size = Number of children + Number of adults
The count of households with a family size of five is shown below:
Count (family size equal to five) = 15
Total count of families (same as sample size) = 200
Probability that a randomly chosen household has a family size of five
= Count (family size equal to five) / Total count of families
= 15/200
= 0.075 or 7.5%
Therefore, there is only a 7.5% probability that a randomly chosen household has a family
size of five.
[Figure: Scatter plot of ln (ATax Inc) and ln (Texp)]
Two variables show a moderate relationship when the correlation coefficient between them is
higher than 0.5 but less than 0.75, and a strong relationship when the value is near or equal
to 1. In the present case, the value is only 0.56, which shows that post-tax income and total
expenses are moderately correlated with each other. This is also evident from the scatter
plot. For some points a clear linear relationship can be seen, which indicates that high
post-tax income and high total expenditure tend to occur together. For other points, this
relationship does not hold.
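The correlation between the two logged variables can be sketched as follows. The income and expense figures are hypothetical, and the cut-offs in `strength` follow the rule stated above for the moderate band; treating 0.75 and above as "strong" and 0.5 and below as "weak" is an assumption made only to complete the function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label the correlation; only the 'moderate' band comes from the text."""
    size = abs(r)
    if size >= 0.75:
        return "strong"
    if size > 0.5:
        return "moderate"
    return "weak"


# Hypothetical post-tax incomes and total expenses, not the survey values.
income = [20_000, 35_000, 50_000, 80_000, 120_000]
expenses = [18_000, 30_000, 33_000, 60_000, 70_000]

r = pearson_r([math.log(v) for v in income], [math.log(v) for v in expenses])
print(round(r, 3), strength(r), strength(0.56))
```

Applied to the reported coefficient of 0.56, the rule returns "moderate", in line with the interpretation above.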
Task 4
A. In order to examine the relationship between the two non-numeric variables, i.e. gender and
highest degree, a contingency table is constructed. The table is highlighted below:
Interpretation
A gender difference in educational qualification is apparent from the contingency table. For
the respective highest degrees, the split between the two genders is not in line with their
shares of the sample. This is most apparent in the primary category, where the percentage of
males is significantly higher than that of females.
B. The probability, based on the above contingency table, that a randomly selected household
head has Intermediate as the highest degree and is female is computed below:
Total count of household = 200
Total count of females with intermediate as highest degree = 20
P (Highest degree: Intermediate, Gender: Female) = 20/200 = 0.1
There is only a 10% probability that a randomly selected household head has Intermediate as
the highest degree and is female.
C. The probability, based on the above contingency table, that a randomly selected household
head has Bachelor as the highest degree and is male is computed below:
Total count of household = 200
Total count of males with Bachelor as highest degree = 16
P (Highest degree: Bachelor, Gender: Male) = 16/200 = 0.08
There is only an 8% probability that a randomly selected household head has Bachelor as the
highest degree and is male.
D. The proportion, based on the above contingency table, of female household heads who have
Secondary as the highest degree is computed below:
Total count of female household = 104
Total count of females with Secondary as highest degree = 22
Proportion (Highest degree: Secondary, given a female household head)
= 22/104 = 0.2115
About 21.15% of female household heads have Secondary as the highest degree.
E. Two events X and Y are termed independent when the condition P(X and Y) = P(X) * P(Y)
holds.
Case 1 – “highest education degree is Masters”
Case 2 – “Head of the household is male”
Total count of household = 200
P (X and Y) = 20/200 = 0.1
P (X) = 96/200 = 0.48
P (Y) = 41/200 = 0.205
Now,
P (X) * P (Y) = 0.48 * 0.205 = 0.0984
The two sides are not equal (0.1 versus 0.0984), so the condition is not satisfied and hence
the events do not seem independent.
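The independence check can be reproduced with the counts quoted above. This is a minimal sketch that tests the product rule for exact equality, as the text does; the variable names are illustrative.

```python
import math

total = 200          # sample size
p_xy = 20 / total    # P(X and Y) as quoted above
p_x = 96 / total     # P(X) as quoted above
p_y = 41 / total     # P(Y) as quoted above

product = p_x * p_y
gap = p_xy - product

# Exact-equality check of the product rule, allowing only for floating-point error.
independent = math.isclose(p_xy, product, abs_tol=1e-9)

print(f"P(X and Y) = {p_xy:.4f}, P(X)*P(Y) = {product:.4f}, gap = {gap:.4f}")
print("independent" if independent else "not independent")
```

The gap between the two sides is 0.0016, so the conclusion rests on a small difference in the sample proportions.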