Statistics for Business and Finance: Data Analysis Report (BUS5SBF)

Added on 2022/07/28

AI Summary

This report, prepared for a Statistics for Business and Finance course, presents a comprehensive analysis of a dataset. It begins by discussing sampling techniques, specifically simple random sampling and stratified random sampling, highlighting their strengths and weaknesses. The report then turns to descriptive statistics, using box-and-whisker plots to visualize variables such as Alcohol, Meals, Fuel, and Phone and to identify outliers and skewness. It next calculates percentiles for variables such as ATaxlnc and proportions for OwnHouse. The analysis extends to correlation, examining the relationship between ln(ATaxlnc) and ln(Texp) through scatter plots and correlation coefficients. Finally, the report addresses probability calculations based on cross-tabulated data, determining conditional probabilities related to household heads and their degrees. Excel is used for data manipulation and visualization. The references include Flick (2015), Hair et al. (2015), Hillier (2016), Medhi (2016), and Taylor and Cihon (2017).

Statistics for Business and Finance


Task 1
Part A)
To select the sample of 250 observations from the population provided, simple random sampling
(SRS) was used. Under this technique every population element has an equal chance of
selection, since no selection criterion is applied. Consequently, when the population has key
attributes that ought to be represented, an SRS sample may by chance under- or over-represent
them, because these attributes are disregarded entirely during selection. The resulting sample
would then misrepresent the vital attributes and lower the reliability of the results obtained
from it (Flick, 2015).
A superior alternative in such scenarios is stratified random sampling. This technique retains
the element of randomness while ensuring that the representation of key attributes in the
sample is close to that in the population. It does so by operating in two steps rather than
one. First, the population is divided into strata based on the pivotal attributes. Once this
is completed, a suitable random sample is drawn from each stratum, so that the representation
of attributes in the sample and in the population is as similar as possible (Hair et al.,
2015).
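The two sampling approaches described above can be sketched in Python. This is only an illustration, not the procedure used to draw the report's sample: the population records, the `region` attribute, and both function names are hypothetical, and proportional allocation across strata is an assumption.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n records so that every record has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, key, n, seed=0):
    """Step 1: split the population into strata using `key`.
    Step 2: draw a simple random sample inside each stratum, sized in
    proportion to the stratum's share of the population (assumed allocation).
    Rounding can make the total differ slightly from n for other stratum sizes."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in population:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 1,000 households, 80% urban and 20% rural.
population = [
    {"id": i, "region": "urban" if i < 800 else "rural"} for i in range(1000)
]

srs = simple_random_sample(population, 250)
strat = stratified_sample(population, lambda r: r["region"], 250)

# The stratified sample reproduces the 20% rural share; the SRS share varies by chance.
rural_in_srs = sum(r["region"] == "rural" for r in srs)
rural_in_strat = sum(r["region"] == "rural" for r in strat)
```

With these made-up proportions the stratified sample always contains 50 rural households, whereas the rural count in the simple random sample depends on the draw.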
Part B)
The variables of interest for descriptive statistics are Alcohol, Meals, Fuel and Phone.

For the variables Alcohol, Meals, Fuel and Phone, the box-and-whisker plots are shown below.

Part C)
The first noticeable feature of the boxplots shown above is the presence of multiple outliers
at the higher end for each of the variables. These are represented by dots. Their presence
implies a right (positive) skew, since the right tail is significantly longer than the left
one. The skew also means that the distributions of the given variables cannot be termed
normal. Further, the descriptive statistics indicate a high positive skew for the given
variables, and the measures of central tendency do not coincide (Hillier, 2016).
Clearly, for each variable the sample contains a few observations that correspond to
individuals with abnormally high expenditure on one or more items. Owing to the presence of
sizable outliers in each variable, the mean and variance should not be used to denote central
tendency and dispersion respectively. The appropriate measures of central tendency and
dispersion are the median and the inter-quartile range, because their computation is
resistant to the presence of outliers (Medhi, 2016).
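The resistance of the median and inter-quartile range to outliers can be illustrated with a small Python sketch. The expenditure figures below are invented for illustration and are not taken from the report's dataset; `robust_summary` is a hypothetical helper name.

```python
import statistics


def robust_summary(values):
    """Return classical and outlier-resistant measures for one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


# Hypothetical weekly expenditure values (not from the report).
typical = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40]
# Same data, but the largest value is replaced by an extreme outlier.
with_outlier = typical[:-1] + [400]

base = robust_summary(typical)
skewed = robust_summary(with_outlier)

# The mean and standard deviation shift sharply; the median and IQR do not move.
mean_shift = skewed["mean"] - base["mean"]
median_shift = skewed["median"] - base["median"]
iqr_shift = skewed["iqr"] - base["iqr"]
```

In this made-up example a single extreme value more than doubles the mean, while the median and inter-quartile range are unchanged.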

Task 2
Part A)
Variable of interest - ATaxlnc
90th percentile - 90% of the values are lower than or equal to this value. Hence, out of the
given sample of 250 households, only about 25 households would have an annual after-tax
income of more than $101,650.90.
10th percentile - 10% of the values are lower than or equal to this value. Hence, out of the
given sample of 250 households, about 225 households would have an annual after-tax income of
more than $17,104.60.
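The household counts quoted above follow from the definition of a percentile, which a short Python sketch can make explicit. It assumes the nearest-rank definition and uses placeholder values 1 to 250 instead of the real incomes; spreadsheet percentile functions typically interpolate, so their cut-off values can differ slightly. The function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, p):
    """Smallest value such that at least p% of the data are <= it (nearest-rank rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def count_above(values, threshold):
    """Number of observations strictly greater than the threshold."""
    return sum(v > threshold for v in values)


# Placeholder data: 250 distinct values standing in for household incomes.
incomes = list(range(1, 251))

p90 = nearest_rank_percentile(incomes, 90)
p10 = nearest_rank_percentile(incomes, 10)

above_p90 = count_above(incomes, p90)  # 10% of 250 households
above_p10 = count_above(incomes, p10)  # 90% of 250 households
```

With 250 distinct values, 25 observations lie above the 90th percentile and 225 lie above the 10th percentile; ties in real data can change these counts slightly.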
Part B)
Variable of interest - OwnHouse
Proportion of households that own their house = 182/250 = 0.728
Probability that any 3 of 5 randomly chosen household heads own their house
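One way to evaluate a probability of this kind is the binomial distribution, sketched below in Python. This rests on assumptions the report does not state: that the five household heads are chosen independently, that each owns a house with probability 0.728, and that "any 3" may mean either exactly three or at least three, so both readings are computed. `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_own = 182 / 250  # sample proportion of house owners, from the report

# Reading 1: exactly 3 of the 5 household heads own their house.
exactly_three = binomial_pmf(3, 5, p_own)

# Reading 2: at least 3 of the 5 household heads own their house.
at_least_three = sum(binomial_pmf(k, 5, p_own) for k in range(3, 6))
```

Under these assumptions the "exactly three" reading gives a probability of roughly 0.285. If the five heads were drawn without replacement from the 250 sampled households, a hypergeometric model would be the closer fit.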
Part C)
Variables of interest
Independent variable: ln(ATaxlnc)
Dependent variable: ln(Texp)
Scatter display
The correlation coefficient between the two variables shown above has been computed.
The scatter plot shows a positive slope, which indicates a positive linear relationship
between the given variables. This is confirmed by the positive value of the correlation
coefficient. Further, the strength of this linear association appears moderate to high, as
the correlation coefficient is 0.61 against a theoretical maximum of 1. The correlation
coefficient is not surprising, since common logic would also suggest a positive and strong
relationship between the two variables. The magnitude of the relationship is lowered by some
abnormal observations where either expenditure is disproportionate to income or income is
disproportionately high compared to expenditure (Taylor and Cihon, 2017).
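The correlation between two log-transformed variables can be computed as in the following Python sketch. The income and expenditure figures are invented for illustration and will not reproduce the report's value of 0.61; `pearson_r` is a hypothetical helper that implements the standard Pearson formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical after-tax incomes and total expenditures (not from the report).
income = [18000, 25000, 40000, 52000, 75000, 110000]
spending = [21000, 20000, 35000, 30000, 48000, 52000]

# Take natural logs first, mirroring ln(ATaxlnc) and ln(Texp) in the report.
ln_income = [math.log(v) for v in income]
ln_spending = [math.log(v) for v in spending]

r = pearson_r(ln_income, ln_spending)
```

A value of `r` near 1 indicates a strong positive linear association, a value near 0 indicates little linear association, and the sign gives the direction of the slope.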
Task 3
Part A)
Variables of interest - GHH and Highest degree
Cross tabulation or contingency table between variables
Part B) Probability computation
P(Household head Male | Master Degree) = 0.076
Part C) Probability computation
P(Household head Male | Among Master Degree holder males) = 0.404
Part D) Probability computation
P(Household head Female | Bachelor Degree Female) = 0.211
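Probabilities of this kind are read off a contingency table by dividing cell counts by the appropriate total. The Python sketch below shows the difference between a joint and a conditional probability; the counts are hypothetical because the report's cross tabulation is not reproduced here, and the function names are illustrative.

```python
# Hypothetical cross tabulation of household-head gender by highest degree.
# These counts are invented for illustration and sum to 250.
counts = {
    ("Male", "Bachelor"): 50,
    ("Male", "Master"): 20,
    ("Male", "Other"): 80,
    ("Female", "Bachelor"): 30,
    ("Female", "Master"): 10,
    ("Female", "Other"): 60,
}
total = sum(counts.values())


def joint_probability(gender, degree):
    """P(gender and degree): the cell count divided by the grand total."""
    return counts[(gender, degree)] / total


def conditional_probability(gender, degree):
    """P(gender | degree): the cell count divided by the column total for that degree."""
    degree_total = sum(n for (_, d), n in counts.items() if d == degree)
    return counts[(gender, degree)] / degree_total


p_male_and_master = joint_probability("Male", "Master")          # 20 / 250
p_male_given_master = conditional_probability("Male", "Master")  # 20 / 30
```

The two quantities differ only in the denominator: the grand total for a joint probability, and the total of the conditioning category for a conditional probability.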
Part E) Whether the two events are independent
Event X and Event Y are independent when
P(X) × P(Y) = P(X ∩ Y)
Event X and Event Y are not independent when
P(X) × P(Y) ≠ P(X ∩ Y)
Events of interest – Female household head and Primary Degree
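The independence condition above can be checked directly from counts, as in this Python sketch. The counts are hypothetical because the report's table values are not reproduced here, and `is_independent` is an illustrative helper name. A small tolerance is used because floating-point products are rarely exactly equal.

```python
import math


def is_independent(n_xy, n_x, n_y, n_total, tol=1e-9):
    """Check whether P(X) * P(Y) equals P(X and Y), given raw counts."""
    p_x = n_x / n_total
    p_y = n_y / n_total
    p_xy = n_xy / n_total
    return math.isclose(p_x * p_y, p_xy, abs_tol=tol)


# Hypothetical case 1: 100 female heads, 50 primary-degree heads, 20 in both groups.
# P(X) * P(Y) = 0.4 * 0.2 = 0.08 and P(X and Y) = 20 / 250 = 0.08.
case_equal = is_independent(n_xy=20, n_x=100, n_y=50, n_total=250)

# Hypothetical case 2: same margins, but 35 households fall in both groups.
# P(X and Y) = 35 / 250 = 0.14, which differs from 0.08.
case_unequal = is_independent(n_xy=35, n_x=100, n_y=50, n_total=250)
```

In sample data the two sides seldom match exactly, so a strict equality check like this one describes the sample only; a formal test such as chi-square is the usual way to judge independence in the population.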

References
Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research
project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., and Page, M. J. (2015) Essentials
of business research methods. 2nd ed. New York: Routledge.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill
Publications.
Medhi, J. (2016) Statistical Methods: An Introductory Text. 4th ed. Sydney: New Age
International.
Taylor, K. J. and Cihon, C. (2017) Statistical Techniques for Data Analysis. 2nd ed.
Melbourne: CRC Press.