ACC73002 Business Analytics and Big Data Assignment Solution Analysis


Added on 2022/10/11

AI Summary

This document presents a comprehensive solution to a Business Analytics and Big Data assignment (ACC73002). The assignment covers various statistical concepts and their applications. The solution begins by addressing survey design questions, differentiating between discrete and continuous variables, and discussing the advantages of different survey formats for income data. The subsequent section delves into analyzing customer complaint resolution times, including constructing frequency distributions, histograms, and ogives. It calculates descriptive statistics such as mean, median, quartiles, range, variance, and coefficient of variation. A box-and-whisker plot is constructed and interpreted to assess data skewness. Finally, the solution analyzes the relationship between milk calories and fat content by calculating covariance and correlation coefficients, interpreting their significance, and drawing conclusions about the relationship between the two variables.

ACC73002 1
Student’s Name
Professor’s Name
Course
Date
Business Analytics and Big Data (ACC73002)
Question 1 (4 marks)
One of the variables most often included in surveys is income. Sometimes the question is
phrased, ‘What is your income (in thousands of dollars)?’ In other surveys, the respondent is
asked to ‘Place an X in the circle corresponding to your income group’ and is given a number of
ranges to choose from.
a) In the first format, explain why income might be considered either discrete or continuous (1
mark).
A discrete variable takes countable values, usually arising from a counting process. For
instance, one can count the amount of money in a house safe. A continuous variable takes
values that can only be obtained by measuring and cannot be counted (Vetter, 2017); an example
is the time taken to walk to school. If income is recorded in whole units, such as whole pennies,
it is discrete. However, because income can be converted into different currencies and reported
to different numbers of decimal places, it can be treated as continuous, as financial institutions do.
b) Which of these two formats would you prefer to use if you were conducting a survey? Why
(1.5 marks)?
I would prefer the second format: ‘Place an X in the circle corresponding to your income group’.
In a survey, offering ranges gives the researcher a chance to obtain a response from all categories
of respondents. Also, during analysis, the grouped responses can be coded and analysed easily
using software such as SPSS.
c) Which of these two formats would probably bring you a greater rate of response? Why (1.5
marks)?
The second format is likely to give a better rate of response, especially where income levels
in the population vary widely. Respondents are more willing to answer, and to answer accurately,
because the question does not ask for their exact income, which they may consider private.

Question 2 (11 marks)
One of the major measures of the quality of service provided by any organization is the speed
with which the organisation responds to customer complaints. A large family-owned department
store selling furniture and flooring, including carpet, has undergone major expansion in the past
few years. In particular, the flooring department has expanded for installation crews to an
installation supervisor, a measurer and 15 installation crews. During a recent year the company
received 50 complaints about carpet installation. The following data represent the number of days
between the receipt of the complaint and the resolution of the complaint. <Furniture>
a) Construct frequency and percentage distributions (1 mark)
Table 1: Frequency and Percentage Distribution
Class Limit               Bin      Frequency   Percentage   Cumulative Percentage
less than 0               0.01     0           0.0%         0.0%
0 but less than 25        24.99    17          34.0%        34.0%
25 but less than 50       49.99    19          38.0%        72.0%
50 but less than 75       74.99    5           10.0%        82.0%
75 but less than 100      99.99    2           4.0%         86.0%
100 but less than 125     124.99   3           6.0%         92.0%
125 but less than 150     149.99   2           4.0%         96.0%
150 but less than 175     174.99   2           4.0%         100.0%
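The percentage and cumulative percentage columns of Table 1 follow directly from the class frequencies. A minimal Python sketch, assuming only the frequencies listed in the table (the variable names are illustrative and not part of the original solution):

```python
# Class frequencies copied from Table 1 (classes of width 25 days, from 0 to 175).
lower_limits = [0, 25, 50, 75, 100, 125, 150]
frequencies = [17, 19, 5, 2, 3, 2, 2]

n = sum(frequencies)  # total number of complaints

# Percentage of complaints falling in each class.
percentages = [100 * f / n for f in frequencies]

# A running total of the percentages gives the cumulative distribution.
cumulative = []
running = 0.0
for p in percentages:
    running += p
    cumulative.append(running)

for lo, f, p, c in zip(lower_limits, frequencies, percentages, cumulative):
    print(f"{lo:>3} but less than {lo + 25:<3}  {f:>2}  {p:5.1f}%  {c:5.1f}%")
```

The last cumulative value should reach 100%, which is a quick check that no observation was left out of the classes.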
b) Construct histogram and percentage polygons (1 mark).
[Figure 1: Days Histogram. Chart title: Histogram of Days; x-axis: Midpoints (12.5 to 162.5); y-axis: Frequency (0 to 20)]

[Figure 2: Days Percentage Polygon. Chart title: Percentage Polygon, Days; x-axis: Mid-Points (12.5 to 162.5); y-axis: Percentage (0% to 40%)]
c) Construct a cumulative percentage distribution and plot the corresponding ogive (1
mark).
Table 2: Cumulative Percentage Distribution
Class Limit               Bin      Cumulative Percentage
less than 0               0.01     0.0%
0 but less than 25        24.99    34.0%
25 but less than 50       49.99    72.0%
50 but less than 75       74.99    82.0%
75 but less than 100      99.99    86.0%
100 but less than 125     124.99   92.0%
125 but less than 150     149.99   96.0%
150 but less than 175     174.99   100.0%
[Figure 3: Days Ogive. Chart title: Cumulative Percentage Polygon (Ogive), Days; x-axis: bin values (0.01 to 174.99); y-axis: Percentage (0% to 120%)]

d) Calculate the mean, median, first quartile and third quartile (2 marks).
i. Mean = Sum of observations / Number of observations
In Excel: =AVERAGE()
Mean = 2152/50 = 43.04 days
ii. The median is the value at the centre of a dataset arranged in ascending order
(Cox, 2018). For an odd number of observations it is the single middle value; for an
even number it is the average of the two middle values.
Therefore, Median = (28+29)/2 = 28.5 days
iii. The first quartile is the middle value between the smallest value and the median.
It is the 25th percentile of the data. In this case, it is
13.75 days.
iv. The third quartile is the middle value between the median and the largest
value. It is the 75th percentile of the data. In this case, it is 55.75 days.
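The 50 raw observations are not reproduced here, so the sketch below only re-checks the mean from the reported total and illustrates the median and quartile definitions on a small hypothetical sample. The sample values are an assumption made purely for illustration; they are not the assignment data.

```python
import statistics

# Mean: the solution reports a total of 2152 days over 50 complaints.
total_days, n = 2152, 50
mean_days = total_days / n

# Hypothetical sample (NOT the assignment data), already in ascending order.
sample = [2, 5, 7, 11, 14, 20, 28, 35]

# With an even number of values, the median is the average of the two middle values.
median = statistics.median(sample)

# Two common quartile conventions; each returns [Q1, Q2, Q3].
q_exclusive = statistics.quantiles(sample, n=4, method="exclusive")
q_inclusive = statistics.quantiles(sample, n=4, method="inclusive")

print("mean:", mean_days)
print("median:", median)
print("quartiles (exclusive):", q_exclusive)
print("quartiles (inclusive):", q_inclusive)
```

The two conventions agree on the median but can give different first and third quartiles, so reported quartiles depend on the rule the software applies.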
e) Calculate the range, interquartile range, variance, standard deviation and coefficient of
variation (2 marks).
i. Range = Maximum value - Minimum value = 165 - 1 = 164 days
ii. Interquartile Range = Q3 - Q1 = 55.75 - 13.75 = 42 days
iii. Variance = Σ(X − X̄)² / n = 1757.794
iv. Standard deviation = √Variance = √1757.794 = 41.93 days
v. Coefficient of Variation = (Standard deviation / Mean) × 100 = (41.93/43.04) × 100 = 97.41%
Table 3: PHSTAT Descriptive Summary
Descriptive Summary Days
Mean 43.04
Median 28.5
Mode 5
Minimum 1
Maximum 165
Range 164
Variance 1757.79
Standard Deviation 41.93
Coeff. of Variation 97.41%
Skewness 1.488
Kurtosis 1.309
Count 50
Standard Error 5.93
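The measures of spread can be re-derived from the summary figures alone. The following is a sketch in Python that takes the reported minimum, maximum, quartiles, mean, variance and count as given; it does not recompute the variance, since that would require the raw data.

```python
import math

# Summary figures as reported in the solution (all in days).
minimum, maximum = 1, 165
q1, q3 = 13.75, 55.75
mean = 43.04
variance = 1757.794
n = 50

data_range = maximum - minimum        # spread between the extremes
iqr = q3 - q1                         # spread of the middle 50% of values
std_dev = math.sqrt(variance)         # standard deviation
cv_percent = 100 * std_dev / mean     # coefficient of variation, in percent
std_error = std_dev / math.sqrt(n)    # standard error of the mean

print(f"range = {data_range}, IQR = {iqr}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coefficient of variation = {cv_percent:.2f}%")
print(f"standard error = {std_error:.2f}")
```

These values agree with the descriptive summary in Table 3 to two decimal places.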
f) Construct a box-and-whisker plot. Are the data skewed? If so, how (2 marks)?
[Figure: Waiting Time Boxplot; horizontal axis: Days (0 to 180)]
The mean (43.04 days) is greater than the median (28.5 days) and the mode (5 days), hence the
data are positively skewed. This implies that the distribution has a longer tail on the right side (Holcomb, 2016).
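The shape of the box plot can also be read from the five-number summary. The sketch below uses the reported minimum, quartiles, median and maximum. The 1.5 × IQR fences are a common convention for flagging unusually large or small values and are an assumption here, not something used in the original solution.

```python
# Five-number summary and mean as reported in the solution (days).
minimum, q1, median, q3, maximum = 1, 13.75, 28.5, 55.75, 165
mean = 43.04

iqr = q3 - q1

# Compare the two halves of the box and the two whiskers.
lower_box = median - q1        # width of the box below the median
upper_box = q3 - median        # width of the box above the median
left_whisker = q1 - minimum    # distance from Q1 down to the minimum
right_whisker = maximum - q3   # distance from Q3 up to the maximum

# Assumed convention: values beyond 1.5 * IQR from the box are flagged as unusual.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

right_skewed = mean > median and upper_box > lower_box and right_whisker > left_whisker

print("upper box vs lower box:", upper_box, lower_box)
print("right whisker vs left whisker:", right_whisker, left_whisker)
print("fences:", lower_fence, upper_fence)
print("right skewed:", right_skewed)
```

Under this convention the maximum of 165 days lies above the upper fence, which is consistent with a long right tail.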
g) On the basis of a to f, if you had to report to the manager on how long a customer should
expect to wait to have a complaint resolved, what would you say? Explain (2 marks).
On average, a customer should expect to wait about 43 days, with a standard deviation of
about 41.93 days. Because the data are positively skewed, the median of 28.5 days is a more
typical waiting time, and management should aim to achieve at least this in every instance.
The fastest complaint was resolved in 1 day and the slowest took 165 days. Management
therefore needs to employ more staff and ensure that customers' complaints are resolved in the
shortest time possible for better service delivery.

Question 3 (5 marks)
The datafile <milk> gives nutrition content (number of calories and total fat, in grams) per
250ml of a random sample of 20 fresh milk available in Australia.
a) Calculate the covariance (1 mark).
=COVAR (B2:B21, C2:C21) = 64.43
b) Calculate the coefficient of correlation (1 mark).
=CORREL (B2:B21, C2:C21) =0.8622
c) Which do you think is more valuable in expressing the relationship between calories and fat
content – the covariance or the coefficient of correlation? Explain (1.5 marks).
The coefficient of correlation is more valuable in expressing the relationship between the
two variables. The covariance depends on the units of measurement, so its size alone does not
show how strong the relationship is. The coefficient of correlation, denoted r, is unit-free and
takes values between -1 and +1 (Johnson and Bhattacharyya, 2019). In this case, r = 0.8622.
This implies that there is a strong positive relationship between calories and fat content.
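The difference between the two measures can be illustrated with a short sketch. The <milk> datafile is not reproduced here, so the values below are hypothetical and chosen only for illustration; the helper functions are also illustrative. The covariance uses the population form (dividing by n), which is the form Excel's COVAR computes.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean product of deviations (divides by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Coefficient of correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical values, NOT the <milk> datafile.
fat_g = [1.0, 2.5, 4.0, 8.0, 9.5]
calories = [95, 110, 125, 160, 170]

# The same fat content expressed in milligrams instead of grams.
fat_mg = [1000 * g for g in fat_g]

cov_g = covariance(fat_g, calories)
cov_mg = covariance(fat_mg, calories)
r_g = correlation(fat_g, calories)
r_mg = correlation(fat_mg, calories)

print("covariance (grams):     ", cov_g)
print("covariance (milligrams):", cov_mg)
print("correlation (grams):     ", r_g)
print("correlation (milligrams):", r_mg)
```

Changing the unit multiplies the covariance by 1000 but leaves the coefficient of correlation unchanged, which is why r is easier to interpret.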
d) What conclusions can you reach about the relationship between calories and fat content (1.5
marks)?
The strong positive relationship implies that as the level of total fat increases, the number of
calories also tends to increase. Likewise, when the total fat level decreases, the calorie level also
tends to decrease. There is, therefore, a strong, though not perfect, positive relationship between the two variables.

Works Cited
Cox, D.R., 2018. Applied statistics-principles and examples. Routledge.
Holcomb, Z.C., 2016. Fundamentals of descriptive statistics. Routledge.
Johnson, R.A. and Bhattacharyya, G.K., 2019. Statistics: principles and methods. John Wiley &
Sons.
Vetter, T.R., 2017. Fundamentals of research data and variables: the devil is in the details.
Anesthesia & Analgesia, 125(4), pp.1375-1380.
