HI6007 Group Assignment: Statistical Analysis and Regression Models


Added on 2023/06/04

AI Summary

This assignment solution for HI6007 Statistics for Business Decisions covers several key statistical concepts. Question 1 focuses on constructing frequency distributions (frequency, cumulative frequency, relative frequency, cumulative relative frequency, and percentage frequency) and histograms from a dataset of examination scores, along with a comment on the distribution's shape. Question 2 delves into regression analysis, analyzing a computer output to determine the sample size, the relationship between demand and unit price, the coefficient of determination and correlation, and predicting supply based on unit price. Question 3 involves constructing an ANOVA table to assess the impact of different programs on worker productivity, followed by a recommendation. Finally, Question 4 uses Excel's Regression Tool to estimate a regression equation, assess its overall significance, determine the significance of individual variables, and re-estimate the model after dropping insignificant variables, including interpreting the slope coefficients.

Running head: STATISTICS FOR BUSINESS DECISION 1
HOLMES INSTITUTE FACULTY OF HIGHER EDUCATION
HI6007 Group Assignment
(Name of Student)
(University)
(Date of Submission)

Table of Contents
Introduction:
Question 1
Question 2
Question 3
Question 4
Conclusion:
References

Introduction:
The solved questions below cover several statistical concepts in turn. The first question
deals with frequency distributions and histograms. In the second question, a simple linear
regression output is used to find the sample size, the coefficient of determination, the
correlation coefficient and a predicted value. In the third question, a one-way ANOVA is
carried out at the 0.05 level of significance. In the fourth question, a multiple regression
model is estimated with MS Excel; it is used to discuss estimation, the significance of the
independent variables and the interpretation of the slope coefficients. After the insignificant
predictor is identified, it is dropped, and a new regression model with the single significant
predictor is estimated.
Question 1
Below you are given the examination scores of 20 students (data set also provided in
accompanying MS Excel file).
52 99 92 86 84
63 72 76 95 88
92 58 65 79 80
90 75 74 56 99
a. Construct a frequency distribution, cumulative frequency distribution, relative
frequency distribution, cumulative relative frequency distribution and percent
frequency distribution for the data set using a class width of 10.

Using a class width of 10, the classes are (50-59), (60-69), (70-79), (80-89) and (90-99).
i. Frequency distribution for the dataset
Class Range/ Interval Class Midpoint (x) Frequency
50-59 54.5 3
60-69 64.5 2
70-79 74.5 5
80-89 84.5 4
90-99 94.5 6
Total 20
ii. Cumulative frequency distribution for the dataset
Class Range/Interval Class Midpoint (x) Frequency Cumulative Frequency
50-59 54.5 3 3
60-69 64.5 2 5
70-79 74.5 5 10
80-89 84.5 4 14
90-99 94.5 6 20
Total 20
iii. Relative Frequency distribution for the dataset
Class Range/Interval Class Midpoint (x) Frequency Relative Frequency
50-59 54.5 3 3/20 = 0.15
60-69 64.5 2 2/20 = 0.10
70-79 74.5 5 5/20 = 0.25
80-89 84.5 4 4/20 = 0.20
90-99 94.5 6 6/20 = 0.30
Total 20 1.00
iv. Cumulative relative Frequency distribution for the dataset
Class Range/Interval Class Midpoint (x) Frequency (f) Cumulative Relative Frequency (cf)
50-59 54.5 3 0.15
60-69 64.5 2 0.25
70-79 74.5 5 0.50
80-89 84.5 4 0.70
90-99 94.5 6 1.00
Total 20
v. Percentage Frequency distribution for the dataset
Class Range/Interval Class Midpoint (x) Frequency Percentage (%)
50-59 54.5 3 15%
60-69 64.5 2 10%
70-79 74.5 5 25%
80-89 84.5 4 20%
90-99 94.5 6 30%
Total 20 100%
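The five distributions above can be reproduced with a short script. This is a minimal sketch, not the Excel procedure used in the assignment; the variable names (`scores`, `rows`, `class_width`) are illustrative assumptions.

```python
# Examination scores from Question 1.
scores = [52, 99, 92, 86, 84, 63, 72, 76, 95, 88,
          92, 58, 65, 79, 80, 90, 75, 74, 56, 99]

class_width = 10
lower_limits = range(50, 100, class_width)   # 50, 60, 70, 80, 90

n = len(scores)
rows = []
cumulative = 0
for low in lower_limits:
    high = low + class_width - 1             # inclusive upper limit, e.g. 59
    freq = sum(low <= s <= high for s in scores)
    cumulative += freq
    rows.append({
        "class": f"{low}-{high}",
        "midpoint": (low + high) / 2,
        "frequency": freq,
        "cumulative": cumulative,
        "relative": freq / n,
        "cumulative_relative": cumulative / n,
        "percent": 100 * freq / n,
    })

# Print one line per class: class, midpoint, f, cf, rf, crf, percent.
for r in rows:
    print(f'{r["class"]}  {r["midpoint"]:5.1f}  {r["frequency"]:2d}  '
          f'{r["cumulative"]:2d}  {r["relative"]:.2f}  '
          f'{r["cumulative_relative"]:.2f}  {r["percent"]:.0f}%')
```

Each row carries all five measures, so any one of the tables can be printed by selecting the relevant columns.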
b. Construct a histogram showing the percent frequency distribution of the examination
scores. Comment on the shape of the distribution.
[Chart: Excel histogram of the examination scores. Horizontal axis: Bin (59, 69, 79, 89, 99, More); left vertical axis: Frequency; right vertical axis: Cumulative %.]
Figure 1: Histogram showing the percent frequency distribution of the examination scores.
Comment: The histogram above shows the distribution of the examination scores. The modal
class is 90-99, which holds 30% of the scores, so performance was generally high. The 60-69
class has the lowest frequency (10%). The distribution is not symmetric: most of the scores lie
in the upper classes and the tail extends toward the lower scores, so the shape is skewed to
the left (negatively skewed) rather than normal. There are no obvious outliers.
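One rough numerical check on the shape is to compare the mean with the median: a mean below the median is consistent with a left (negative) skew. The sketch below uses Python's standard `statistics` module and is only an informal check, not a formal test of normality.

```python
from statistics import mean, median

# Examination scores from Question 1.
scores = [52, 99, 92, 86, 84, 63, 72, 76, 95, 88,
          92, 58, 65, 79, 80, 90, 75, 74, 56, 99]

score_mean = mean(scores)       # arithmetic mean of the 20 scores
score_median = median(scores)   # average of the 10th and 11th ordered scores

print(f"mean = {score_mean}, median = {score_median}")
if score_mean < score_median:
    print("Mean is below the median: consistent with a left skew.")
```

With only 20 observations the gap between the two measures is small, so the skew should be described as mild.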
Question 2

Shown below is a portion of a computer output for a regression analysis relating supply (Y in
thousands of units) and unit price (X in thousands of dollars).
ANOVA
df SS
Regression 1 354.689
Residual 39 7035.262
Coefficients Standard Error
Intercept 54.076 2.358
X 0.029 0.021
a. What has been the sample size for this problem?
The total degrees of freedom equal n - 1, and the total is the sum of the degrees of freedom
of the regression and the residual.
Total df = 1 + 39 = 40, so the sample size for the problem is n = 40 + 1 = 41 observations.
b. Determine whether or not demand and unit price are related. Use α = 0.05.
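The relationship is tested through the slope coefficient of unit price (X), with H0: β1 = 0
against H1: β1 ≠ 0. A positive coefficient alone does not establish a relationship; the
coefficient must be significantly different from zero.
Test statistic: t = 0.029 / 0.021 = 1.381, with 39 degrees of freedom.
The two-tailed critical value at α = 0.05 is t(0.025, 39) ≈ 2.023. Since 1.381 < 2.023, we fail
to reject H0. At the 0.05 level there is not enough evidence to conclude that demand and
unit price are related.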
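A formal check at α = 0.05 can be sketched as a t-test on the slope, with an F-test from the ANOVA portion as a cross-check. The critical values below are assumptions taken from standard t and F tables (they are not part of the computer output), and the variable names are illustrative.

```python
# Values read from the regression output in Question 2.
slope = 0.029
slope_se = 0.021
ss_regression, df_regression = 354.689, 1
ss_residual, df_residual = 7035.262, 39

# Assumed table values at alpha = 0.05 (not part of the output):
t_critical = 2.023   # two-tailed t(0.025, 39)
f_critical = 4.09    # F(0.05; 1, 39)

# t-test on the slope: H0 says the slope is zero.
t_stat = slope / slope_se
reject_by_t = abs(t_stat) > t_critical

# F-test from the ANOVA table: MSR / MSE.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
reject_by_f = f_stat > f_critical

print(f"t = {t_stat:.3f} (critical {t_critical}), reject H0: {reject_by_t}")
print(f"F = {f_stat:.3f} (critical {f_critical}), reject H0: {reject_by_f}")
```

In simple regression F equals t squared; the small difference here arises because the printed coefficient and standard error are rounded to three decimals.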

c. Compute the coefficient of determination and fully interpret its meaning. Be very specific.
The coefficient of determination (R²) describes how well the regression line fits the data,
i.e. the goodness of fit. It is computed from the following formula:
$$R^2 = 1 - \frac{\text{Sum of Squares of Residuals}}{\text{Total Sum of Squares}} = 1 - \frac{7035.262}{7389.951} \approx 0.048$$
where the total sum of squares is 354.689 + 7035.262 = 7389.951.
An R² of 0.048 means that only about 4.8% of the variation in supply is explained by unit
price; the remaining 95.2% is left unexplained by the model. The regression line (line of best
fit) therefore fits the data points poorly.
d. Compute the coefficient of correlation and explain the relationship between supply and
unit price.
The coefficient of determination (R²) is the square of the coefficient of correlation (r).
From the output, R² = 354.689 / 7389.951 ≈ 0.048, and the sign of r is the sign of the slope
coefficient (0.029), which is positive.
Hence r = +√R² = √0.048 ≈ 0.219
A correlation of about 0.219 indicates a weak positive linear relationship between supply
and unit price: supply tends to rise slightly as unit price rises, but the association is
weak.
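The two measures can be recomputed directly from the sums of squares in the output. This is a minimal sketch with illustrative variable names; taking the sign of r from the slope is the standard convention for simple linear regression.

```python
import math

# Sums of squares and slope from the regression output in Question 2.
ss_regression = 354.689
ss_residual = 7035.262
slope = 0.029

ss_total = ss_regression + ss_residual       # total sum of squares
r_squared = 1 - ss_residual / ss_total       # coefficient of determination

# r has the same sign as the slope coefficient.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"SST = {ss_total:.3f}, R^2 = {r_squared:.4f}, r = {r:.4f}")
```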
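e. Predict the supply (in units) when the unit price is $50,000.
The estimated regression equation is: Supply (Y) = 54.076 + 0.029 × X. The standard error
measures the precision of the coefficient and is not a term in the prediction.
Because X is measured in thousands of dollars, a unit price of $50,000 corresponds to X = 50.
Supply (Y) = 54.076 + 0.029 × 50 = 55.526
Since Y is measured in thousands of units, the predicted supply is about 55,526 units.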
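The unit handling is the step most easily missed in this prediction: X is stated in thousands of dollars and Y in thousands of units. The sketch below makes both conversions explicit; the variable names are illustrative.

```python
# Coefficients from the regression output in Question 2.
intercept = 54.076
slope = 0.029

unit_price_dollars = 50_000
x = unit_price_dollars / 1_000            # X is in thousands of dollars -> 50

supply_thousands = intercept + slope * x  # Y is in thousands of units
supply_units = supply_thousands * 1_000

print(f"Predicted supply: {supply_thousands:.3f} thousand units "
      f"(about {supply_units:,.0f} units)")
```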
Question 3
Allied Corporation wants to increase the productivity of its line workers. Four different
programs have been suggested to help increase productivity. Twenty employees, making up a
sample, have been randomly assigned to one of the four programs and their output for a day's
work has been recorded. You are given the results below (data set also provided in
accompanying MS Excel file).
Program A Program B Program C Program D
150 150 185 175
130 120 220 150
120 135 190 120
180 160 180 130
145 110 175 175
a. Construct an ANOVA table.
A single-factor (one-way) analysis of variance (ANOVA) is used in this case to test the
null hypothesis that the mean outputs of the four programs are all equal, i.e.
H0: μA = μB = μC = μD
H1: At least one of μA, μB, μC and μD differs from the others.
The one-way ANOVA table constructed in an Excel spreadsheet is shown below:
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Program A 5 725 145 525
Program B 5 675 135 425
Program C 5 950 190 312.5
Program D 5 750 150 637.5
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 8750 3 2916.667 6.140351 0.00557 3.238872
Within Groups 7600 16 475
Total 16350 19
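The sums of squares in the ANOVA table can be checked by hand or with a short script. The following is a minimal sketch using only the standard library; it reproduces SS, df, MS and F but not the P-value or F critical, which need an F distribution table or a statistics package.

```python
from statistics import mean

# Daily output per worker, by program (Question 3 data).
groups = {
    "Program A": [150, 130, 120, 180, 145],
    "Program B": [150, 120, 135, 160, 110],
    "Program C": [185, 220, 190, 180, 175],
    "Program D": [175, 150, 120, 130, 175],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-groups SS: group size times squared distance of group mean from grand mean.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within-groups SS: squared deviations of each value from its own group mean.
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1                 # k - 1
df_within = len(all_values) - len(groups)    # n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"Between: SS={ss_between}, df={df_between}, MS={ms_between:.3f}")
print(f"Within:  SS={ss_within}, df={df_within}, MS={ms_within:.3f}")
print(f"F = {f_stat:.6f}")
```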
b. As the statistical consultant to Allied, what would you advise them? Use a .05 level of
significance.
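At the 0.05 level of significance, F = 6.140351, which is greater than F critical (3.238872);
equivalently, the P-value (0.00557) is less than 0.05. We therefore reject the null hypothesis
and conclude that the mean outputs of the four programs are not all equal. As the statistical
consultant to Allied, I would advise them that the choice of program matters for productivity.
Program C has the highest average output (190 units per day, against 135 to 150 for the other
programs), so it is the strongest candidate for implementation. A follow-up multiple-comparison
test would be needed to confirm which pairs of programs differ significantly.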

Question 4
A company has recorded data on the weekly sales for its product (y), the unit price of the
competitor's product (x1), and advertising expenditures (x2). The data resulting from a
random sample of 7 weeks follows. Use Excel's Regression Tool to answer the following
questions (data set also provided in accompanying MS Excel file).
Week Price (x1) Advertising (x2) Sales
1 0.33 5 20
2 0.25 2 14
3 0.44 7 22
4 0.40 9 21
5 0.35 4 16
6 0.39 8 19
7 0.29 9 15
a. What is the estimated regression equation? Show the regression output.
The estimated regression equation is:
Sales (y) = 3.597615 + 41.32002 × Price (x1) + 0.013242 × Advertising (x2)
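The coefficients can be cross-checked outside Excel. The sketch below solves the least-squares normal equations for two predictors in closed form, using sums of squares and cross-products of deviations; it is an independent check rather than the Regression Tool itself, and the helper name `cross_dev` is a hypothetical choice.

```python
# Question 4 data: competitor's price (x1), advertising (x2), weekly sales (y).
price = [0.33, 0.25, 0.44, 0.40, 0.35, 0.39, 0.29]
advertising = [5, 2, 7, 9, 4, 8, 9]
sales = [20, 14, 22, 21, 16, 19, 15]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


s11 = cross_dev(price, price)
s22 = cross_dev(advertising, advertising)
s12 = cross_dev(price, advertising)
s1y = cross_dev(price, sales)
s2y = cross_dev(advertising, sales)

# Solve the 2x2 normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det      # slope for price
b2 = (s11 * s2y - s12 * s1y) / det      # slope for advertising
b0 = mean(sales) - b1 * mean(price) - b2 * mean(advertising)

print(f"Sales = {b0:.6f} + {b1:.5f} * Price + {b2:.6f} * Advertising")
```

The closed form only covers the two-predictor case; with more predictors a matrix-based least-squares routine would be the usual choice.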
The regression output from Excel is shown below:

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.877814
R Square 0.770558
Adjusted R Square 0.655837
Standard Error 1.83741
Observations 7
ANOVA
df SS MS F Significance F
Regression 2 45.35284 22.67642 6.716801 0.052644
Residual 4 13.5043 3.376075
Total 6 58.85714
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%
Intercept 3.597615 4.052244 0.887808 0.424805 -7.65322 14.84845 -5.04115 12.23638