Statistical Analysis, Probability, and Regression Assignment


Added on  2021/02/20

Homework Assignment
AI Summary
This statistics assignment provides a comprehensive analysis of various statistical concepts and their applications in a business context. The assignment begins with constructing frequency distributions and histograms, followed by calculations of mean, median, and mode. It delves into concepts of population versus sample and calculates standard deviation, interquartile range, and correlation coefficients. Regression analysis is performed, including the interpretation of regression equations and the coefficient of determination. Probability is explored through various scenarios, including conditional probability, probability trees, and the assessment of independence. Furthermore, the assignment covers the binomial distribution and normal distribution, including z-tests. The document provides detailed solutions and interpretations for each question, demonstrating a strong understanding of statistical principles and techniques relevant to business research and data analysis.
STATISTICS
TABLE OF CONTENTS
INTRODUCTION
QUESTION 1
a. Constructing frequency distribution with application of 10 classes
b. Constructing histogram
c. Calculating mean, median and mode
QUESTION 2
a. Is above a population or a sample
b. Calculating standard deviation
c. Calculating Interquartile range of chocolate bar sold
D. Calculating correlation coefficient
QUESTION 3
A. Calculating and interpreting regression equation
b. Calculating and interpreting coefficient of determination
QUESTION 4
a. Probability of Holmes or receiving Grass roots training
b. Probability of External and scientific training
c. Conditional probability of player in Holmes with scientific training
d. Assessing that training is independent from recruiter
QUESTION 5
A. Probability consumer comes from segment A if it is known that this consumer prefers Product X over Product Y and Product Z
B Probability that random consumer's first preference is product X
QUESTION 6
A. Probability that 2 or less of those 8 people would not purchase anything
QUESTION 7
A. Assuming a normal distribution, assess probability that apartment would sell over $2 million
B Probability that apartment would sell over $1 million but less than $1.1 million
QUESTION 8
A. Z test could be implied or not
B. Probability that 30% investors will be willing for commit $1 million or more
CONCLUSION
REFERENCES
INTRODUCTION
Statistics is the branch of mathematics concerned with the collection, analysis,
organization, presentation and interpretation of data (Amrhein, Trafimow and Greenland, 2019).
The present report is based on the principles and techniques of business research and
statistical analysis. It evaluates valid statistical techniques for solving business issues and
justifies the outcomes of statistical analysis with reference to business problem solving.
Moreover, it applies statistical knowledge to summarise data graphically and to interpret which
business solution fits best. It also covers probability through methods such as the Z score
and the normal distribution.
QUESTION 1
a. Constructing frequency distribution with application of 10 classes
Minimum        169
Maximum        3045
Range          2876
Class width    288
Sample size    60

Class          Frequency   Relative    Cumulative relative   Class
                           frequency   frequency             midpoint
169 – 457      17          0.28        0.28                  312.80
458 – 745      13          0.22        0.50                  601.40
746 – 1034     9           0.15        0.65                  890.00
1035 – 1322    7           0.12        0.77                  1178.60
1323 – 1611    5           0.08        0.85                  1467.20
1612 – 1900    2           0.03        0.88                  1755.80
1901 – 2188    2           0.03        0.92                  2044.40
2189 – 2477    2           0.03        0.95                  2333.00
2478 – 2765    1           0.02        0.97                  2621.60
2766 – 3054    2           0.03        1.00                  2910.20
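The summary figures and the relative-frequency columns above can be reproduced with a short sketch. The raw 60 observations are not shown in the report, so the code below works only from the minimum, maximum and class frequencies listed in the table; the variable names are illustrative.

```python
import math

# Figures taken from the table above (raw data are not available here).
minimum, maximum, num_classes = 169, 3045, 10
frequencies = [17, 13, 9, 7, 5, 2, 2, 2, 1, 2]

data_range = maximum - minimum                          # 3045 - 169
class_width = math.ceil(data_range / num_classes)       # 287.6 rounded up

n = sum(frequencies)
relative = [f / n for f in frequencies]

# Accumulate counts first, then divide, to avoid floating-point drift.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running / n)

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3}  {r:.2f}  {c:.2f}")
```

Rounding the width up rather than to the nearest integer ensures that ten classes cover the whole range.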
b. Constructing histogram
c. Calculating mean, median and mode
Mean= 58594 / 60
= 976.57
Median= 797.5
Mode= 401
QUESTION 2
a. Is above a population or a sample
Population                                    Sample

The collection of all elements that share     A subgroup of the members of the
a common feature; it makes up the universe.   population selected to take part in
                                              the study.

Consists of every unit of the group.          Comprises only a handful of units of
                                              the population.

Its measures are parameters.                  Its measures are statistics.

Data are collected by census or complete      Data are gathered by sample survey or
enumeration (Lang, Guo and Niu, 2019).        sampling.

Used to determine characteristics.            Used to make inferences about the
                                              population.
In the above scenario, the aim is to examine the relationship between the number of students
attending class and the number of chocolate bars sold. All students form the population,
whereas the weekly data given here are a sample drawn to meet that objective.
b. Calculating standard deviation
Weekly attendance (X)   Number of chocolate   X – Xbar   (X – Xbar)^2
                        bars sold
472                     6916                  -12.57     158.04
413                     5884                  -71.57     5122.47
503                     7223                  18.43      339.61
612                     8158                  127.43     16238.04
399                     6014                  -85.57     7322.47
538                     7209                  53.43      2854.61
455                     6214                  -29.57     874.47
Total                                                    32909.71

Particulars          Formula            Outcome
Mean                 3392 / 7           484.57
n – 1                7 – 1              6
Variance             32909.71 / 6       5484.95
Standard deviation   Sqrt of 5484.95    74.06
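As a cross-check on the table, the sample standard deviation of weekly attendance can be computed with Python's standard `statistics` module. This is a minimal sketch; `statistics.variance` and `statistics.stdev` use the n – 1 divisor, matching the working above.

```python
import statistics

# Weekly attendance values from the table above.
attendance = [472, 413, 503, 612, 399, 538, 455]

mean = statistics.mean(attendance)
variance = statistics.variance(attendance)   # sample variance, divisor n - 1
std_dev = statistics.stdev(attendance)       # square root of the sample variance

print(f"mean={mean:.2f} variance={variance:.2f} std_dev={std_dev:.2f}")
```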
c. Calculating Interquartile range of chocolate bar sold
Particulars                       Outcome
Quartile 1 (Q1)                   6114
Quartile 3 (Q3)                   7216
Interquartile range (Q3 – Q1)     7216 – 6114 = 1102
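The quartiles above are consistent with the inclusive (spreadsheet-style) quartile method. That the report used this method is an assumption inferred from the figures. A short sketch with the standard library:

```python
import statistics

# Number of chocolate bars sold, from the Question 2 data.
bars_sold = [6916, 5884, 7223, 8158, 6014, 7209, 6214]

# method="inclusive" treats the data as the whole range of interest and
# interpolates between order statistics, as spreadsheet QUARTILE functions do.
q1, q2, q3 = statistics.quantiles(bars_sold, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```

The default `method="exclusive"` would give different quartiles for a sample this small, so the choice of method should be stated alongside the result.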
The interquartile range (IQR) is regarded as a better measure of spread than the range
because it is not directly affected by outliers. The standard deviation and variance measure
the spread of the data around the mean. Thus, the IQR is preferred to the standard deviation
when the distribution is highly skewed or has severe outliers, because the IQR is much less
sensitive to those features.
D. Calculating correlation coefficient
Correlation describes how a change in one variable tends to be accompanied by a change in
the other variable in a particular direction. Understanding this relationship is important
because the value of one variable can then be used to predict the value of the other
(Tian and Lindstedt, 2019). In the above scenario, the correlation between weekly attendance
and chocolate bars sold is 0.97, which indicates a strong positive relationship, close to
perfect. In simpler terms, high values of one variable are linked with high values of the
other, so a change in attendance is associated with a corresponding change in chocolate bar sales.
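The coefficient can be reproduced from the seven weekly observations with the computational form of Pearson's r. The sketch below uses only the standard library; the variable names are illustrative.

```python
import math

# Weekly attendance (x) and chocolate bars sold (y) from Question 2.
x = [472, 413, 503, 612, 399, 538, 455]
y = [6916, 5884, 7223, 8158, 6014, 7209, 6214]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Pearson correlation coefficient, computational form.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"r = {r:.2f}")
```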
QUESTION 3
A. Calculating and interpreting regression equation
Regression analysis generates an equation that describes the statistical relationship
between one or more predictor variables and a response variable (Aneiros et al., 2019).
The slope can be interpreted as rise over run: it states how much Y is expected to change
as X increases. The units of the slope are units of the Y variable per unit of the X
variable; it is the ratio of the change in Y to the change in X.
Weekly attendance   Number of chocolate   X*Y         X^2        Y^2
(X, independent)    bars sold
                    (Y, dependent)
472                 6916                  3264352     222784     47831056
413                 5884                  2430092     170569     34621456
503                 7223                  3633169     253009     52171729
612                 8158                  4992696     374544     66552964
399                 6014                  2399586     159201     36168196
538                 7209                  3878442     289444     51969681
455                 6214                  2827370     207025     38613796
Total 3392          47618                 23425707    1676576    327928878

Linear regression equation: Y = mX + b

m = [N(sum XY) – (sum X)(sum Y)] / [N(sum X^2) – (sum X)^2]
N(sum XY)            163979949
(sum X)(sum Y)       161520256
N(sum X^2)           11736032
(sum X)^2            11505664
m                    10.68

b = [(sum X^2)(sum Y) – (sum X)(sum XY)] / [N(sum X^2) – (sum X)^2]
(sum X^2)(sum Y)     79835195968
(sum X)(sum XY)      79459998144
N(sum X^2)           11736032
(sum X)^2            11505664
b                    1628.69

Y = 10.68 X + 1628.69
The Y-intercept b is the point where the regression line crosses the y-axis, that is, the
value of Y when X is 0. In the straight-line equation, the slope m is the number that
multiplies x and b is the y-intercept; this arrangement is known as the slope-intercept
form. Here m is 10.68, which means that for every increase of 1 in the input variable x
(weekly attendance), the output variable y (chocolate bars sold) is expected to rise by
about 10.68.
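The least-squares slope and intercept can be recomputed directly from the column totals. The sketch below follows the same sum-based formulas; the `predict` helper and the attendance value of 500 are hypothetical additions for illustration only.

```python
# Weekly attendance (x) and chocolate bars sold (y) from the table.
x = [472, 413, 503, 612, 399, 538, 455]
y = [6916, 5884, 7223, 8158, 6014, 7209, 6214]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

denominator = n * sum_x2 - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denominator
intercept = (sum_x2 * sum_y - sum_x * sum_xy) / denominator


def predict(attendance):
    """Hypothetical helper: predicted bars sold for a given attendance."""
    return slope * attendance + intercept


print(f"Y = {slope:.2f} X + {intercept:.2f}")
print(f"Predicted sales at attendance 500: {predict(500):.0f}")
```

Predictions are only reasonable within the observed attendance range of roughly 400 to 610; the intercept itself is an extrapolation to zero attendance.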
b. Calculating and interpreting coefficient of determination
The coefficient of determination measures the amount of variance in the dependent
variable that is explained by the independent variable (Zhan, 2019). It is generally denoted
R squared, is a key output of regression, and can be interpreted as the proportion of
variance in the dependent variable that is predictable from the independent variable. It is
the percentage of variation in y explained by the x variables collectively.
Y = 10.68 X + 1628.69

R^2 = [N(sum XY) – (sum X)(sum Y)]^2 / {[N(sum X^2) – (sum X)^2] [N(sum Y^2) – (sum Y)^2]}
(sum Y)^2      2267473924
N(sum Y^2)     2295502146
R^2            0.937
R^2 is close to 1, which indicates that the line fits the data well. It is an important
tool for gauging the degree of linear correlation between variables and the goodness of fit
in regression analysis. Here about 93.7% of the variation in the y data is explained by
differences in the x data.
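An equivalent way to obtain R^2 is as one minus the ratio of residual variation to total variation. The sketch below fits the line by least squares and then applies that definition. It is offered as a cross-check on the sum-based formula, not as the report's own method.

```python
# Weekly attendance (x) and chocolate bars sold (y) from Question 3.
x = [472, 413, 503, 612, 399, 538, 455]
y = [6916, 5884, 7223, 8158, 6014, 7209, 6214]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares fit using deviations from the means.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
intercept = mean_y - slope * mean_x

# R^2 = 1 - (unexplained variation / total variation).
ss_residual = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
ss_total = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - ss_residual / ss_total

print(f"R^2 = {r_squared:.3f}")
```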
QUESTION 4
a. Probability of Holmes or receiving Grass roots training
Holmes or receiving grass roots training

Particulars                              Formula                                 Outcome
Probability (Holmes)                     127 / 193                               0.66
Probability (Grass roots training)       104 / 193                               0.54
Probability (Holmes and receiving        Probability (Holmes) *                  0.35
grass roots training)                    Probability (Grass roots training)
Probability (Holmes or receiving         Probability (Holmes) +                  0.84
grass roots training)                    Probability (Grass roots training) –
                                         Probability (Holmes and grass roots
                                         training)
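The addition rule used above can be sketched as follows. The joint probability is obtained here by multiplying the two marginals, exactly as in the working. That step assumes recruiter and training type are independent; if the underlying cross-tabulation were available, the joint count should be read from it directly.

```python
total_players = 193

p_holmes = 127 / total_players
p_grass_roots = 104 / total_players

# Assumption carried over from the working above: independence,
# so the joint probability is the product of the marginals.
p_both = p_holmes * p_grass_roots

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_holmes + p_grass_roots - p_both

print(f"P(both) = {p_both:.2f}, P(either) = {p_either:.2f}")
```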
b. Probability of External and scientific training
External and scientific training

Particulars                              Formula                                 Outcome
Probability (External)                   66 / 193                                0.34
Probability (Scientific training)        89 / 193                                0.46
Probability (External and scientific     Probability (External) *                0.16
training)                                Probability (Scientific training)
c. Conditional probability of player in Holmes with scientific training
Particulars                              Formula                                 Outcome
Probability (Scientific training and                                             0.30
Holmes)
Probability (Scientific training)                                                0.46
Probability (Holmes)                     127 / 193                               0.66
Probability (Holmes | Scientific         Probability (Scientific training and    0.65
training)                                Holmes) / Probability (Scientific
                                         training) = 0.30 / 0.46
d. Assessing that training is independent from recruiter
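Particulars                                  Formula                             Outcome
Probability (training and recruitment)                                           0.16
Probability (training)                       89 / 193                            0.46
Probability (recruitment)                    66 / 193                            0.34
Probability (recruitment | training)                                             0.34
Probability (training) *                     0.46 * 0.34                         0.16
Probability (recruitment | training)

Probability (training and recruitment) = Probability (training) * Probability (recruitment | training)

On these figures, Probability (recruitment | training) equals Probability (recruitment), both
0.34, so training type is independent of recruiter.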
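The independence check compares the conditional probability with the unconditional one. A minimal sketch, assuming the joint probability of 0.16 quoted in the working and an arbitrary tolerance of 0.01 to absorb rounding (both are assumptions, not values prescribed by the source):

```python
import math

p_training = 89 / 193       # scientific training
p_recruitment = 66 / 193    # external recruiter
p_joint = 0.16              # rounded joint probability quoted in the working

# Conditional probability: P(recruitment | training) = P(joint) / P(training).
p_recruitment_given_training = p_joint / p_training

# Events are independent when conditioning does not change the probability.
# The tolerance is an illustrative choice to allow for rounded inputs.
independent = math.isclose(
    p_recruitment_given_training, p_recruitment, abs_tol=0.01
)

print(f"P(R | T) = {p_recruitment_given_training:.3f}, "
      f"P(R) = {p_recruitment:.3f}, independent: {independent}")
```

Because the joint figure in the working was itself derived from the product of the marginals, this check is only meaningful if the joint probability is taken from the observed counts.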
QUESTION 5
A. Probability consumer comes from segment A if it is known that this consumer prefers Product
X over Product Y and Product Z
Probability trees are a natural way to describe events that have multiple outcomes, and
they can be applied when events follow one another. In simple words, they help in
understanding the association among the situations.
P(X | YZ) = P(X and YZ) / P(YZ)
P(YZ)          0.44
P(X)           0.11
P(X and YZ)    0.0484
P(X | YZ)      0.0484 / 0.44 = 0.11
B Probability that random consumer's first preference is product X
Segment A                                     0.11
Segment B                                     0.105
Segment C                                     0.06
Segment D                                     0.045
Overall probability of preferring product X   0.32
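The total in the table is an application of the law of total probability. The sketch below sums the four segment figures, assuming each is the joint probability of belonging to that segment and preferring Product X. The probability tree is not shown in the report, so this reading is an assumption. Under the same assumption, Bayes' rule gives the share of X-preferring consumers who come from segment A.

```python
# Assumed to be P(segment and prefers X) for each segment.
joint_with_x = {"A": 0.11, "B": 0.105, "C": 0.06, "D": 0.045}

# Law of total probability: P(X) is the sum over all segments.
p_x = sum(joint_with_x.values())

# Bayes' rule under the stated assumption: P(A | X) = P(A and X) / P(X).
p_a_given_x = joint_with_x["A"] / p_x

print(f"P(X) = {p_x:.2f}")
print(f"P(A | X) = {p_a_given_x:.4f}")
```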
QUESTION 6
A. Probability that 2 or less of those 8 people would not purchase anything
x                        2
Probability              0.1 (1 in 10 make a purchase)
n                        8
Combinations             n! / [(n – x)! * x!] = 8! / [(8 – 2)! * 2!]
8!                       40320
6!                       720
2!                       2
8! / [(8 – 2)! * 2!]     28
p (buying)               0.25
q (failure)              0.75
p^x                      0.25^2 = 0.0625
q^(n – x)                0.75^(8 – 2) = 0.17798
Final outcome            28 * 0.0625 * 0.17798 = 0.3115
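The binomial formula C(n, x) * p^x * q^(n – x) can be evaluated with `math.comb`. The sketch below uses n = 8 and p = 0.25 as listed in the working and reports both the probability of exactly 2 and the cumulative probability of 2 or fewer. Whether p should instead be the 0.1 also quoted there cannot be settled from the text, so the parameter choice is an assumption.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n = 8       # number of people
p = 0.25    # assumed success probability, as used in the working

p_exactly_2 = binomial_pmf(2, n, p)

# "2 or less" is the sum of the probabilities for x = 0, 1 and 2.
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
```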
QUESTION 7
Given
Average               1100000
Standard deviation    385000
Distribution          Normal
X                     2000000
Z = (X – u) / Standard deviation
  = (2000000 – 1100000) / 385000
  = 2.34
A. Assuming a normal distribution, assess probability that apartment would sell over $2 million
Probability = 0.0097
B Probability that apartment would sell over $1 million but less than $1.1 million
Probability= 0.1025
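Both Question 7 probabilities can be checked without a Z table using `statistics.NormalDist` from the standard library. This is a sketch using the stated mean of $1,100,000 and standard deviation of $385,000.

```python
from statistics import NormalDist

# Apartment sale prices, assumed normally distributed as stated in the question.
prices = NormalDist(mu=1_100_000, sigma=385_000)

# Part A: P(price > $2 million) is the upper tail beyond 2,000,000.
z_part_a = (2_000_000 - prices.mean) / prices.stdev
p_over_2m = 1 - prices.cdf(2_000_000)

# Part B: P($1 million < price < $1.1 million) is a difference of two CDF values.
p_between = prices.cdf(1_100_000) - prices.cdf(1_000_000)

print(f"z = {z_part_a:.2f}")
print(f"P(price > 2.0M)        = {p_over_2m:.4f}")
print(f"P(1.0M < price < 1.1M) = {p_between:.4f}")
```

Because the upper limit in part B equals the mean, the second probability is simply the area between z = –0.26 and z = 0.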
QUESTION 8
A. Z test could be implied or not
A Z test focuses on a single parameter and treats all other unknown parameters as
fixed at their true values. The data must be a simple random sample from the population of
interest, and the population must be at least 10 times as large as the sample. Many
statistical tests rest on an assumption of normality, so data that are not normally
distributed raise concern, and a non-parametric version of the test, which makes no
normality assumption, is often recommended instead. However, the Z test is not very
sensitive to departures from normality and can still be run when the data are not normally
distributed, provided the sample size is large enough. A sample size of at least 30 is
generally considered sufficient; here the sample size is 50, so the criterion is met and
the Z test gives an approximate result.
B. Probability that 30% investors will be willing for commit $1 million or more
Here,
p (population proportion)     0.24
q = 1 – p                     0.76
p^ (sample proportion)        0.3
n                             45
SE(p^) = SQRT(pq / n)
p*q                           0.1824
SE(p^)                        0.0637
Entering the above values in the z formula:
z = (p^ – p) / SE(p^)
  = (0.3 – 0.24) / 0.0637
  = 0.94
From the Z distribution table, z = 0.94 corresponds to an area of 0.3264, which is the area
between the sample proportion of 0.3 and the population proportion of 0.24. The solution
for this problem is stated below:
P(p^ >= 0.3) = 0.5000 – 0.3264
             = 0.1736
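The sampling-distribution calculation can be sketched with the standard library, using p = 0.24, n = 45 and a sample proportion of 0.30 as listed in the working. The exact tail area may differ slightly from a table lookup because z is not rounded to two decimals before the area is taken.

```python
import math
from statistics import NormalDist

p = 0.24        # population proportion
n = 45          # sample size used in the working
p_hat = 0.30    # sample proportion of interest

# Standard error of the sample proportion: sqrt(p * q / n).
standard_error = math.sqrt(p * (1 - p) / n)

# z-score of the sample proportion under the normal approximation.
z = (p_hat - p) / standard_error

# Upper-tail probability P(p_hat >= 0.30).
p_tail = 1 - NormalDist().cdf(z)

print(f"SE = {standard_error:.4f}, z = {z:.2f}, P = {p_tail:.4f}")
```

The normal approximation is usually considered acceptable when both n*p and n*(1 – p) are at least 10; here they are 10.8 and 34.2.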