Data Analysis and Forecasting
Added on 2023/01/07
AI Summary
This project assignment focuses on data analysis and forecasting techniques. It covers methods like mean, median, range, and standard deviation. The assignment also explores linear regression and its application in predicting phone calls on specific days. The limitations of linear regression are discussed as well.
Table of Contents
1. Arrange the data in table format
2. Present the data in column and bar chart
3. Calculation of mean, median, range and standard deviation
i. Mean
ii. Median
iii. Mode
iv. Range
v. Standard deviation
4. Calculation based on y = mx + c
i. Calculation of m
ii. Calculation of c
iii. Calculation for 12th and 14th day number of calls
References
Introduction
This project assignment analyses the number of phone calls received on 10 consecutive days.
The phone call data have been analysed using several measures: mean, median, mode, range and
standard deviation. A linear regression model has then been used to predict the number of phone
calls on the 12th and 14th days.
1. Arrange the data in table format
Day Phone calls
1 80
2 81
3 75
4 65
5 90
6 82
7 79
8 76
9 81
10 83
2. Present the data in column and bar chart
Column chart:
[Column chart: phone calls (vertical axis, 0 to 100) for days 1 to 10 (horizontal axis)]
Line chart:
[Line chart: phone calls (vertical axis, 0 to 100) for days 1 to 10 (horizontal axis)]
3. Calculation of mean, median, range and standard deviation
i. Mean
Definition: There are different types of mean in mathematics, especially in statistics. For a
data set, the arithmetic mean, also known as the average, is the central value of a finite set
of numbers: specifically, the sum of the values divided by the number of values
(Daraganova, Edwards and Sipthorp, 2013).
Days Phone calls
1 80
2 81
3 75
4 65
5 90
6 82
7 79
8 76
9 81
10 83
Ʃ = 792
\[ \mu = \frac{\sum x}{N} = \frac{80 + 81 + 75 + 65 + 90 + 82 + 79 + 76 + 81 + 83}{10} = \frac{792}{10} = 79.2 \text{ phone calls} \]
The result shows that, on average, approximately 79 phone calls were made per day over the 10 days.
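The mean calculation can be checked with a short Python sketch. The variable names (`calls`, `total`, `average`) are illustrative and not part of the assignment; the data are the ten daily counts from the table.

```python
from statistics import mean

# Daily phone-call counts for days 1 to 10, copied from the table.
calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]

total = sum(calls)       # sum of all observations (Σx)
n = len(calls)           # number of observations (N)
average = mean(calls)    # Σx / N

print(f"sum = {total}, N = {n}, mean = {average}")
```

The printed mean should agree with the hand calculation of 79.2 phone calls per day.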
ii. Median
Definition: The median is the middle value of a data set, that is, the middle number in a list
sorted in ascending or descending order. It can be more descriptive of the data set than the
average (Groves, Mousley and Forgasz, 2006).
S.No. Day Phone calls
1 4 65
2 3 75
3 8 76
4 7 79
5 1 80
6 2 81
7 9 81
8 6 82
9 10 83
10 5 90
Sorted data: 65, 75, 76, 79, 80, 81, 81, 82, 83, 90
\[ \text{Median position} = \frac{N + 1}{2} = \frac{10 + 1}{2} = \frac{11}{2} = 5.5\text{th position} \]
(5.5 means the median lies between the 5th and 6th values in the sorted list.)
Because the number of observations is even, the median is the average of the two middle values:
\[ \text{Median} = \frac{5\text{th value} + 6\text{th value}}{2} = \frac{80 + 81}{2} = \frac{161}{2} = 80.5 \]
Median = 80.5
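The same steps can be sketched in Python: sort the values, locate the two middle positions, and average them. The names used here are illustrative only.

```python
from statistics import median

calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]

ordered = sorted(calls)            # ascending order, as in the sorted table
n = len(ordered)
position = (n + 1) / 2             # median position, between the 5th and 6th values

fifth_value = ordered[n // 2 - 1]  # 5th value (lists are zero-indexed)
sixth_value = ordered[n // 2]      # 6th value
manual_median = (fifth_value + sixth_value) / 2

# Cross-check against the standard library implementation.
library_median = median(calls)
print(position, manual_median, library_median)
```

Both the manual calculation and `statistics.median` should give the same value for this even-sized data set.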
iii. Mode
Definition: The mode is the value that appears most often in a data set. A data set can have
one mode, more than one mode, or no mode at all. Other common measures of central tendency
include the mean, or average, of a set and the median, which is the middle value in a set.
In statistics, data can be distributed in different ways (Kenny, Kashy and Cook, 2006).
Range (Calls) Frequency
0 - 10 0
10 - 20 0
20 - 30 0
30 - 40 0
40 - 50 0
50 - 60 0
60 - 70 1
70 - 80 4
80 - 90 5
\[ \text{Mode} = L + \frac{(f_m - f_1)\,h}{(f_m - f_1) + (f_m - f_2)} \]
L = lower limit of the modal class
f_m = frequency of the modal class
f_1 = frequency of the class preceding the modal class
f_2 = frequency of the class succeeding the modal class
h = size of the class interval
\[ \text{Mode} = 80 + \frac{(5 - 4) \times 10}{(5 - 4) + (5 - 0)} = 80 + \frac{10}{6} \approx 81.67 \]
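The grouped-data mode can be reproduced with the sketch below. It rests on one assumption that the assignment does not state explicitly: a value on a class boundary is counted in the lower class (80 falls in 70 - 80 and 90 in 80 - 90), which is what reproduces the frequencies 1, 4 and 5 in the table. The helper names `class_index` and `grouped_mode` are hypothetical.

```python
from collections import Counter

calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]
CLASS_WIDTH = 10  # h, the size of each class interval


def class_index(value: int, width: int = CLASS_WIDTH) -> int:
    """Index k of the class (k*width, (k+1)*width] containing the value.

    Assumption: class upper limits are inclusive, matching the frequency table.
    """
    return (value - 1) // width


def grouped_mode(freqs: Counter, width: int = CLASS_WIDTH) -> float:
    """Apply Mode = L + (fm - f1) * h / ((fm - f1) + (fm - f2))."""
    modal = max(freqs, key=freqs.get)   # class with the highest frequency
    fm = freqs[modal]
    f1 = freqs.get(modal - 1, 0)        # preceding class (0 if absent)
    f2 = freqs.get(modal + 1, 0)        # succeeding class (0 if absent)
    lower_limit = modal * width         # L
    return lower_limit + (fm - f1) * width / ((fm - f1) + (fm - f2))


frequencies = Counter(class_index(v) for v in calls)
mode_estimate = grouped_mode(frequencies)
print(dict(frequencies), round(mode_estimate, 2))
```

Note that this is an estimate from grouped data; in the raw data the only repeated value is 81, which appears twice.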
iv. Range
The range is a measure of dispersion that is easy to understand and to compute. It is defined as
Range = Largest Observation - Smallest Observation (Agresti, 2003).
Range = maximum value - minimum value
Maximum value = 90
Minimum value = 65
Range = (90 - 65) = 25
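A quick check of the range, as a minimal sketch with illustrative variable names:

```python
calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]

maximum = max(calls)             # largest observation
minimum = min(calls)             # smallest observation
call_range = maximum - minimum   # range = maximum - minimum

print(maximum, minimum, call_range)
```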
v. Standard deviation
Definition: The standard deviation is a statistic that measures the dispersion of a data set
relative to its mean and is calculated as the square root of the variance. The variance is
found by taking the deviation of each data point from the mean, squaring it, and averaging the
squared deviations. If the data points lie further from the mean, there is greater deviation
within the data set; thus, the more spread out the data, the higher the standard deviation
(Kenny, Kashy and Cook, 2006).
N X μ (X - μ) (X - μ)²
1 80 79.2 0.8 0.64
2 81 79.2 1.8 3.24
3 75 79.2 -4.2 17.64
4 65 79.2 -14.2 201.64
5 90 79.2 10.8 116.64
6 82 79.2 2.8 7.84
7 79 79.2 -0.2 0.04
8 76 79.2 -3.2 10.24
9 81 79.2 1.8 3.24
10 83 79.2 3.8 14.44
Σ 792 375.6
Mean: x̄ = μ = 79.2
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{375.6}{10}} = \sqrt{37.56} \approx 6.1286 \]
Standard deviation ≈ 6.13 phone calls
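The table and formula can be verified with the following sketch. It uses the population form of the standard deviation (dividing by N rather than N - 1), since that is what the figure of 6.1286 corresponds to. Variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]

n = len(calls)
mu = sum(calls) / n                                  # mean, 79.2

squared_deviations = [(x - mu) ** 2 for x in calls]  # the (X - μ)² column
sum_of_squares = sum(squared_deviations)             # Σ(X - μ)²

variance = sum_of_squares / n                        # population variance
sigma = sqrt(variance)                               # population standard deviation

# statistics.pstdev implements the same population formula.
print(round(sum_of_squares, 2), round(sigma, 4), round(pstdev(calls), 4))
```

If a sample standard deviation were wanted instead, `statistics.stdev` (which divides by N - 1) would give a slightly larger value.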
4. Calculation based on y = mx + c
Linear forecasting model
Linear regression can be used in both types of forecasting methods. In the case of causal
methods, the causal model may consist of a linear regression with several explanatory variables.
This approach is useful when there is no time component (Gelman et al., 2013).
Linear regression is a statistical tool that is used to help predict future values from
past values. It is commonly used as a quantitative way to determine the underlying trend in
the data. A linear regression trend line uses the least squares method to draw a straight line
through the data points so as to minimise the distances between the points and the line.
The fitted line then gives a trend value for each data point (Hair et al., 1998).
y = mx + c
y = how far up
x = how far along
m = Slope or Gradient (how steep the line is)
c = value of y when x=0
i. Calculation of m
m = SP / SSX = 33/82.5 = 0.4
X - Mx Y - My (X - Mx)² (X - Mx)(Y - My)
-4.5 0.8 20.25 -3.6
-3.5 1.8 12.25 -6.3
-2.5 -4.2 6.25 10.5
-1.5 -14.2 2.25 21.3
-0.5 10.8 0.25 -5.4
0.5 2.8 0.25 1.4
1.5 -0.2 2.25 -0.3
2.5 -3.2 6.25 -8
3.5 1.8 12.25 6.3
4.5 3.8 20.25 17.1
SSX: 82.5 SP: 33
ii. Calculation of c
Sum of X = 55
Sum of Y = 792
Mean X = 5.5
Mean Y = 79.2
Sum of squares (SSX) = 82.5
Sum of products (SP) = 33
Regression equation: ŷ = bX + a, where b is the slope (m) and a is the intercept (c)
m = b = SP / SSX = 33 / 82.5 = 0.4
c = a = MY - b × MX = 79.2 - (0.4 × 5.5) = 77
ŷ = 0.4X + 77
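The slope and intercept can be recomputed from the raw data with the least squares formulas used above. This is a minimal sketch; the names `ssx`, `sp`, `m` and `c` simply mirror the symbols in the text.

```python
days = list(range(1, 11))                        # X: day numbers 1 to 10
calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83] # Y: phone calls per day

mean_x = sum(days) / len(days)     # MX
mean_y = sum(calls) / len(calls)   # MY

# Sum of squares of X and sum of products of the deviations.
ssx = sum((x - mean_x) ** 2 for x in days)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, calls))

m = sp / ssx                # slope
c = mean_y - m * mean_x     # intercept

print(f"SSX = {ssx}, SP = {sp:.1f}, m = {m:.2f}, c = {c:.2f}")
```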
iii. Calculation for 12th and 14th day number of calls
X = 12
Y = mx + c
Y = 0.4X + 77
Y = 0.4(12) + 77
= 81.8 or 82 calls
X = 14
Y = mx + c
Y = 0.4X + 77
Y = 0.4(14) + 77
= 82.6 or 83 calls
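The two forecasts follow directly from the fitted line. In the sketch below the function name `predict_calls` is hypothetical, and the coefficients are the values of m and c derived above.

```python
M = 0.4    # slope from the regression above
C = 77.0   # intercept from the regression above


def predict_calls(day: int) -> float:
    """Predicted number of calls for a given day using y = mx + c."""
    return M * day + C


forecasts = {day: predict_calls(day) for day in (12, 14)}
for day, value in forecasts.items():
    print(f"day {day}: {value:.1f} calls, about {round(value)}")
```

Both days lie outside the observed range of 1 to 10, so these are extrapolations of the fitted trend rather than interpolations.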
Problem
The main limitation of linear regression is its assumption of linearity between the dependent
variable and the independent variables. In practice, data rarely follow a straight line exactly.
The model assumes a straight-line relationship between the dependent and independent
variables, which is often incorrect.
If the number of observations is smaller than the number of features, linear regression should
not be used; otherwise it may overfit, because in that situation the model starts to fit the
noise in the data. Linear regression is a useful tool for analysing the relationships between
variables, but it is not recommended for many practical applications because it oversimplifies
real-world problems by assuming a linear relationship between the variables.
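One way to see how much the linearity assumption matters for this particular data set is to compute the coefficient of determination (r²) of the fitted line. This check is not part of the original assignment; it is offered only as an illustrative sketch using the same sums of squares and products as the regression above.

```python
from math import sqrt

days = list(range(1, 11))
calls = [80, 81, 75, 65, 90, 82, 79, 76, 81, 83]

mean_x = sum(days) / len(days)
mean_y = sum(calls) / len(calls)

ssx = sum((x - mean_x) ** 2 for x in days)    # sum of squares of X
ssy = sum((y - mean_y) ** 2 for y in calls)   # sum of squares of Y
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, calls))

r = sp / sqrt(ssx * ssy)   # Pearson correlation between day and calls
r_squared = r ** 2         # share of the variance in calls explained by the line

print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```

For this data r² comes out near 0.035, so the straight line explains only a small share of the day-to-day variation in calls, which is consistent with the caution above about relying on a linear model.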
Conclusion
On the basis of the above analysis, it can be concluded that different methods show different
results. Their application depends on the kind of information required. The linear regression
model has certain limitations, which can be reduced by applying a multiple regression
model or the moving average method to identify the pattern of growth between variables.
References
Agresti, A., 2003. Categorical data analysis (Vol. 482). John Wiley & Sons.
Daraganova, G., Edwards, B. and Sipthorp, M., 2013. Using National Assessment Program
Literacy and Numeracy (NAPLAN) Data in the Longitudinal Study of Australian Children
(LSAC). Department of Families, Housing, Community Services and Indigenous Affairs.
Gelman, A. et al., 2013. Bayesian data analysis. CRC Press.
Groves, S., Mousley, J. and Forgasz, H., 2006. A primary numeracy: a mapping review and
analysis of Australian research in numeracy learning at the primary school level: report.
Centre for Studies in Mathematics, Science and Environmental Education, Deakin
University.
Hair, J.F. et al., 1998. Multivariate data analysis (Vol. 5, No. 3, pp. 207-219). Upper Saddle
River, NJ: Prentice Hall.
Kenny, D.A., Kashy, D.A. and Cook, W.L., 2006. Dyadic data analysis. Guilford Press.