Notes on Simple Linear Regression
Added on 2023/06/08
Supplemental notes on simple linear regression for the basic least squares estimation problem, moments of the estimators, and the fundamental optimality property of these estimators.
ESE 302 Tony E. Smith
NOTES ON SIMPLE LINEAR REGRESSION
1. INTRODUCTION
The purpose of these notes is to supplement the mathematical development of linear
regression in Devore (2008). This development also draws on the treatment in Johnston (1963)
and Larsen and Marx (1986). We begin with the basic least squares estimation problem, and
next develop the moments of the estimators. Finally the fundamental optimality property of
these estimators is established in terms of the Gauss-Markov Theorem.
2. LINEAR LEAST SQUARES ESTIMATION
The basic linear model assumes the existence of a linear relationship between two
variables, $x$ and $y$, which is disturbed by some random error, $\varepsilon$. Hence for each
value of $x$ the corresponding $y$-value is a random variable of the form
(2.1) $Y = \beta_0 + \beta_1 x + \varepsilon$
where $\beta_0$ and $\beta_1$ are designated, respectively, as the intercept parameter and the slope
parameter of the linear function, $\beta_0 + \beta_1 x$. If $n$ values $(x_i : i = 1,\ldots,n)$ of $x$ are
observed, with corresponding errors $(\varepsilon_i : i = 1,\ldots,n)$, then the resulting random
variables, $(Y_i : i = 1,\ldots,n)$, are given by
(2.2) $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ , $i = 1,\ldots,n$
In this context it is assumed that the random errors, $(\varepsilon_i : i = 1,\ldots,n)$, are independently
and identically distributed (iid) with mean zero and variance $\sigma^2$, so that
(2.3) $E(\varepsilon_i) = 0$ , $i = 1,\ldots,n$
(2.4) $\mathrm{var}(\varepsilon_i) = \sigma^2$ , $i = 1,\ldots,n$
If values of $y$ corresponding to $(x_i : i = 1,\ldots,n)$ are also observed, and are denoted by
$(y_i : i = 1,\ldots,n)$, then the least squares estimation problem is to find estimates, $\hat\beta_0$ and
$\hat\beta_1$, of the unknown parameter values, $\beta_0$ and $\beta_1$, which minimize the sum of squared
residuals [designated as $f(b_0, b_1)$ in Devore, p. 455]:
(2.5) $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
The function $S$ is easily seen to be convex and differentiable in $\beta_0$ and $\beta_1$, so that
(provided the $x_i$ are not all equal) the unique solution $(\hat\beta_0, \hat\beta_1)$ is given by the
first-order conditions:
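(2.6) $0 = \frac{\partial}{\partial \beta_0} S(\hat\beta_0, \hat\beta_1) = \sum_i 2\,(y_i - \hat\beta_0 - \hat\beta_1 x_i)(-1)$
(2.7) $0 = \frac{\partial}{\partial \beta_1} S(\hat\beta_0, \hat\beta_1) = \sum_i 2\,(y_i - \hat\beta_0 - \hat\beta_1 x_i)(-x_i)$
If we let $\bar{x} = \frac{1}{n}\sum_i x_i$ and $\bar{y} = \frac{1}{n}\sum_i y_i$, then by (2.6)
(2.8) $\sum_i y_i - n\hat\beta_0 - \hat\beta_1 \sum_i x_i = 0 \;\Rightarrow\; \frac{1}{n}\sum_i y_i - \hat\beta_0 - \hat\beta_1 \frac{1}{n}\sum_i x_i = 0$
$\Rightarrow\; \bar{y} - \hat\beta_0 - \hat\beta_1 \bar{x} = 0$
$\Rightarrow\; \bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$
and by (2.7)
(2.9) $\sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0$
To simplify (2.9) let the estimated $y$-value corresponding to $(\hat\beta_0, \hat\beta_1)$ be defined by
(2.10) $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ , $i = 1,\ldots,n$
and rewrite (2.9) as
(2.11) $\sum_i (y_i - \hat{y}_i)\, x_i = 0$
Note also from (2.8) that
(2.12) $\sum_i (\hat{y}_i - y_i) = \sum_i \hat{y}_i - \sum_i y_i = \sum_i (\hat\beta_0 + \hat\beta_1 x_i) - n\bar{y}$
$= n\hat\beta_0 + \hat\beta_1 \sum_i x_i - n\bar{y} = n\hat\beta_0 + n\hat\beta_1 \bar{x} - n\bar{y}$
$= 0$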
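As a numerical illustration of the first-order conditions, the following minimal Python sketch rearranges (2.6) and (2.7) into a 2x2 linear system, solves it by Cramer's rule, and then checks the residual identities (2.11) and (2.12). The function name `solve_normal_equations` and the data values are hypothetical choices made only for this example; they do not come from Devore or from these notes.

```python
def solve_normal_equations(x, y):
    """Solve the first-order conditions (2.6)-(2.7) as a 2x2 linear system.

    Dividing out the factor -2, the conditions rearrange to
        n*b0      + b1*sum(x)   = sum(y)
        b0*sum(x) + b1*sum(x^2) = sum(x*y)
    which is solved here with Cramer's rule.
    """
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # vanishes only when all x-values coincide
    if det == 0:
        raise ValueError("x-values must not all be equal")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical illustrative data (an assumption for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = solve_normal_equations(x, y)  # approximately 0.05 and 1.99
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(residuals)                                # identity (2.12)
sum_resid_x = sum(r * xi for r, xi in zip(residuals, x))  # identity (2.11)
print(b0, b1, sum_resid, sum_resid_x)
```

Both residual sums come out at zero up to floating-point rounding, which is all that (2.11) and (2.12) assert.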
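To solve for $\hat\beta_1$ we first observe by subtracting (2.8) from (2.10) that
(2.13) $\hat{y}_i - \bar{y} = \hat\beta_1 (x_i - \bar{x})$
$\Rightarrow\; y_i - \bar{y} = (y_i - \hat{y}_i) + \hat\beta_1 (x_i - \bar{x})$ , $i = 1,\ldots,n$
Hence, multiplying both sides by $(x_i - \bar{x})$ and summing over $i$, we obtain
(2.14) $\sum_i (y_i - \bar{y})(x_i - \bar{x}) = \sum_i (y_i - \hat{y}_i)(x_i - \bar{x}) + \hat\beta_1 \sum_i (x_i - \bar{x})^2$
But since (2.11) and (2.12) imply
(2.15) $\sum_i (y_i - \hat{y}_i)(x_i - \bar{x}) = \sum_i (y_i - \hat{y}_i)\, x_i - \bar{x} \sum_i (y_i - \hat{y}_i) = 0$
we may conclude from (2.14) that
(2.16) $\hat\beta_1 = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}$
[See expression (12.2) in Devore, p. 456.] Finally, by employing (2.8), we may solve for $\hat\beta_0$ in
terms of $\hat\beta_1$ as
(2.17) $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
[See expression (12.3) in Devore, p. 456.]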
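The closed forms (2.16) and (2.17) translate directly into code. The sketch below is one possible rendering, under the assumption that the $x_i$ are not all equal; the names `least_squares_fit` and `sum_squared_residuals` and the data are hypothetical. It also evaluates $S$ from (2.5) at the fitted values and at a few nearby points, to illustrate that moving away from $(\hat\beta_0, \hat\beta_1)$ does not reduce the sum of squared residuals.

```python
def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) from the closed forms (2.16) and (2.17)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x-values must not all be equal")
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx               # slope, expression (2.16)
    b0_hat = y_bar - b1_hat * x_bar  # intercept, expression (2.17)
    return b0_hat, b1_hat


def sum_squared_residuals(b0, b1, x, y):
    """Evaluate S(b0, b1) as defined in (2.5)."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data (an assumption for this sketch).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.9, 5.2, 6.9]

b0_hat, b1_hat = least_squares_fit(x, y)  # approximately 1.0 and 2.0
s_min = sum_squared_residuals(b0_hat, b1_hat, x, y)

# Nudge each coefficient in turn; S should not fall below its fitted value.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
s_nudged = [
    sum_squared_residuals(b0_hat + db0, b1_hat + db1, x, y)
    for db0, db1 in nudges
]
print(b0_hat, b1_hat, s_min, s_nudged)
```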
3. MOMENTS OF THE ESTIMATORS
The estimators in (2.16) and (2.17) depend on the values of the random variables,
$(Y_i : i = 1,\ldots,n)$, and hence are themselves random variables. In particular, if the sample
mean of the $Y_i$'s is denoted by
(3.1) $\bar{Y} = \frac{1}{n}\sum_i Y_i = \frac{1}{n}\sum_i (\beta_0 + \beta_1 x_i + \varepsilon_i) = \beta_0 + \beta_1 \bar{x} + \frac{1}{n}\sum_i \varepsilon_i$ ,
then it follows at once from (2.16) that $\hat\beta_1$ is a random variable of the form
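(3.2) $\hat\beta_1 = \frac{\sum_i (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}$
and, similarly, that $\hat\beta_0$ is a random variable of the form
(3.3) $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x}$
To compute the moments of the slope estimator, $\hat\beta_1$, it is convenient to simplify
expression (3.2) as follows. By breaking (3.2) into two terms
(3.4) $\hat\beta_1 = \frac{\sum_i Y_i (x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} - \frac{\bar{Y} \sum_i (x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}$
and observing that
(3.5) $\sum_i (x_i - \bar{x}) = \sum_i x_i - n\bar{x} = n\left(\frac{1}{n}\sum_i x_i - \bar{x}\right) = 0$
we see that the second term vanishes, and hence that the estimator $\hat\beta_1$ can be written as a
linear combination of the $Y_i$'s
(3.6) $\hat\beta_1 = \sum_i w_i Y_i$
where the coefficients $w_i$ are of the form
(3.7) $w_i = \frac{x_i - \bar{x}}{\sum_j (x_j - \bar{x})^2}$ , $i = 1,\ldots,n$
and hence are non-random (i.e., depend only on the given values of the $x_i$'s). To analyze (3.6)
we begin with several observations about the coefficient values in (3.7). First observe from
(3.5) that
(3.8) $\sum_i w_i = \frac{\sum_i (x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} = 0$
and moreover that
(3.9) $\sum_i w_i (x_i - \bar{x}) = \frac{\sum_i (x_i - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} = 1$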
which together with (3.8) also implies
(3.10) $\sum_i w_i x_i = \sum_i w_i x_i - \bar{x} \sum_i w_i = \sum_i w_i (x_i - \bar{x}) = 1$
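The three weight identities (3.8), (3.9) and (3.10) are easy to confirm numerically. The following short sketch computes the coefficients $w_i$ of (3.7) for an arbitrary set of $x$-values and evaluates each sum; the helper name `slope_weights` and the $x$-values are hypothetical and chosen only for illustration.

```python
def slope_weights(x):
    """Return the coefficients w_i of expression (3.7)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    return [(xi - x_bar) / sxx for xi in x]


# Hypothetical, unevenly spaced x-values (an assumption for this sketch).
x = [1.0, 2.0, 4.0, 7.0]
x_bar = sum(x) / len(x)
w = slope_weights(x)

sum_w = sum(w)                                            # (3.8): should be 0
sum_w_dev = sum(wi * (xi - x_bar) for wi, xi in zip(w, x))  # (3.9): should be 1
sum_w_x = sum(wi * xi for wi, xi in zip(w, x))              # (3.10): should be 1
print(sum_w, sum_w_dev, sum_w_x)
```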
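To compute the mean of $\hat\beta_1$, observe from (2.2) and (2.3) that
(3.11) $E(Y_i) = \beta_0 + \beta_1 x_i + E(\varepsilon_i) = \beta_0 + \beta_1 x_i$
so that by (3.6), together with (3.8) and (3.10),
(3.12) $E(\hat\beta_1) = \sum_i w_i E(Y_i) = \sum_i w_i (\beta_0 + \beta_1 x_i)$
$= \beta_0 \sum_i w_i + \beta_1 \sum_i w_i x_i = \beta_0 (0) + \beta_1 (1)$
$= \beta_1$
Thus $\hat\beta_1$ is an unbiased estimator of $\beta_1$. Moreover, since (3.1) and (2.3) imply that
(3.13) $E(\bar{Y}) = \beta_0 + \beta_1 \bar{x} + \frac{1}{n}\sum_i E(\varepsilon_i) = \beta_0 + \beta_1 \bar{x}$
it follows from (3.3) together with (3.13) that
(3.14) $E(\hat\beta_0) = E(\bar{Y}) - E(\hat\beta_1)\,\bar{x} = (\beta_0 + \beta_1 \bar{x}) - \beta_1 \bar{x} = \beta_0$
and thus that $\hat\beta_0$ is also an unbiased estimator of $\beta_0$.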
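Because both estimators are linear in the $Y_i$'s, their means can be evaluated exactly, with no simulation, by substituting $E(Y_i)$ from (3.11) into the linear forms. The sketch below follows steps (3.11) through (3.14) for one assumed pair of parameter values; the function name `estimator_means`, the $x$-values, and the parameter values are all hypothetical.

```python
def estimator_means(x, beta0, beta1):
    """Exact E(b0_hat) and E(b1_hat), computed by linearity of expectation."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    w = [(xi - x_bar) / sxx for xi in x]                 # weights (3.7)
    mean_y = [beta0 + beta1 * xi for xi in x]            # E(Y_i), (3.11)
    mean_b1 = sum(wi * mi for wi, mi in zip(w, mean_y))  # E(b1_hat), (3.12)
    mean_y_bar = sum(mean_y) / n                         # E(Y_bar), (3.13)
    mean_b0 = mean_y_bar - mean_b1 * x_bar               # E(b0_hat), (3.14)
    return mean_b0, mean_b1


# Assumed parameter values and design points, for illustration only.
mean_b0, mean_b1 = estimator_means([0.5, 1.5, 2.0, 4.0, 6.5], beta0=-1.0, beta1=0.75)
print(mean_b0, mean_b1)
```

Up to rounding, the two returned values reproduce the assumed $\beta_0$ and $\beta_1$, whatever design points are supplied (so long as they are not all equal).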
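To compute the variance of $\hat\beta_1$, we again observe from (3.6) that
(3.15) $\hat\beta_1 = \sum_i w_i (\beta_0 + \beta_1 x_i + \varepsilon_i) = \sum_i w_i (\beta_0 + \beta_1 x_i) + \sum_i w_i \varepsilon_i$
$= \mathrm{const} + \sum_i w_i \varepsilon_i$
and hence (from the independence of the $\varepsilon_i$'s) that
(3.16) $\mathrm{var}(\hat\beta_1) = \mathrm{var}\left(\sum_i w_i \varepsilon_i\right) = \sum_i w_i^2\, \mathrm{var}(\varepsilon_i)$
Hence we may conclude from (2.4) and (3.7) that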
(3.17) $\mathrm{var}(\hat\beta_1) = \sigma^2 \sum_i w_i^2 = \sigma^2 \sum_i \left[\frac{x_i - \bar{x}}{\sum_j (x_j - \bar{x})^2}\right]^2$
$= \sigma^2\, \frac{\sum_i (x_i - \bar{x})^2}{\left[\sum_j (x_j - \bar{x})^2\right]^2} = \frac{\sigma^2}{\sum_j (x_j - \bar{x})^2}$
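[See expression (12.4) in Devore, p. 470.] Similarly, to determine the variance of $\hat\beta_0$, we
observe from the above relations that
(3.18) $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x} = \frac{1}{n}\sum_i Y_i - \bar{x} \sum_i w_i Y_i = \sum_i \left(\frac{1}{n} - \bar{x} w_i\right) Y_i$
$= \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)(\beta_0 + \beta_1 x_i + \varepsilon_i)$
$= \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)(\beta_0 + \beta_1 x_i) + \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)\varepsilon_i$
$= \mathrm{const} + \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)\varepsilon_i$
and hence that
(3.19) $\mathrm{var}(\hat\beta_0) = \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)^2 \mathrm{var}(\varepsilon_i)$
$= \sigma^2 \sum_i \left(\frac{1}{n} - \bar{x} w_i\right)^2 = \sigma^2 \sum_i \left(\frac{1}{n^2} - \frac{2}{n}\bar{x} w_i + \bar{x}^2 w_i^2\right)$
$= \sigma^2 \left[\frac{1}{n} - \frac{2\bar{x}}{n}\sum_i w_i + \bar{x}^2 \sum_i w_i^2\right]$
$= \sigma^2 \left[\frac{1}{n} - \frac{2\bar{x}}{n}(0) + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}\right] = \sigma^2\, \frac{\sum_i (x_i - \bar{x})^2 + n\bar{x}^2}{n \sum_i (x_i - \bar{x})^2}$
$= \sigma^2\, \frac{\sum_i x_i^2 - 2\bar{x}\sum_i x_i + 2n\bar{x}^2}{n \sum_i (x_i - \bar{x})^2} = \sigma^2\, \frac{\sum_i x_i^2 - 2n\bar{x}^2 + 2n\bar{x}^2}{n \sum_i (x_i - \bar{x})^2}$
$= \frac{\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}$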
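The moment results of this section can be checked against a small Monte Carlo experiment. The sketch below repeatedly generates data from model (2.2), refits the line each time, and compares the sample means and variances of the estimates with the unbiasedness results and with formulas (3.17) and (3.19). Two assumptions go beyond the notes: the errors are drawn from a normal distribution (the notes require only iid errors with mean zero and variance $\sigma^2$), and the parameter values, design points, replication count, and the name `simulate_estimates` are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_estimates(x, beta0, beta1, sigma, n_reps, seed=0):
    """Draw n_reps samples from model (2.2) and return the fitted (b0, b1) values."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b0_draws, b1_draws = [], []
    for _ in range(n_reps):
        # Assumption: normal errors; only mean 0 and variance sigma^2 are required.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx  # (3.2)
        b0 = y_bar - b1 * x_bar                                              # (3.3)
        b0_draws.append(b0)
        b1_draws.append(b1)
    return b0_draws, b1_draws


# Assumed settings for the experiment.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 2.0, 0.5, 1.0
b0_draws, b1_draws = simulate_estimates(x, beta0, beta1, sigma, n_reps=20000)

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
var_b1_theory = sigma ** 2 / sxx                                  # (3.17)
var_b0_theory = sigma ** 2 * sum(xi ** 2 for xi in x) / (n * sxx)  # (3.19)

mean_b0 = statistics.mean(b0_draws)
mean_b1 = statistics.mean(b1_draws)
var_b0_sim = statistics.pvariance(b0_draws)
var_b1_sim = statistics.pvariance(b1_draws)
print(mean_b0, mean_b1)
print(var_b0_sim, var_b0_theory)
print(var_b1_sim, var_b1_theory)
```

For these design points the theoretical variances are 0.1 for the slope and 1.1 for the intercept. The simulated values should be close to these but will not match exactly, since they carry sampling error that shrinks as the number of replications grows.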
4. GAUSS MARKOV THEOREM
Finally we establish the fundamental optimality property of the above estimators. To
do so, recall that for an independent random sample $(Y_1,\ldots,Y_n)$ from a population with mean,
$\mu = E(Y)$, the sample mean, $\bar{Y}_n$, was shown to be a best linear unbiased (BLU) estimator of $\mu$.
This optimality property turns out to be shared by the least-squares estimators $(\hat\beta_0, \hat\beta_1)$ above.
This result, known as the Gauss-Markov Theorem, provides the single strongest justification for
linear least-squares estimation, and can be stated as follows:
GAUSS MARKOV THEOREM. For any linear function, $L = a_0 \beta_0 + a_1 \beta_1$, of
$(\beta_0, \beta_1)$, the least squares estimator, $\hat{L} = a_0 \hat\beta_0 + a_1 \hat\beta_1$, has minimum variance
among all linear unbiased estimators of $L$.
Proof: We shall prove this assertion only for the linear function with coefficients
$(a_0 = 0, a_1 = 1)$, i.e., for the estimate, $\hat\beta_1$, of the slope parameter, $\beta_1$ (which is by far the
most important of the two individual parameters). The argument for any linear function of $\beta_0$
and $\beta_1$ is essentially the same. To begin with, observe from (3.6) that $\hat\beta_1$ is indeed a linear
estimator, i.e., is a linear function of the random variables $(Y_i : i = 1,\ldots,n)$. Moreover, it was
shown in (3.12) that $\hat\beta_1$ is also an unbiased estimator of $\beta_1$. Hence it remains only to show
that the variance of $\hat\beta_1$ never exceeds that of any other linear unbiased estimator. To do so,
consider any other linear estimator, say
(4.1) $\tilde\beta_1 = \sum_i c_i Y_i$
and suppose that $\tilde\beta_1$ is also an unbiased estimator. Then, arguing as in (3.12), we must have
(4.2) $\beta_1 = E(\tilde\beta_1) = \sum_i c_i E(Y_i)$
$= \sum_i c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_i c_i + \beta_1 \sum_i c_i x_i$
But since unbiasedness requires that (4.2) hold for all values of the unknown parameters $\beta_0$ and
$\beta_1$, it follows by setting $\beta_0 = 1$ and $\beta_1 = 0$ that
(4.3) $\sum_i c_i = 0$
and in turn, by setting $\beta_1 = 1$, that
(4.4) $\sum_i c_i x_i = 1$
Hence, in a manner identical with (3.15), these two conditions are seen to imply that
(4.5) $\tilde\beta_1 = \sum_i c_i Y_i = \beta_1 + \sum_i c_i \varepsilon_i$
and thus that the variance of $\tilde\beta_1$ is given by
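(4.6) $\mathrm{var}(\tilde\beta_1) = \sum_i c_i^2\, \mathrm{var}(\varepsilon_i) = \sigma^2 \sum_i c_i^2$
To compare this with $\mathrm{var}(\hat\beta_1)$, observe first that if the differences between the coefficients
of $\tilde\beta_1$ and $\hat\beta_1$ in (4.1) and (3.6) are denoted by $d_i = c_i - w_i$, $i = 1,\ldots,n$, then (4.6) can
be rewritten as
(4.7) $\mathrm{var}(\tilde\beta_1) = \sigma^2 \sum_i (w_i + d_i)^2 = \sigma^2 \sum_i (w_i^2 + 2 w_i d_i + d_i^2)$
But by (4.3) and (4.4) together with (3.8) and (3.10) we must have
(4.8) $0 = \sum_i c_i = \sum_i w_i + \sum_i d_i = (0) + \sum_i d_i \;\Rightarrow\; \sum_i d_i = 0$
(4.9) $1 = \sum_i c_i x_i = \sum_i w_i x_i + \sum_i d_i x_i = 1 + \sum_i d_i x_i \;\Rightarrow\; \sum_i d_i x_i = 0$
which together imply that
(4.10) $\sum_i d_i w_i = \frac{\sum_i d_i (x_i - \bar{x})}{\sum_j (x_j - \bar{x})^2} = \frac{\sum_i d_i x_i - \bar{x} \sum_i d_i}{\sum_j (x_j - \bar{x})^2} = 0$
Hence, recalling (3.17), we see that (4.7) reduces to
(4.11) $\mathrm{var}(\tilde\beta_1) = \sigma^2 \sum_i w_i^2 + \sigma^2 \sum_i d_i^2 = \mathrm{var}(\hat\beta_1) + \sigma^2 \sum_i d_i^2$
and may conclude from the nonnegativity of $\sigma^2 \sum_i d_i^2$ that
(4.12) $\mathrm{var}(\tilde\beta_1) \ge \mathrm{var}(\hat\beta_1)$
Thus $\hat\beta_1$ has minimum variance among all linear unbiased estimators, and the result is
established.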
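To make the argument concrete, the sketch below compares the least squares weights $w_i$ with the coefficients $c_i$ of one particular competing estimator: the slope of the line through the first and last observations, $(Y_n - Y_1)/(x_n - x_1)$. This competitor is a hypothetical choice made only for illustration; it is linear in the $Y_i$'s and satisfies (4.3) and (4.4), so it is unbiased. The code checks those two conditions, the orthogonality relation (4.10), and the variance decomposition (4.11), with variances expressed as multiples of $\sigma^2$.

```python
def slope_weights(x):
    """Return the least squares coefficients w_i of expression (3.7)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    return [(xi - x_bar) / sxx for xi in x]


# Hypothetical design points (an assumption for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
w = slope_weights(x)

# Competing linear estimator: slope through the two end points,
# (Y_n - Y_1) / (x_n - x_1), written in the form sum_i c_i * Y_i of (4.1).
c = [0.0] * n
c[0] = -1.0 / (x[-1] - x[0])
c[-1] = 1.0 / (x[-1] - x[0])

d = [ci - wi for ci, wi in zip(c, w)]           # differences d_i = c_i - w_i

sum_c = sum(c)                                  # (4.3): should be 0
sum_cx = sum(ci * xi for ci, xi in zip(c, x))   # (4.4): should be 1
sum_dw = sum(di * wi for di, wi in zip(d, w))   # (4.10): should be 0

var_factor_ls = sum(wi ** 2 for wi in w)        # var(b1_hat) / sigma^2
var_factor_alt = sum(ci ** 2 for ci in c)       # var(b1_tilde) / sigma^2, from (4.6)
sum_d_sq = sum(di ** 2 for di in d)             # excess variance term in (4.11)
print(var_factor_ls, var_factor_alt, sum_d_sq)
```

For these design points the variance factors are 0.1 for least squares and 0.125 for the end-point estimator, and the difference of 0.025 equals $\sum_i d_i^2$, as (4.11) requires.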
5. REFERENCES
Devore, J.L., (2008) Probability and Statistics for Engineering and the Sciences,
Seventh Edition, Duxbury Press, Belmont, California.
Larsen, R.J. and M.L. Marx, (1986) An Introduction to Mathematical Statistics and its
Applications, Second Edition, Prentice-Hall, Englewood Cliffs, N.J.
Johnston, J., (1963) Econometric Methods, McGraw-Hill, N.Y.