Derivation of Linear Regression Coefficients and Ridge Regression vs LASSO

Homework2
1(a)


The sum of squared errors is

$$S(\beta_0,\beta_1)=\sum_{i=1}^{n}\bigl(y_i-(\beta_0+\beta_1 x_i)\bigr)^2 .$$

A minimum of a differentiable function $f(x,y)$ occurs where $\frac{\partial f(x,y)}{\partial x}=0$ and $\frac{\partial f(x,y)}{\partial y}=0$, so the least squares estimates satisfy

$$\frac{\partial S(\beta_0,\beta_1)}{\partial \beta_0}=0 \quad\text{and}\quad \frac{\partial S(\beta_0,\beta_1)}{\partial \beta_1}=0 .$$
Now let $\beta_0^{*}=\beta_0+\beta_1\bar{x}$, where $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$. Then

$$S(\beta_0,\beta_1)=\sum_{i=1}^{n}\bigl(y_i-\beta_0-\beta_1 x_i\bigr)^2=\sum_{i=1}^{n}\bigl(y_i-(\beta_0+\beta_1\bar{x})-\beta_1(x_i-\bar{x})\bigr)^2=\sum_{i=1}^{n}\bigl(y_i-\beta_0^{*}-\beta_1(x_i-\bar{x})\bigr)^2=S(\beta_0^{*},\beta_1).$$
We obtain $b_0^{*}$ and $b_1$ by minimizing $S(\beta_0^{*},\beta_1)$; $b_0$ is then recovered from $b_0^{*}=b_0+b_1\bar{x}$. Therefore:

$$\frac{\partial S(\beta_0^{*},\beta_1)}{\partial \beta_0^{*}}=\frac{\partial}{\partial \beta_0^{*}}\left(\sum_{i=1}^{n}\bigl(y_i-\beta_0^{*}-\beta_1(x_i-\bar{x})\bigr)^2\right)=-\sum_{i=1}^{n}2\bigl(y_i-\beta_0^{*}-\beta_1(x_i-\bar{x})\bigr)=0$$
Where:

$$\sum_{i=1}^{n}\beta_1(x_i-\bar{x})=\beta_1\sum_{i=1}^{n}(x_i-\bar{x})=\beta_1\left(\sum_{i=1}^{n}x_i-n\bar{x}\right)=\beta_1\left(\sum_{i=1}^{n}x_i-\sum_{i=1}^{n}x_i\right)=0,$$
And

$$-\sum_{i=1}^{n}2\bigl(y_i-\beta_0^{*}-\beta_1(x_i-\bar{x})\bigr)=-2\sum_{i=1}^{n}\bigl(y_i-\beta_0^{*}\bigr)=-2\left(\sum_{i=1}^{n}y_i-n\beta_0^{*}\right)=0.$$
Therefore:
$$b_0^{*}=\frac{\sum_{i=1}^{n}y_i}{n}=\bar{y}.$$
But $\sum_{i=1}^{n}y_i$ was originally $0$, so clearly $b_0^{*}=\frac{0}{n}=0$.
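As a quick numerical check of this result (not part of the original solution), the Python sketch below fits the centered-predictor model to simulated data whose responses sum to zero and confirms that the fitted intercept $b_0^{*}$ equals $\bar{y}$, i.e. $0$ here. The data and variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n points with the response centered so that sum(y) = 0,
# matching the setup in 1(a).
n = 50
x = rng.normal(size=n)
y = 2.5 * (x - x.mean()) + rng.normal(size=n)
y = y - y.mean()              # enforce sum(y) = 0

# Fit y ~ b0* + b1 * (x - xbar) by least squares on the centered predictor.
X = np.column_stack([np.ones(n), x - x.mean()])
b0_star, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(b0_star, y.mean())      # b0* matches ybar, which is 0 here
```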
1(b)
$$\frac{\partial S(b_0^{*},\beta_1)}{\partial \beta_1}=\frac{\partial}{\partial \beta_1}\left(\sum_{i=1}^{n}\bigl(y_i-\bar{y}-\beta_1(x_i-\bar{x})\bigr)^2\right)=-2\sum_{i=1}^{n}\bigl(y_i-\bar{y}-\beta_1(x_i-\bar{x})\bigr)(x_i-\bar{x})=0$$

$$\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})-\beta_1\sum_{i=1}^{n}(x_i-\bar{x})^2=0$$
Given that:
$$s_{XX}=\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}x_i^2-n\bar{x}^2$$

and

$$s_{XY}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum_{i=1}^{n}x_i y_i-n\bar{x}\bar{y}.$$
Therefore
$$b_1=\frac{\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}=\frac{s_{XY}}{s_{XX}}.$$
Therefore the estimated $\beta_1$ is close to the true $\beta_1$.
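As a small illustration (again not part of the original solution), the sketch below computes the closed-form slope $b_1=s_{XY}/s_{XX}$ on simulated data and checks it against a library least squares fit; the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data (not from the assignment): y depends linearly on x plus noise.
n = 200
x = rng.normal(size=n)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Closed-form slope from 1(b): b1 = s_XY / s_XX.
s_xx = np.sum((x - x.mean()) ** 2)
s_xy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = s_xy / s_xx

# Compare with a library least squares fit.
slope, intercept = np.polyfit(x, y, 1)
print(b1, slope)   # the two slope estimates agree
```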
1(c)(1)
In Ridge regression and LASSO, λ is considered a shrinkage parameter. It therefore:
i. controls the size of the coefficients;
ii. controls the amount of regularization in the model;
iii. moreover, as λ tends to 0 we obtain the least squares solutions;
iv. also, as λ tends to infinity, $\beta_{\text{ridge}}$ tends to 0 (see the sketch after this list).
Therefore both are shrinkage methods.
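A minimal sketch of these limiting behaviours is shown below, using scikit-learn, where the shrinkage parameter λ is called alpha; the data are simulated and purely illustrative, not taken from the assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)

# Hypothetical data (not from the assignment).
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

ols = LinearRegression().fit(X, y)
print("OLS:         ", ols.coef_)

# Small lambda (alpha in scikit-learn): ridge is close to least squares.
print("lambda=1e-6: ", Ridge(alpha=1e-6).fit(X, y).coef_)

# Large lambda: coefficients are shrunk toward zero.
print("lambda=1e6:  ", Ridge(alpha=1e6).fit(X, y).coef_)
```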
1(c)(2)
In Ridge regression the assumptions are similar to those of least squares regression, except that normality is not assumed; the same holds for LASSO regression. However, ridge regression shrinks the coefficient values towards zero but not exactly to zero, unlike LASSO, where coefficients can be shrunk exactly to zero, which aids feature selection, as illustrated in the sketch below.
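The sketch below illustrates this difference on simulated data (the data, the chosen alpha values, and the use of scikit-learn are assumptions for illustration, not part of the original solution): ridge leaves all coefficients nonzero, while LASSO sets several of them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)

# Hypothetical data (not from the assignment): only 3 of 10 predictors matter.
X = rng.normal(size=(200, 10))
true_beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_beta + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)

# Ridge keeps every coefficient nonzero (just smaller); LASSO sets some to exactly zero.
print("ridge zeros:", np.sum(ridge.coef_ == 0.0))
print("lasso zeros:", np.sum(lasso.coef_ == 0.0))
```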
2(a)
6(a)
Lag2 is the only significant predictor, with a p-value of 0.0296 at the 0.05 level of significance.
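The original document does not show the code behind this result; a possible way to reproduce such a fit in Python is sketched below. It assumes the ISLR Weekly data set has been exported to a local CSV file (the file name Weekly.csv and the derived Up indicator are illustrative assumptions).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical path: assumes the ISLR "Weekly" data set has been exported to CSV,
# with columns Direction, Lag1..Lag5 and Volume.
weekly = pd.read_csv("Weekly.csv")
weekly["Up"] = (weekly["Direction"] == "Up").astype(int)

# Logistic regression of Direction on the five lag variables and Volume.
model = smf.logit("Up ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume", data=weekly).fit()
print(model.summary())   # Lag2 should be the only predictor with p < 0.05
```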
6(b)
6(c)
Confusion matrix
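The confusion matrix values themselves are not reproduced here. The sketch below only outlines how a confusion matrix for the fitted logistic regression could be computed, assuming the model and weekly objects from the previous sketch and a conventional 0.5 probability cutoff.

```python
from sklearn.metrics import confusion_matrix

# Continues from the previous sketch: `model` and `weekly` are assumed to exist.
# Predicted probabilities from the logistic regression, thresholded at 0.5.
predicted_up = (model.predict(weekly) > 0.5).astype(int)

# Rows: actual class (Down=0, Up=1); columns: predicted class.
print(confusion_matrix(weekly["Up"], predicted_up))
```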
6(d)
6(e)
[object Object]