Batch and Stochastic Gradient Descent for Logistic Regression

Table of Contents
1.1 Batch Gradient Descent
a) Logistic regression setting of w
1.2 Stochastic Gradient Descent
1.1 Batch Gradient Descent
Considering a binary classification problem with data D = \{(x_i, y_i)\}_{i=1}^{N}, the negative log-likelihood is

NLL(D, w) = -\sum_{i=1}^{N} \left[ (1 - y_i)\log\big(1 - \sigma(w^T x_i)\big) + y_i \log \sigma(w^T x_i) \right]

Given the following definitions:

f(x) = \sigma(w^T x)

L(w) = NLL(D, w) = -\sum_{i=1}^{N} \left[ (1 - y_i)\log\big(1 - \sigma(w^T x_i)\big) + y_i \log \sigma(w^T x_i) \right]

where w and x_i are vectors and p(x_i) is shorthand for p(y_i = 1 \mid x_i) = \sigma(w^T x_i).

\nabla_w L(w) = -\sum_{i=1}^{N} \nabla_w \left[ (1 - y_i)\log\big(1 - \sigma(w^T x_i)\big) + y_i \log \sigma(w^T x_i) \right]

For the second term, the product rule gives (since \nabla_w y_i = 0):

\nabla_w \left[ y_i \log \sigma(w^T x_i) \right] = 0 \cdot \log \sigma(w^T x_i) + y_i \cdot \nabla_w \log \sigma(w^T x_i) = y_i \big(1 - \sigma(w^T x_i)\big)\, x_i

For the first term:

\nabla_w \left[ (1 - y_i)\log\big(1 - \sigma(w^T x_i)\big) \right] = (1 - y_i) \cdot \frac{1}{1 - \sigma(w^T x_i)} \cdot \big(-\sigma(w^T x_i)\big)\big(1 - \sigma(w^T x_i)\big)\, x_i = -(1 - y_i)\,\sigma(w^T x_i)\, x_i

Combining the two:

\nabla_w L(w) = \sum_{i=1}^{N} \left[ (1 - y_i)\,\sigma(w^T x_i) - y_i\big(1 - \sigma(w^T x_i)\big) \right] x_i

The derivation of the gradient of the negative log-likelihood therefore gives

\nabla_w NLL(D, w) = \sum_{i=1}^{N} \big(\sigma(w^T x_i) - y_i\big)\, x_i
a) Logistic regression setting of w
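Under this setting, w is updated with the full-batch gradient derived above, w \leftarrow w - \eta \sum_{i=1}^{N} (\sigma(w^T x_i) - y_i)\, x_i. Below is a minimal NumPy sketch of that batch update; the synthetic data, the learning rate eta, and the iteration count are illustrative assumptions rather than part of the original assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_gradient(w, X, y):
    # grad = sum_i (sigma(w^T x_i) - y_i) x_i, i.e. the result of the derivation above
    return X.T @ (sigmoid(X @ w) - y)

def batch_gradient_descent(X, y, eta=0.01, n_iters=1000):
    # Every iteration uses the full data set (batch gradient descent)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - eta * nll_gradient(w, X, y)
    return w

# Illustrative usage on synthetic data (assumed, not from the assignment)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)
w_hat = batch_gradient_descent(X, y, eta=0.01)
```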
1.2 Stochastic Gradient Descent
a) Every patient admission is likewise associated with a binary label y \in \{+1, -1\}. Every day of an admission in which the patient eventually tests positive is labelled +1, and -1 otherwise. In this manner every patient admission p^{(i)} consists of m_i (feature vector, label) pairs:

p^{(i)} = \{(x_t^{(i)}, y_t^{(i)})\}_{t=1}^{m_i}
For a single day the model assigns the likelihood

p(y_t \mid x_t, w) = \sigma(y_t\, w^T x_t) = \frac{1}{1 + \exp(-y_t\, w^T x_t)}

and the maximum-likelihood weights are

w_{ML} = \arg\max_w \{\, p(y \mid X, w) \,\}
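Assuming the per-day likelihood takes the logistic form shown above, with y_t \in \{+1, -1\}, one stochastic gradient step looks as follows; the step size eta and the function name sgd_step are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, x_t, y_t, eta=0.01):
    """One stochastic gradient step on -log p(y_t | x_t, w), with y_t in {+1, -1}.

    -log p(y_t | x_t, w) = log(1 + exp(-y_t * w^T x_t)), whose gradient with
    respect to w is -y_t * sigmoid(-y_t * w^T x_t) * x_t.
    """
    grad = -y_t * sigmoid(-y_t * (w @ x_t)) * x_t
    return w - eta * grad
```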
b)
This results in multiple predictions for every patient admission from the coefficient vector w_t, one corresponding to each day of the admission.
We consider a model evaluation scheme that takes these predictions into account, yet still yields a single measure of performance, which can be regarded as the physician feedback label Y_t.
One could envision a validation scheme in which the performance of the classifier is evaluated for every day independently.
While complete, this evaluation still lacks meaning from a clinical point of view, since it is not clear how to interpret the utility of a classifier that correctly classifies a patient on m days out of a total of n days of the admission period; this quantity can be computed in terms of w_{t+1}.
Derivation
\|x_t - w_{t+1}\|^2 < \|x_t - w_t\|^2

\Leftrightarrow \|x_t\|^2 - 2\,w_{t+1}^T x_t + \|w_{t+1}\|^2 < \|x_t\|^2 - 2\,w_t^T x_t + \|w_t\|^2

\Leftrightarrow w_{t+1}^T x_t - \tfrac{1}{2}\|w_{t+1}\|^2 > w_t^T x_t - \tfrac{1}{2}\|w_t\|^2

\Leftrightarrow (w_{t+1} - w_t)^T x_t + \tfrac{1}{2}\big(\|w_t\|^2 - \|w_{t+1}\|^2\big) > 0
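The reconstructed chain of inequalities above can be checked numerically; the sketch below (with assumed random vectors) verifies that the first and last conditions are equivalent.

```python
import numpy as np

# Check: ||x_t - w_new||^2 < ||x_t - w_old||^2
#   <=>  w_new^T x_t - ||w_new||^2 / 2  >  w_old^T x_t - ||w_old||^2 / 2
rng = np.random.default_rng(1)
for _ in range(1000):
    x_t, w_old, w_new = rng.normal(size=(3, 5))
    closer = np.sum((x_t - w_new) ** 2) < np.sum((x_t - w_old) ** 2)
    larger = (w_new @ x_t - w_new @ w_new / 2) > (w_old @ x_t - w_old @ w_old / 2)
    assert closer == larger
```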
c) The time complexity of the update rule from (b) if x_t is very sparse.
The quantity that must be computed for the update is

\nabla_w L(w_t) = \sum_{i=1}^{N} x_i \big(\sigma(w_t^T x_i) - y_i\big) + \lambda\, w_t

and for a single stochastic update only the term x_t\big(\sigma(w_t^T x_t) - y_t\big) is needed. Since each column of X has only a few non-zero entries, x_t is very sparse, and the inner product w_t^T x_t together with the update of w along the support of x_t costs time proportional to the number of non-zero entries of x_t rather than to the full dimension.
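A minimal sketch of how the update can exploit the sparsity of x_t, assuming x_t is a SciPy CSR row and y_t \in \{0, 1\}; the eager treatment of the \lambda w term shown here still touches every coordinate, so in practice it is often applied lazily (that choice, like the function name, is an assumption for illustration, not part of the original).

```python
import numpy as np
from scipy import sparse

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_sgd_step(w, x_t, y_t, eta=0.01, lam=0.0):
    """One update using a sparse x_t (1 x d CSR row), with y_t in {0, 1}.

    The data term only touches the non-zero coordinates of x_t, so its cost is
    proportional to nnz(x_t) rather than to the full dimension d.
    """
    idx = x_t.indices                 # columns of the non-zero entries
    vals = x_t.data                   # their values
    margin = w[idx] @ vals            # w^T x_t restricted to the support of x_t
    residual = sigmoid(margin) - y_t  # sigma(w^T x_t) - y_t
    # Eager L2 shrinkage costs O(d); a lazy scheme avoids this full pass.
    w_new = w * (1.0 - eta * lam) if lam > 0.0 else w.copy()
    w_new[idx] -= eta * residual * vals
    return w_new

# Illustrative usage (assumed data)
d = 10_000
x_t = sparse.random(1, d, density=0.001, format="csr", random_state=0)
w = np.zeros(d)
w = sparse_sgd_step(w, x_t, y_t=1.0, eta=0.1)
```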
d) A large η leads to oscillations and instability; a small η leads to slow convergence.
Bounds on the learning rate η: 0 < η < 2/λ_max.
There are clear trade-offs between using a large versus a small η. A large η accelerates learning but can be unstable, while a small η is stable but results in slower learning. It is therefore desirable to begin with a large η and decrease it over time. Averaging over past inputs leads to stable weight dynamics, which requires a small η, whereas fast adaptation requires a large η. The learning rule can also be obtained from an error function E; the corresponding gradient and step-size bound are

\nabla_w L(w_t) = \sum_{i=1}^{N} x_i \big(\sigma(w_t^T x_i) - y_i\big) + \lambda\, w_t, \qquad 0 < \eta < \frac{2}{\lambda_{\max}}
E denotes the squared error over all patterns.
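To make the trade-off concrete, here is a small sketch of a decaying step-size schedule and of the stability bound for the squared error E; the schedule eta_t = eta0 / (1 + t / tau) and its constants are illustrative assumptions.

```python
import numpy as np

def decayed_learning_rate(t, eta0=0.5, tau=100.0):
    # Start with a relatively large eta (fast adaptation) and shrink it over
    # time (stable convergence), as argued above.
    return eta0 / (1.0 + t / tau)

def stability_bound(X):
    # For the squared error E = 0.5 * ||Xw - y||^2 the Hessian is X^T X, and
    # gradient descent is stable for 0 < eta < 2 / lambda_max(X^T X).
    lam_max = np.linalg.eigvalsh(X.T @ X).max()
    return 2.0 / lam_max

# Illustrative usage (assumed data)
X = np.random.default_rng(2).normal(size=(50, 4))
print(stability_bound(X), decayed_learning_rate(t=10))
```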
e) A regression model that uses the L1 regularization technique is called Lasso regression, and a model that uses L2 is called Ridge regression. The key difference between the two is the penalty term: Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function, giving

L(w) + \frac{\mu}{2}\|w\|_2^2, \quad \text{where } \mu \text{ is a constant.}
Under L1 regularization, many entries of w_t are driven exactly to zero, and the quadratic term \frac{\mu}{2}\|w\|_2^2 is replaced by the L1 penalty; the derivation of the L1-regularized w_t is

\frac{\mu}{2}\|w\|_2^2 \;\longrightarrow\; \mu\|w\|_1 = \mu \sum_j |w_j|

\hat{w}_t = \arg\min_w \Big[ L(w) + \mu\|w\|_1 \Big]

and the cost of carrying out this update of w_t determines its time complexity.
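To contrast the two penalty terms in code, the sketch below shows the Ridge gradient, the Lasso subgradient, and the standard soft-thresholding (proximal) step often used for the L1 term; the proximal step and the symbol names are assumptions for illustration, not something stated in the original.

```python
import numpy as np

def ridge_penalty_grad(w, mu):
    # L2 (Ridge): penalty (mu / 2) * ||w||_2^2, gradient mu * w (smooth)
    return mu * w

def lasso_penalty_subgrad(w, mu):
    # L1 (Lasso): penalty mu * ||w||_1, subgradient mu * sign(w) (non-smooth at 0)
    return mu * np.sign(w)

def soft_threshold(w, eta, mu):
    # Proximal step for the L1 penalty: shrinks every coordinate by eta * mu and
    # clips it at zero, which is what makes Lasso solutions sparse.
    return np.sign(w) * np.maximum(np.abs(w) - eta * mu, 0.0)

# Both penalty updates touch all d coordinates, so each costs O(d) per step.
w = np.array([0.8, -0.05, 0.0, 1.3])
print(ridge_penalty_grad(w, mu=0.1))
print(lasso_penalty_subgrad(w, mu=0.1))
print(soft_threshold(w, eta=0.5, mu=0.2))
```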