After adding an L1 penalty to the loss function, the objective becomes J(θ) = L(θ) + c‖θ‖₁. When θ > 0, the gradient of c‖θ‖₁ equals c; when θ < 0, it equals −c. Therefore, if the gradient of L(θ) lies within (−c, c), the gradient of J(θ) is always negative for θ < 0, so J(θ) is monotonically decreasing to the left of the origin; and it is always positive for θ > 0, so J(θ) is monotonically increasing to the right of the origin. The minimum is therefore attained at θ = 0.
By contrast, the L2 penalty has zero derivative at the origin, so the gradient of J(θ) at the origin equals zero iff the gradient of L(θ) at the origin equals zero. Sparse solutions are therefore much less likely under L2-norm regularisation than under L1-norm regularisation.
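The contrast can be checked numerically in one dimension. A minimal sketch, assuming L(θ) = ½(θ − a)² with illustrative values a = 0.3 and c = 1.0 (my choice, not from the text) so that |L′(0)| = |a| < c holds:

```python
import numpy as np

# 1-D illustration: L(theta) = 0.5 * (theta - a)**2, so L'(0) = -a,
# and |L'(0)| < c for the values chosen below.
a, c = 0.3, 1.0

theta = np.linspace(-2.0, 2.0, 400001)   # dense grid containing 0
L = 0.5 * (theta - a) ** 2

J_l1 = L + c * np.abs(theta)             # L1-regularised objective
J_l2 = L + c * theta ** 2                # L2-regularised objective

theta_l1 = theta[np.argmin(J_l1)]        # minimiser under L1: 0 (sparse)
theta_l2 = theta[np.argmin(J_l2)]        # minimiser under L2: a/(1+2c) != 0

print(round(float(theta_l1), 6), round(float(theta_l2), 6))  # → 0.0 0.1
```

The L1 minimiser lands exactly at the origin, matching the monotonicity argument above, while the L2 minimiser is merely shrunk toward zero (to a/(1+2c)) without reaching it.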
Review soft-thresholding and the simplified LASSO problem [2]: min_β ½‖y − β‖₂² + λ‖β‖₁.
Let v ∈ ∂‖β‖₁. The subgradient optimality condition y − β = λv gives, coordinate-wise: yi − βi = λ sign(βi) if βi ≠ 0, and |yi − βi| ≤ λ if βi = 0.
When βi > 0, yi − βi = λ, so βi = yi − λ, requiring yi − λ > 0; when βi < 0, yi − βi = −λ, so βi = yi + λ, requiring yi + λ < 0; when βi = 0, |yi| ≤ λ. Combining the three cases leads to the soft-thresholding operator: Sλ(y)i = yi − λ if yi > λ; 0 if −λ ≤ yi ≤ λ; yi + λ if yi < −λ.
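The three cases above collapse into one vectorised expression. A minimal sketch (the function name `soft_threshold` and the test vector are mine, for illustration), with each coordinate verified against a brute-force grid minimisation of the 1-D objective:

```python
import numpy as np

# Soft-thresholding operator S_lambda: the closed-form solution of
# min_b 0.5*||y - b||_2^2 + lam*||b||_1, applied element-wise.
def soft_threshold(y, lam):
    """Shrink |y_i| by lam, zeroing every coordinate with |y_i| <= lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, 0.4, -0.2, -1.5])
beta = soft_threshold(y, lam=0.5)

# check each coordinate against a grid search over the separable objective
grid = np.linspace(-4.0, 4.0, 80001)
for yi, bi in zip(y, beta):
    obj = 0.5 * (yi - grid) ** 2 + 0.5 * np.abs(grid)
    assert abs(grid[np.argmin(obj)] - bi) < 1e-3

print(beta)
```

Because the objective is separable across coordinates, the element-wise operator solves the full vector problem; the middle two inputs fall inside [−λ, λ] and are set exactly to zero.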