Logistic Regression

  • why is it named logistic regression? (because of the logistic function)
  • what is the model?
  • how do we solve the minimization/maximization problem?

2-class problem

$$P(y=1|x,\theta) = f(x) = \frac{1}{1+e^{-\theta^T x}}$$

This logistic (sigmoid) function keeps the probability in (0, 1).
$$P(y=0|x,\theta) = 1 - f(x)$$

which also lies in (0, 1).

Combining the two cases,

$$P(y|x,\theta) = f(x)^{y}\,(1-f(x))^{1-y}$$

Assume we have already learned the optimal $\theta$. Classification is then carried out by computing $P(y=1|x,\theta)$: if it is $>0.5$, the odds $\frac{p}{1-p}>1$ and we predict class 1; if it is $<0.5$, then $\frac{p}{1-p}<1$ and we predict class 0.
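As a minimal NumPy sketch of this decision rule (`sigmoid` and `predict` are illustrative names, not from the original):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Return 1 if P(y=1|x,theta) > 0.5, else 0.
    P > 0.5 is equivalent to odds p/(1-p) > 1, i.e. theta^T x > 0."""
    return int(sigmoid(theta @ x) > 0.5)
```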

The learning objective is to maximize the likelihood of the whole sample set:

$$L(\theta) = \prod_{i=1}^{n} P(y_i|x_i,\theta)$$

$$\ell(\theta) = \log L(\theta)$$

Gradient ascent on $\ell(\theta)$ (equivalently, gradient descent on $-\ell(\theta)$) can then be applied to solve the MLE problem.
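A minimal sketch of that optimization, assuming plain batch gradient ascent with a fixed step size (`fit_logistic`, `lr`, and `n_iters` are illustrative names):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Gradient ascent on l(theta), i.e. gradient descent on -l(theta).
    X: (n, d) design matrix; y: (n,) labels in {0, 1}."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # f(x_i) for every sample
        theta += lr * (X.T @ (y - p)) / n       # gradient of l is X^T (y - p)
    return theta
```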

Multi-class problem

Of vital importance is the constraint on the probabilities of the different classes:

$$\sum_{i=1}^{m} P(y^{(i)}=1|x,w) = 1$$
Here $m$ is the number of classes and $w \in \mathbb{R}^{d \times m}$, where $d$ is the dimension of the feature vector $x$.
$$P(y^{(i)}=1|x,w) = \frac{\exp(w^{(i)T}x)}{\sum_{j=1}^{m}\exp(w^{(j)T}x)} \quad \text{for } i \in \{1,\dots,m\}$$
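A small NumPy sketch of this mapping (`softmax_probs` is an illustrative name; the max-shift is a standard numerical-stability trick, not part of the formula itself):

```python
import numpy as np

def softmax_probs(W, x):
    """Class probabilities P(y^(i)=1 | x, w) for i = 1..m.
    W: (d, m) weight matrix; x: (d,) feature vector."""
    scores = W.T @ x        # w^(i)T x for each class i
    scores -= scores.max()  # shift to avoid overflow in exp
    e = np.exp(scores)
    return e / e.sum()      # entries are positive and sum to 1
```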

The cost function is the log-likelihood of the training set, derived in the next section.

SMLR - Sparse Multinomial Logistic Regression

With $m$ classes in total and a $d$-dimensional input/feature vector,
the weight vector for one of the classes need not be estimated. Without loss of generality, we set $w^{(m)} = 0$, so the only parameters to be learned are the weight vectors $w^{(i)}$ for $i \in \{1,\dots,m-1\}$. In what follows, we use $w$ to denote the $(d(m-1))$-dimensional vector of parameters to be learned.

For ordinary softmax regression (also called multinomial logistic regression, MLR), the probability that $x$ belongs to class $i$ is written as:

$$P(y^{(i)}=1|x,w) = \frac{\exp(w^{(i)T}x)}{\sum_{j=1}^{m}\exp(w^{(j)T}x)} \quad \text{for } i \in \{1,\dots,m\}$$

$$\ell(w) = \log \prod_{j=1}^{n} P(y_j|x_j,w)$$

where $n$ is the total number of samples. Equivalently,

$$\ell(w) = \sum_{j=1}^{n} \log P(y_j|x_j,w)$$

$$\ell(w) = \sum_{j=1}^{n} \sum_{i=1}^{m} \mathbf{1}\{y^{(j)}=i\} \log \frac{\exp(w^{(i)T}x_j)}{\sum_{k=1}^{m}\exp(w^{(k)T}x_j)},$$

where $n$ is the number of samples and $m$ is the number of classes.
Writing the indicator as a one-hot label $y_j^{(i)}$, this simplifies to

$$\ell(w) = \sum_{j=1}^{n} \left\{ \sum_{i=1}^{m} y_j^{(i)} w^{(i)T} x_j - \log \sum_{i=1}^{m} \exp(w^{(i)T} x_j) \right\}$$
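The last form is convenient to implement directly. A NumPy sketch assuming one-hot labels (`log_likelihood` is an illustrative name; the max-shift only guards against overflow):

```python
import numpy as np

def log_likelihood(W, X, Y):
    """l(w) as in the last expression above.
    W: (d, m) weights; X: (n, d) samples; Y: (n, m) one-hot labels."""
    scores = X @ W  # (n, m) matrix of w^(i)T x_j
    # log sum_i exp(w^(i)T x_j), computed stably via a max-shift
    mx = scores.max(axis=1, keepdims=True)
    log_norm = mx[:, 0] + np.log(np.exp(scores - mx).sum(axis=1))
    return (Y * scores).sum() - log_norm.sum()
```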

On top of this, SMLR adds a sparsity constraint to the cost function via a prior on $w$:

$$\hat{w}_{\text{MAP}} = \arg\max_{w} \left\{ \ell(w) + \log p(w) \right\}$$

In SMLR, $p(w) \propto \exp(-\lambda \|w\|_1)$, a Laplacian prior whose $\ell_1$ penalty drives many components of $w$ to exactly zero.
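A sketch of the resulting MAP objective, following the $w^{(m)}=0$ convention above (`smlr_objective` and `lam` are illustrative names):

```python
import numpy as np

def smlr_objective(w_flat, X, Y, lam):
    """MAP objective l(w) + log p(w) = l(w) - lam * ||w||_1 (+ const).
    w_flat packs the first m-1 class weight vectors, since w^(m) = 0.
    X: (n, d) samples; Y: (n, m) one-hot labels."""
    n, d = X.shape
    m = Y.shape[1]
    W = np.hstack([w_flat.reshape(d, m - 1), np.zeros((d, 1))])
    scores = X @ W
    mx = scores.max(axis=1, keepdims=True)
    log_norm = mx[:, 0] + np.log(np.exp(scores - mx).sum(axis=1))
    ll = (Y * scores).sum() - log_norm.sum()
    return ll - lam * np.abs(w_flat).sum()
```

Because the $\ell_1$ term is non-differentiable at zero, this objective is typically maximized with a subgradient, proximal, or bound-optimization method rather than plain gradient ascent.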