Softmax and SVM losses are both used for classification. Softmax is commonly used in the output layer of a neural network, while the SVM loss is often paired directly with SGD for object classification. Either way, training requires computing both a loss and a gradient, and in practice the two have a lot in common, so they are compared side by side here.
Formulas
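The per-example losses implemented below are the standard ones (restated here as LaTeX, since the original formula images are not reproduced). With scores $f = x_i W$ for example $i$ and true label $y_i$:

```latex
% Softmax (cross-entropy) loss and its gradient w.r.t. the scores:
L_i = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}},
\qquad
\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}[j = y_i],
\quad p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}

% Multiclass SVM (hinge) loss and its gradient w.r.t. the scores:
L_i = \sum_{j \ne y_i} \max(0,\; f_j - f_{y_i} + \Delta),
\qquad
\frac{\partial L_i}{\partial f_j} =
\begin{cases}
\mathbb{1}[f_j - f_{y_i} + \Delta > 0], & j \ne y_i \\[4pt]
-\sum_{k \ne y_i} \mathbb{1}[f_k - f_{y_i} + \Delta > 0], & j = y_i
\end{cases}
```

The code computes exactly these gradients with respect to the scores, then maps them back to $W$ via $\partial f / \partial W = X$, i.e. `dW = X.T.dot(d_scores) / N`.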
Diagram
Python implementation
"""
Structured softmax and SVM loss function.
Inputs have dimension D, there are C classes, and we operate on minibatches
of N examples.
Inputs:
- W: A numpy array of shape (D, C) containing weights.
- X: A numpy array of shape (N, D) containing a minibatch of data.
- y: A numpy array of shape (N,) containing training labels; y[i] = c means
that X[i] has label c, where 0 <= c < C.
Returns a tuple of:
- loss as single float
- gradient with respect to weights W; an array of same shape as W
"""
def softmax_loss_vectorized(W, X, y):
    num_train = X.shape[0]
    score = X.dot(W)
    # Shift each row so its max is 0; softmax is invariant to this shift,
    # and it prevents overflow in np.exp.
    shift_score = score - np.max(score, axis=1, keepdims=True)
    shift_score_exp = np.exp(shift_score)
    shift_score_exp_sum = np.sum(shift_score_exp, axis=1, keepdims=True)
    score_norm = shift_score_exp / shift_score_exp_sum  # softmax probabilities
    loss = np.sum(-np.log(score_norm[range(num_train), y])) / num_train
    # Gradient w.r.t. the scores: p_j - 1 at the true class, p_j elsewhere.
    # Copy so the probabilities are not silently modified in place.
    d_score = score_norm.copy()
    d_score[range(num_train), y] -= 1
    dW = X.T.dot(d_score) / num_train
    return loss, dW
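As a quick sanity check (not from the original post; the function is restated so the snippet runs standalone), tiny random weights give nearly uniform softmax probabilities, so the loss should sit near log(C), and the analytic gradient should agree with a finite-difference estimate:

```python
import numpy as np

def softmax_loss_vectorized(W, X, y):
    num_train = X.shape[0]
    scores = X.dot(W)
    scores = scores - np.max(scores, axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
    loss = np.sum(-np.log(probs[range(num_train), y])) / num_train
    d_scores = probs.copy()
    d_scores[range(num_train), y] -= 1
    dW = X.T.dot(d_scores) / num_train
    return loss, dW

rng = np.random.default_rng(0)
N, D, C = 10, 5, 3
W = 0.01 * rng.standard_normal((D, C))  # near-zero weights
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)

loss, dW = softmax_loss_vectorized(W, X, y)
print(loss)  # close to log(3) ≈ 1.0986 for tiny random weights

# Numerical gradient check on a single weight entry.
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[2, 1] += eps
Wm[2, 1] -= eps
num_grad = (softmax_loss_vectorized(Wp, X, y)[0]
            - softmax_loss_vectorized(Wm, X, y)[0]) / (2 * eps)
print(abs(num_grad - dW[2, 1]))  # should be tiny
```

The log(C) check is a cheap way to catch sign or normalization bugs before training anything.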
def svm_loss_vectorized(W, X, y):
    delta = 1.0
    num_train = X.shape[0]
    scores = X.dot(W)
    # Score of the ground-truth class for each example, as a column vector.
    scores_gt_cls = scores[range(num_train), y][..., np.newaxis]
    margins = np.maximum(0, scores - scores_gt_cls + delta)
    margins[range(num_train), y] = 0  # the true class contributes no margin
    loss = np.sum(margins) / num_train
    # Every violated margin contributes +1 to its class column of the
    # score gradient...
    d_scores = (margins > 0).astype(float)
    # ...and the true-class column receives minus the count of violated
    # margins in that row.
    row_sum = np.sum(d_scores, axis=1)
    d_scores[range(num_train), y] -= row_sum
    dW = X.T.dot(d_scores) / num_train
    return loss, dW
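The same style of check works for the hinge loss (again a standalone restatement, not from the original post): with near-zero weights every wrong-class margin is about Δ = 1, so the loss should be close to C − 1, and since the margins sit far from the hinge's kink, a finite-difference gradient check is reliable here:

```python
import numpy as np

def svm_loss_vectorized(W, X, y):
    delta = 1.0
    num_train = X.shape[0]
    scores = X.dot(W)
    gt = scores[range(num_train), y][:, np.newaxis]
    margins = np.maximum(0, scores - gt + delta)
    margins[range(num_train), y] = 0
    loss = np.sum(margins) / num_train
    d_scores = (margins > 0).astype(float)
    d_scores[range(num_train), y] -= np.sum(d_scores, axis=1)
    dW = X.T.dot(d_scores) / num_train
    return loss, dW

rng = np.random.default_rng(1)
N, D, C = 10, 5, 3
W = 0.01 * rng.standard_normal((D, C))  # near-zero weights
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)

loss, dW = svm_loss_vectorized(W, X, y)
print(loss)  # close to C - 1 = 2 for tiny random weights

# Numerical gradient check on a single weight entry.
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 2] += eps
Wm[0, 2] -= eps
num_grad = (svm_loss_vectorized(Wp, X, y)[0]
            - svm_loss_vectorized(Wm, X, y)[0]) / (2 * eps)
print(abs(num_grad - dW[0, 2]))  # should be tiny
```

Note the contrast with softmax: the hinge gradient is piecewise constant in the scores (0 or ±1 entries), while the softmax gradient varies smoothly with the probabilities.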