Correctly Understanding logits in TensorFlow

【Question】I was going through the TensorFlow API docs here. In the documentation, they use a keyword called logits. What is it? In a lot of methods in the API docs it is written like:

 

tf.nn.softmax(logits, name=None)

If logits are just Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate. They were:

 

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.

【Answer】Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

 

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

 

y_hat_softmax = tf.nn.softmax(y_hat)  # Normalize the scores into per-class probabilities
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))  # Mean cross-entropy over instances

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

 

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

 

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

 

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])
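
As a sanity check, the same numbers can be reproduced directly from the softmax definition, softmax(x)_i = exp(x_i) / sum_j exp(x_j), using plain NumPy (a minimal sketch reusing the scores above):

# Softmax by hand: exponentiate each score and divide by the row sum.
scores = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])
exp_scores = np.exp(scores)
print(exp_scores / exp_scores.sum(axis=1, keepdims=True))
# Matches the tf.nn.softmax output above (up to floating-point rounding).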

It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

 

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182
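
Note that, for example, the first row sums to 1.0: 0.227863 + 0.61939586 + 0.15274114 = 1.0, and likewise for the second row.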

So now we have class probabilities for each training instance, where we can take the argmax() of each row to generate a final classification. From above, we would predict that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".
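
For example, one way to obtain these hard predictions (a small sketch using the y_hat_softmax tensor from above):

# Argmax along the class dimension gives the predicted class index per instance.
predictions = tf.argmax(y_hat_softmax, 1)
sess.run(predictions)
# array([1, 0])  -> instance 1 is predicted as "Class 2", instance 2 as "Class 1"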

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

 

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])
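
Equivalently, the argmax of each row of y_true gives the true class indices 1 and 2, i.e. "Class 2" and "Class 3", which is what the predictions above should be compared against.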

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.

For each instance, the cross-entropy loss is -sum_c y_true[c] * log(y_hat_softmax[c]); since y_true is one-hot, only the log probability assigned to the true class contributes.

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed a highest probability for "Class 1", which does not match the true class "Class 3".

 

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])
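
As a quick sanity check on these two numbers (a sketch reusing the probabilities from the table above):

print(-np.log(0.61939586))  # ~0.479: instance 1, probability of its true class "Class 2"
print(-np.log(0.30129182))  # ~1.200: instance 2, probability of its true class "Class 3"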

What we really want is the total loss over all the training instances. So we can compute:

 

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944
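
This is simply the mean of the two per-instance losses above: (0.4790107 + 1.19967598) / 2 ≈ 0.8393.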

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

 

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().
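
As an aside (this is an assumption about your TensorFlow version, not part of the original answer): later TensorFlow 1.x releases require the arguments to be passed by keyword, e.g.:

# Keyword-argument form expected by newer TensorFlow 1.x releases.
total_loss_2 = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))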

From Stack Overflow: https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits?noredirect=1&lq=1
