Attention Is All You Need implementation with detailed comments (Part 2): the model

Original

酸柠浮冷萃

2020-06-23 01:11

Methods in the original source code that were removed in TF2 have been replaced.

1. The embedding function

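word embedding: learns a mapping from the input space to a distributed-representation space automatically from data, which reduces the amount of data needed for training.
tf.compat.v1.variable_scope: a context manager for defining ops that create variables (layers).
lookup_table: a lookup table, which plays the role of a dictionary.
tf.compat.v1.get_variable: gets an existing variable with these parameters or creates a new one.
initializer=tf.initializers.GlorotUniform: "Xavier" (Glorot) initialization is an effective method for initializing neural networks. It comes from the 2010 paper "Understanding the difficulty of training deep feedforward neural networks".
tf.nn.embedding_lookup(lookup_table, inputs): looks up the rows of lookup_table whose indices are given by inputs.
tf.compat.v1.disable_v2_behavior(): this function can be called at the start of the program, before Tensors, Graphs or other structures have been created and before devices have been initialized. It switches all global behaviors that differ between TensorFlow 1.x and 2.x to their 1.x behavior.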
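A minimal pure-Python sketch of what the Glorot (Xavier) uniform initializer computes: every weight is drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). It assumes the Keras convention that, for a 2-D shape such as [vocab_size, num_units], the first dimension is fan_in and the second is fan_out. The helpers glorot_limit and glorot_uniform are hypothetical illustrations; in the TF code the sampling is done by tf.initializers.GlorotUniform.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    """Half-width of the Glorot/Xavier uniform range: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out matrix (list of lists) from U(-limit, limit).

    Assumption: for a 2-D shape [rows, cols], fan_in = rows and fan_out = cols.
    """
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Same shape as the lookup table in the example below: vocab_size=6, num_units=2
table = glorot_uniform(6, 2, random.Random(0))
print(round(glorot_limit(6, 2), 4))  # 0.866
```

The limit shrinks as the layer gets wider, which keeps the variance of activations roughly constant from layer to layer. That balance is the motivation given in the Glorot and Bengio paper.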

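import tensorflow as tf

# Run in TF1-style graph mode so that variable_scope, get_variable and Session work as in TF1
tf.compat.v1.disable_v2_behavior()


def embedding(inputs,
              vocab_size,
              num_units,
              zero_pad=True,
              scale=True,
              scope="embedding",
              reuse=None):
    with tf.compat.v1.variable_scope(scope, reuse=reuse):
        # Build the lookup table: one num_units-dimensional vector per vocabulary id
        lookup_table = tf.compat.v1.get_variable(name='lookup_table',
                                                 dtype=tf.float32,
                                                 shape=[vocab_size, num_units],
                                                 initializer=tf.initializers.GlorotUniform())

        if zero_pad:
            # Replace the first row of lookup_table (id 0, used for padding) with zeros
            # by concatenating a zero row with rows 1..vocab_size-1
            lookup_table = tf.concat((tf.zeros(shape=[1, num_units]),
                                      lookup_table[1:, :]), 0)  # (tensors to join, axis)

        # Gather the rows of lookup_table whose indices are given by inputs
        outputs = tf.nn.embedding_lookup(lookup_table, inputs)

        if scale:
            # Multiply the embeddings by sqrt(num_units)
            outputs = outputs * (num_units ** 0.5)

    return outputs


def main():
    # A 2x3 batch of token ids 0..5, cast to int32
    inputs = tf.dtypes.cast(tf.reshape(tf.range(2 * 3), (2, 3)), tf.int32)
    outputs = embedding(inputs, 6, 2, zero_pad=True)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print('inputs:\n', sess.run(inputs))
        print('outputs:\n', sess.run(outputs))


if __name__ == '__main__':
    main()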

Sample output:

inputs:
 [[0 1 2]
 [3 4 5]]

outputs:
 [[[ 0.          0.        ]
  [-0.6018285   0.36682096]
  [-1.1781635  -0.9732541 ]]

 [[-1.0972805   0.67716676]
  [-0.09731749 -0.4502349 ]
  [-0.88273793 -0.16005561]]]
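The pure-Python sketch below shows what embedding() does to the table. Row 0 is zeroed, so every id 0 maps to a zero vector, which is why the first output row above is [0, 0]. Each gathered vector is then multiplied by sqrt(num_units). The function name and the small table are hypothetical. Real code would use the TF version above.

```python
import math


def embedding_lookup(table, ids, zero_pad=True, scale=True):
    """Sketch of embedding() for a 2-D batch of ids.

    table: vocab_size rows of num_units floats; ids: list of lists of ints.
    """
    num_units = len(table[0])
    if zero_pad:
        # Replace row 0 (the padding id) with zeros, like the tf.concat trick
        table = [[0.0] * num_units] + [list(row) for row in table[1:]]
    factor = math.sqrt(num_units) if scale else 1.0
    # Gather row i of the table for every id i, then apply the sqrt(num_units) factor
    return [[[v * factor for v in table[i]] for i in seq] for seq in ids]


table = [[0.5, -0.5], [1.0, 2.0], [3.0, 4.0]]  # hypothetical 3 x 2 table
print(embedding_lookup(table, [[0, 1], [2, 0]], scale=False))
# [[[0.0, 0.0], [1.0, 2.0]], [[3.0, 4.0], [0.0, 0.0]]]
```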

2. Positional encoding

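"Since our model contains no recurrence and no convolution, in order for the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. To this end, we add 'positional encodings' to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension d_model as the embeddings, so that the two can be summed." (Attention Is All You Need)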
tf.tile(input, multiples, name=None): expands a tensor by replicating it. multiples[i] is the number of copies along dimension i. Detailed explanation: https://blog.csdn.net/tsyccnh/article/details/82459859

tf.expand_dims(input, axis, name=None): inserts a dimension of size 1 at position axis. Older TF1 code names this argument dim.
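Here is a list-based sketch of what tf.expand_dims(tf.range(T), 0) followed by tf.tile(..., [N, 1]) produces for position_ind in positional_encoding() below, assuming N=2 and T=3 as in the sample. The helpers expand_dims0 and tile are hypothetical stand-ins for the TensorFlow ops, restricted to the 2-D case.

```python
def expand_dims0(row):
    """Sketch of tf.expand_dims(x, 0) for a 1-D sequence: shape (T,) -> (1, T)."""
    return [list(row)]


def tile(matrix, multiples):
    """Sketch of tf.tile for a 2-D list: repeat the whole matrix multiples[0]
    times along axis 0 and each row's contents multiples[1] times along axis 1."""
    reps_rows, reps_cols = multiples
    return [list(row) * reps_cols for _ in range(reps_rows) for row in matrix]


N, T = 2, 3
position_ind = tile(expand_dims0(range(T)), [N, 1])
print(position_ind)  # [[0, 1, 2], [0, 1, 2]]
```

Every sequence in the batch gets the same indices 0..T-1. That is why positional encoding depends only on the shape of inputs, not on its values.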

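import numpy as np
import tensorflow as tf


def positional_encoding(inputs,
                        num_units,
                        zero_pad=True,
                        scale=True,
                        scope="positional_encoding",
                        reuse=None):
    # N: batch size, T: sequence length (both must be known statically)
    N, T = inputs.get_shape().as_list()
    with tf.compat.v1.variable_scope(scope, reuse=reuse):
        # Position indices 0..T-1 repeated for each of the N sequences: shape (N, T)
        position_ind = tf.tile(tf.expand_dims(tf.range(T), 0), [N, 1])

        # pos is the position of the word, i is the dimension within the embedding.
        # Note: in the paper the exponent is 2i/d_model with i indexing *pairs* of
        # dimensions, i.e. (i - i % 2) / num_units here; this code uses 2*i / num_units,
        # which is what produces the sample output below.
        position_enc = np.array([
            [pos / np.power(10000, 2. * i / num_units) for i in range(num_units)]
            for pos in range(T)])

        position_enc[:, 0::2] = np.sin(position_enc[:, 0::2])  # even dimensions 2i: sine
        position_enc[:, 1::2] = np.cos(position_enc[:, 1::2])  # odd dimensions 2i+1: cosine
        print('position_enc:\n', position_enc)

        # A NumPy array of Python floats becomes a float64 tensor
        lookup_table = tf.convert_to_tensor(position_enc)
        if zero_pad:
            # The zero row must have the same float64 dtype as lookup_table
            t1 = tf.zeros(shape=[1, num_units], dtype=tf.float64)
            lookup_table = tf.concat((t1, lookup_table[1:, :]), 0)

        outputs = tf.nn.embedding_lookup(lookup_table, position_ind)

        if scale:
            outputs = outputs * num_units ** 0.5

    return outputs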

Sample output:

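Positional encoding with zero_pad=True and scale=True (num_units=2): position 0 is zeroed and every value is multiplied by sqrt(2). Because the encoding depends only on the position and not on the token ids, both sequences get the same rows.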

inputs:
 [[0 1 2]
 [3 4 5]]
outputs:
 [[[0.         0.        ]
  [1.19001968 1.41421356]
  [1.28594075 1.41421353]]

 [[0.         0.        ]
  [1.19001968 1.41421356]
  [1.28594075 1.41421353]]]
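The sketch below reproduces the numbers above in pure Python. It also shows how the table changes if the exponent follows the paper, where dimensions 2i and 2i+1 share the frequency 1/10000^(2i/d_model). The function names and the paper_exponent flag are hypothetical. paper_exponent=False mirrors the TF code above.

```python
import math


def positional_table(T, num_units, paper_exponent=False):
    """Sinusoidal table: sine on even dimensions, cosine on odd dimensions.

    paper_exponent=False: exponent 2*i/num_units with i the dimension index (as in the code).
    paper_exponent=True:  exponent (i - i % 2)/num_units, so each sin/cos pair shares a frequency.
    """
    table = []
    for pos in range(T):
        row = []
        for i in range(num_units):
            exponent = (i - i % 2) / num_units if paper_exponent else 2.0 * i / num_units
            angle = pos / 10000 ** exponent
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def encode(T, num_units, zero_pad=True, scale=True, paper_exponent=False):
    table = positional_table(T, num_units, paper_exponent)
    if zero_pad:
        table[0] = [0.0] * num_units  # position 0 replaced by zeros, as in the TF code
    factor = math.sqrt(num_units) if scale else 1.0
    return [[v * factor for v in row] for row in table]


code_version = encode(3, 2)
print([[round(v, 8) for v in row] for row in code_version])
# [[0.0, 0.0], [1.19001968, 1.41421356], [1.28594075, 1.41421353]]
```

With num_units=2, the code's exponent gives dimension 1 the frequency 1/10000, so its cosine stays close to 1 (about 1.414 after scaling). With paper_exponent=True, dimension 1 becomes cos(pos) instead.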

3. multihead_attention

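Meaning of Q, K and V: Q (query) represents the word currently being processed (for example, the word being translated), and K (keys) represents all the words in the sentence. Comparing Q with each K gives a weight that measures how relevant that position is. The output is the weighted sum of the corresponding V (values). This article explains the principle of attention well: https://baijiahao.baidu.com/s?id=1622064575970777188&wfr=spider&for=pc. The inputs for Q and K are embedded sentences, i.e. three-dimensional tensors of shape (batch, length, num_units).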
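tf.transpose(a, perm): permutes the dimensions of a tensor in the order given by perm.
tf.sign(x): returns an element-wise indication of the sign of a number: y = sign(x) = -1 if x < 0; 0 if x == 0 or tf.is_nan(x); 1 if x > 0. Zero is returned for NaN inputs. For complex numbers, y = sign(x) = x / |x| if x != 0, otherwise y = 0.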
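To make the Q/K/V description concrete, here is a pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for a single head and a single sequence. It is an illustration with hypothetical names, not the TF code. Batching, the head split and the masks discussed next are left out.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: T_q x d_k, K: T_k x d_k, V: T_k x d_v (lists of lists).

    Each query row is compared with every key row; the softmax of the scaled
    dot products gives the weights used to average the value rows.
    """
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
        all_weights.append(weights)
    return outputs, all_weights


out, w = scaled_dot_product_attention([[1.0, 0.0]],
                                      [[1.0, 0.0], [0.0, 1.0]],
                                      [[10.0, 0.0], [0.0, 10.0]])
print(w)  # the first key matches the query better, so it gets the larger weight
```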
Masking the padded positions:

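# Mask the padded key positions so that their attention scores become very small.
# The embedding step zero-pads id 0, so a padded position's embedding is all zeros
# and sums to 0; those positions are given a huge negative score.
key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))  # (N, T_k): 0 where the key row sums to 0, else 1
key_masks = tf.tile(key_masks, [num_heads, 1])  # (h*N, T_k): one copy per head
key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1])  # (h*N, T_q, T_k)

paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)
outputs = tf.where(tf.equal(key_masks, 0), paddings, outputs)  # (h*N, T_q, T_k), e.g. 8*10*10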
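The padding mask above can be sketched in pure Python for one query row. The mask is sign(|sum of each key row|), padded scores are replaced by -2**32 + 1, and the softmax then gives those keys a weight of (numerically) zero. The helper names are hypothetical.

```python
import math

PAD_SCORE = -2 ** 32 + 1  # same large negative constant as the TF code above


def sign(x):
    return (x > 0) - (x < 0)


def key_mask(keys):
    """1 for real key positions, 0 for rows whose sum is 0 (padding): sign(|sum(row)|)."""
    return [sign(abs(sum(row))) for row in keys]


def masked_softmax(scores, mask):
    # Replace the scores of padded keys before the softmax, as tf.where does above
    masked = [s if m != 0 else PAD_SCORE for s, m in zip(scores, mask)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


keys = [[0.3, -0.1], [0.2, 0.4], [0.0, 0.0]]  # the last key is padding
weights = masked_softmax([0.5, 1.0, 2.0], key_mask(keys))
print(key_mask(keys))  # [1, 1, 0]
```

Even though the padded key has the highest raw score (2.0), its weight becomes 0.0. One caveat of the sum-based test: a real key row whose entries happen to sum to exactly zero would also be masked.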

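Future mask: used in the decoder's self-attention over its own input (Q = K = dec), so that each position cannot attend to later positions.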

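# diag_vals is assumed to be a (T_q, T_k) matrix of ones, e.g.:
diag_vals = tf.ones_like(outputs[0, :, :])
# TF1 (tf.contrib was removed in TF2): tril = tf.contrib.linalg.LinearOperatorTriL(diag_vals).to_dense()
tril = tf.linalg.LinearOperatorLowerTriangular(diag_vals).to_dense()  # 10*10: 1 on and below the diagonal, 0 above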

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
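Applying this lower-triangular matrix works like the padding mask. Scores where tril is 0 (future positions) are replaced by -2**32 + 1 before the softmax. The pure-Python sketch below uses hypothetical helper names and all-zero scores, so the surviving weights are easy to read off.

```python
import math

PAD_SCORE = -2 ** 32 + 1  # same large negative constant as the padding mask


def lower_triangular_ones(n):
    """n x n matrix with 1.0 on and below the diagonal, 0.0 above (what tril holds)."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]


def causal_softmax(scores):
    """Row i of a T x T score matrix may only attend to positions 0..i."""
    n = len(scores)
    tril = lower_triangular_ones(n)
    weights = []
    for i in range(n):
        row = [s if tril[i][j] != 0 else PAD_SCORE for j, s in enumerate(scores[i])]
        top = max(row)
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


w = causal_softmax([[0.0] * 3 for _ in range(3)])
print(w[0])  # [1.0, 0.0, 0.0]
```

With equal scores, position i spreads its attention evenly over positions 0..i and gives nothing to later ones.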

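Finally, a question I was not clear about before: which parameters does attention actually learn? Apart from the other network layers, Q, K and V are obtained by multiplying queries, keys and keys, respectively, by weight matrices, and these matrices are trainable parameters as well.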
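Here is a pure-Python sketch of those projections. Q comes from queries, while K and V both come from keys, each through its own weight matrix. The sizes, the random weights and the helper names are hypothetical, and a bias-free linear projection is assumed for simplicity.

```python
import random


def matmul(A, B):
    """(n x k) @ (k x m) for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


d_model, d_k = 4, 4
rng = random.Random(0)
# Hypothetical trainable weights: these are what gradient descent updates
W_q = random_matrix(d_model, d_k, rng)
W_k = random_matrix(d_model, d_k, rng)
W_v = random_matrix(d_model, d_k, rng)

queries = random_matrix(3, d_model, rng)  # e.g. 3 decoder positions
keys = random_matrix(5, d_model, rng)     # e.g. 5 encoder positions

Q = matmul(queries, W_q)  # Q comes from queries
K = matmul(keys, W_k)     # K comes from keys
V = matmul(keys, W_v)     # V also comes from keys, with a different matrix

n_params = sum(len(W) * len(W[0]) for W in (W_q, W_k, W_v))
print(n_params)  # 48
```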

4. Normalize

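tf.nn.moments(x, axes, shift=None, keepdims=False, name=None): computes the mean and variance of x along the given axes. In TF1 the keepdims argument was spelled keep_dims.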
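tf.nn.moments is typically used for layer normalization. Each position's vector is normalized to zero mean and unit variance over its last axis, then scaled by a learnable gamma and shifted by a learnable beta. The pure-Python sketch below assumes epsilon=1e-8 and gamma/beta initialized to 1 and 0. These values and all names are illustrative assumptions, not taken from the original code.

```python
import math


def moments(xs):
    """Mean and population variance of a 1-D list (what tf.nn.moments gives for one row)."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, variance


def layer_normalize(rows, epsilon=1e-8, gamma=None, beta=None):
    """Normalize each row to zero mean and unit variance, then apply gamma * x + beta.

    gamma/beta stand in for the learnable scale and shift (assumed to start at 1 and 0).
    """
    out = []
    for row in rows:
        g = gamma if gamma is not None else [1.0] * len(row)
        b = beta if beta is not None else [0.0] * len(row)
        mean, variance = moments(row)
        out.append([gi * (x - mean) / math.sqrt(variance + epsilon) + bi
                    for x, gi, bi in zip(row, g, b)])
    return out


print(layer_normalize([[1.0, 3.0]]))  # approximately [[-1.0, 1.0]]
```

epsilon keeps the division safe when a row is constant (variance 0).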
