Learning Tensorflow(5)---LSTM

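An LSTM is essentially an RNN. The main difference is that it adds a cell state to the basic RNN structure: a conveyor belt that carries remembered information across time steps, which lets the network handle long-range contextual dependencies.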

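LSTM network structure

Cell state

The core of an LSTM is the cell state, the horizontal line along the top of the LSTM diagram. It acts as the model's memory and changes over time. The conveyor belt itself cannot decide which information is remembered; that control comes from the gate structures below it: the forget gate, the input gate, the candidate gate, and the output gate.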

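Forget gate:
The forget gate controls which information is forgotten. It is implemented with a standard sigmoid activation.

In the usual notation f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f), h_{t-1} is the output of the previous time step and x_t is the current input. The two are combined linearly and passed through the sigmoid, which yields a value between 0 and 1. The closer that value is to 0, the more of the corresponding memory is discarded.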

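Input gate and candidate gate:
The input gate determines which information will be stored in the cell state. The candidate gate combines the current input with the previous output to compute the new candidate information.

This stage has two parts. A sigmoid layer, called the input gate, decides which values to update. A tanh layer then creates a vector of candidate values that will be added to the cell state.

Next the cell state is updated. After these two steps, the old information chosen to be forgotten is gone and the new information chosen to be remembered has been added.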

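Output gate:
The output gate decides what information is output.

First a sigmoid layer decides which parts of the cell state to output. The cell state is then passed through tanh, and the two results are multiplied element-wise to give the output.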
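The four gates can be put together in a few lines. What follows is a minimal pure-Python sketch of one LSTM time step under the standard formulation, not TensorFlow's implementation. The helper names (sigmoid, affine, lstm_step), the toy sizes, and the all-zero parameters are assumptions made for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["f"][0], z, params["f"][1])]    # forget gate
    i = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]    # input gate
    g = [math.tanh(a) for a in affine(params["g"][0], z, params["g"][1])]  # candidate values
    o = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]    # output gate
    # New cell state: keep part of the old memory, add part of the candidate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New output: a gated, squashed view of the cell state
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# Toy sizes and all-zero parameters (assumptions for illustration only)
n_in, n_hidden = 3, 2
zero_gate = ([[0.0] * (n_hidden + n_in) for _ in range(n_hidden)], [0.0] * n_hidden)
params = {name: zero_gate for name in "figo"}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
print(h, c)
```

With all-zero parameters every sigmoid gate equals 0.5 and the candidate is 0, so the step simply halves the previous cell state. That makes the roles of the forget and input gates easy to check by hand.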

Building an LSTM in TensorFlow

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0,
               state_is_tuple=True, activation=None, reuse=None, name=None)

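For example, in the MNIST experiment each input image is 28 * 28. Each row of the image is treated as one input, so the LSTM is unrolled over 28 time steps and receives one 28-dimensional vector at each step.

The num_units parameter sets the size of the cell's output; here every step outputs a 128 * 1 vector.
At each step the LSTM cell maps the 28-dimensional input to 128 dimensions. At the next step it receives the previous 128-dimensional output together with a new 28-dimensional input and maps them to a new 128-dimensional output. This continues until the last time step, which outputs a 128-dimensional vector.
The LSTM input has the shape [batch_size, n_steps, n_inputs].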
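To make these sizes concrete, here is a small sketch that counts the trainable parameters of one LSTM layer for the MNIST setting. It assumes the standard layout in which four gate blocks act on the concatenated [input, previous output] vector, so the combined weight matrix has shape [input_size + num_units, 4 * num_units]. The function name lstm_param_count is hypothetical.

```python
def lstm_param_count(input_size, num_units):
    """Weights plus biases of one standard LSTM layer (four gate blocks)."""
    kernel = (input_size + num_units) * 4 * num_units
    bias = 4 * num_units
    return kernel + bias

n_inputs, n_hidden_units = 28, 128  # MNIST row width and num_units from the text
kernel_shape = (n_inputs + n_hidden_units, 4 * n_hidden_units)
total = lstm_param_count(n_inputs, n_hidden_units)
print(kernel_shape, total)
```

The count does not depend on the number of time steps, because the same weights are reused at every step.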

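def inference(input_tensor):
    # n_hidden_units, n_classes and batch_size are module-level settings
    # (128, 10 and 128 in the full script at the end of this post).
    with tf.variable_scope('lstm1'):
        # Output projection from the LSTM output to the class logits
        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units,n_classes],initializer = tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes,],initializer = tf.constant_initializer(0.1))

        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)

        # A stack with a single layer
        stack = tf.contrib.rnn.MultiRNNCell([lstm_cell] * 1,
                                        state_is_tuple=True)
        _init_state = stack.zero_state(batch_size, dtype=tf.float32)

        # outputs has shape [batch_size, n_steps, n_hidden_units]
        outputs,states = tf.nn.dynamic_rnn(stack, input_tensor, initial_state=_init_state, time_major=False)

        # Move the time axis to the front, split into per-step tensors,
        # and classify from the output of the last step
        outputs = tf.unstack(tf.transpose(outputs, [1,0,2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out

    return results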

outputs,states = tf.nn.dynamic_rnn(stack, x_in, initial_state=_init_state, time_major=False)

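For a single sample the resulting outputs is a [1 * 28 * 128] tensor. With cross-entropy as the loss function, only the output of the last time step (the last index along the step axis, not the last dimension), a 1 * 128 vector, needs to be fed to the fully connected (fc) layer.
With CTC as the loss function, the output of every step takes part in the computation.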

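MultiRNNCell: stacking multiple RNN layers
A single RNN layer often has limited capacity, so we need several layers. Feeding x into the first RNN layer yields a hidden state h, which serves as the input to the second layer; the second layer's hidden state serves as the input to the third layer, and so on.

In TensorFlow this is implemented with tf.nn.rnn_cell.MultiRNNCell:
Declare the cells.
Pass a list holding one cell per layer to MultiRNNCell. Older tutorials pass [cell]*num_layers, but as explained below that form should not be used.
For LSTM cells, set state_is_tuple=True.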

layers = [tf.nn.rnn_cell.GRUCell(num_hidden) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.contrib.rnn.MultiRNNCell(layers,
                                    state_is_tuple=True)
init_state = stack.zero_state(batch_size, dtype=tf.float32)
# The second return value is the last state, which we do not use here
outputs,states = tf.nn.dynamic_rnn(stack, x_in, initial_state=init_state, time_major=False)

Here layers should not be built by repeating a single cell object, as in:
cell = tf.nn.rnn_cell.GRUCell(num_hidden)
layers = [cell] * num_layers
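The underlying reason is ordinary Python list semantics: multiplying a list repeats references to the same object instead of creating new ones. The sketch below uses a hypothetical stand-in class, FakeCell, in place of a real TensorFlow cell. It only illustrates the object identity, not how any particular TensorFlow version reacts to a shared cell.

```python
class FakeCell:
    """Hypothetical stand-in for an RNN cell; not a TensorFlow class."""
    def __init__(self, num_hidden):
        self.num_hidden = num_hidden

num_layers = 3
shared = [FakeCell(64)] * num_layers                  # one object referenced three times
separate = [FakeCell(64) for _ in range(num_layers)]  # three independent objects

distinct_shared = len({id(c) for c in shared})
distinct_separate = len({id(c) for c in separate})
print(distinct_shared, distinct_separate)
```

With the list comprehension every layer gets its own cell object and can therefore own its own variables.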

It is better to wrap cell construction in a function:

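def lstm_cell(is_trainning):
    # Build a fresh cell on every call so that each layer gets its own object
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden)

    # Apply dropout to the cell inputs during training only
    if is_trainning:
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=0.5)
    return cell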

Then call the function to build the layers:

layers = [lstm_cell(is_trainning) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.nn.rnn_cell.MultiRNNCell(layers,
                                    state_is_tuple=True)
# init_state = stack.zero_state(batch_size, dtype=tf.float32)

# The second return value is the last state, which we do not use here
outputs, _ = tf.nn.dynamic_rnn(stack, lstm_input, sequence_length=seq_len, dtype=tf.float32)

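tf.nn.dynamic_rnn: running many steps in one call
Calling a single RNNCell only advances one step along the sequence: x1 and h0 give h1, x2 and h1 give h2, and so on. For a sequence of length 10 the cell would have to be called 10 times, which is tedious. TensorFlow therefore provides tf.nn.dynamic_rnn, which is equivalent to calling the cell n times: from {h0, x1, x2, ..., xn} it directly produces {h1, h2, ..., hn}.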

# inputs: shape = (batch_size, time_steps, input_size)
# cell: RNNCell
# initial_state: shape = (batch_size, cell.state_size). The initial state; a zero matrix is the usual choice
outputs, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
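Conceptually, the call above replaces a hand-written loop over time steps. The following pure-Python sketch shows that loop for a single sequence. The names unroll and toy_cell, and the scalar recurrence itself, are hypothetical stand-ins chosen to keep the numbers readable; this is not how TensorFlow implements dynamic_rnn internally.

```python
def unroll(cell, inputs, h0):
    """Apply `cell` once per time step and collect every hidden state."""
    h = h0
    outputs = []
    for x_t in inputs:
        h = cell(x_t, h)   # one step: (x_t, h_{t-1}) -> h_t
        outputs.append(h)
    return outputs, h

def toy_cell(x_t, h_prev):
    # Hypothetical scalar recurrence, used only for illustration
    return 0.5 * h_prev + x_t

outputs, state = unroll(toy_cell, [1.0, 2.0, 3.0], h0=0.0)
print(outputs, state)
```

The returned pair mirrors the (outputs, state) pair of the real call: one output per step, plus the state after the final step.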

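When an RNN processes sequences, each batch of input is usually represented as a 3-D tensor of shape (batch_size, time_steps, input_dim), so every sample in the batch is a (time_steps, input_dim) matrix and all samples must share the same number of time steps. When the input sequences have different lengths, they must be padded to a common length.

Handling variable-length sequences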

import numpy as np

def pad_sequences(sequences, maxlen=None, dtype=np.float32,
                  padding='post', truncating='post', value=0.):
    """Pad or truncate each sequence to maxlen; return the padded array and the original lengths."""
    lengths = np.asarray([len(s) for s in sequences], dtype=np.int64)

    nb_samples = len(sequences)
    if maxlen is None:
        maxlen = np.max(lengths)

    # take the sample shape from the first non empty sequence
    # checking for consistency in the main loop below.
    sample_shape = tuple()
    for s in sequences:
        if len(s) > 0:
            sample_shape = np.asarray(s).shape[1:]
            break

    # start from an array filled with the padding value
    x = (np.ones((nb_samples, maxlen) + sample_shape) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty list was found
        if truncating == 'pre':
            trunc = s[-maxlen:]
        elif truncating == 'post':
            trunc = s[:maxlen]
        else:
            raise ValueError('Truncating type "%s" not understood' % truncating)

        # check `trunc` has expected shape
        trunc = np.asarray(trunc, dtype=dtype)
        if trunc.shape[1:] != sample_shape:
            raise ValueError('Shape of sample %s of sequence at position %s is different from expected shape %s' %
                             (trunc.shape[1:], idx, sample_shape))

        if padding == 'post':
            x[idx, :len(trunc)] = trunc
        elif padding == 'pre':
            x[idx, -len(trunc):] = trunc
        else:
            raise ValueError('Padding type "%s" not understood' % padding)
    return x, lengths



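tf.nn.dynamic_rnn has a sequence_length parameter that gives the valid length of each example. Suppose a batch holds two examples padded to 20 steps and sequence_length is [20, 13]: the first example has a valid length of 20 and the second a valid length of 13. With this parameter set, TensorFlow skips the padding after step 13 of the second example. Its last state is the state from step 13, copied through unchanged to step 20, and its outputs beyond step 13 are set to zero.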
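The sequence_length behaviour described above can be imitated for a single padded sequence in pure Python. This is a sketch of the semantics only, under the assumption of a scalar toy recurrence. The names unroll_with_length and toy_cell are hypothetical and the code is not TensorFlow's implementation.

```python
def unroll_with_length(cell, inputs, h0, length):
    """Run `cell` over the first `length` steps; later steps are treated as padding."""
    h = h0
    outputs = []
    for t, x_t in enumerate(inputs):
        if t < length:
            h = cell(x_t, h)
            outputs.append(h)
        else:
            outputs.append(0.0)  # padded step: output is zeroed, state is carried over
    return outputs, h

def toy_cell(x_t, h_prev):
    # Hypothetical scalar recurrence, used only for illustration
    return 0.5 * h_prev + x_t

padded = [1.0, 2.0, 0.0, 0.0]  # two valid steps followed by two padding steps
outputs, last_state = unroll_with_length(toy_cell, padded, 0.0, length=2)
print(outputs, last_state)
```

The final state equals the state after the last valid step, which is why it is usually the safer value to read for padded batches than the output at the last time index.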


from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# parameters init
l_r = 0.001
training_iters = 100000
batch_size = 128

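n_inputs = 28 # size of the feature vector at each time step (one image row)
n_steps = 28 # length of the time sequence (number of rows)
n_hidden_units = 128 # size of the LSTM output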
n_classes = 10
tf.reset_default_graph() 

def inference(input_tensor):
    with tf.variable_scope('lstm1'):
        # A linear layer before the LSTM maps the input to the same dimension as hidden_units
        lstm1_weights_in = tf.get_variable("weight_in", [n_inputs,n_hidden_units],initializer = tf.random_normal_initializer())
        lstm1_biases_in = tf.get_variable("bias_in", [n_hidden_units,],initializer = tf.constant_initializer(0.1))
        
        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units,n_classes],initializer = tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes,],initializer = tf.constant_initializer(0.1))
        
        input_tensor = tf.reshape(input_tensor, [-1, n_inputs])
        x_in = tf.matmul(input_tensor, lstm1_weights_in) + lstm1_biases_in
        x_in = tf.reshape(x_in, [-1, n_steps, n_hidden_units])
        
        # Define one LSTM cell as the basic unit of the recurrence
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
        _init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
        outputs,states = tf.nn.dynamic_rnn(lstm_cell, x_in, initial_state=_init_state, time_major=False)

        #hidden layer for output as the final results
        #results = tf.matmul(states[1], weights['out']) + biases['out']
        # or
        outputs = tf.unstack(tf.transpose(outputs, [1,0,2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out
        
    return results
    

#load mnist data
mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)

#define placeholder for input
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y_ = tf.placeholder(tf.float32, [None, n_classes])

y = inference(x)
cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y))


train_op = tf.train.AdamOptimizer(l_r).minimize(cost)

correct_pred = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))

#init session
sess = tf.Session()
#init all variables
sess.run(tf.global_variables_initializer())
#start training

for i in range(training_iters):
    #get batch to learn easily
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    batch_x = batch_x.reshape([batch_size, n_steps, n_inputs])
    sess.run(train_op,feed_dict={x: batch_x, y_: batch_y})
    if i % 50 == 0:
        print(sess.run(accuracy,feed_dict={x: batch_x, y_: batch_y,}))


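# Note: _init_state is built for exactly batch_size samples, so the test set
# would have to be fed in batches of that size for the lines below to work.
#test_data = mnist.test.images.reshape([-1, n_steps, n_inputs])
#test_label = mnist.test.labels
#print("Testing Accuracy: ", sess.run(accuracy, feed_dict={x: test_data, y_: test_label}))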
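The cost in the script converts the one-hot labels to class indices with tf.argmax and passes them to the sparse softmax cross-entropy. A minimal pure-Python sketch of that per-example computation, using made-up logits, may make the quantity being minimised clearer. It follows the mathematical definition only and is not TensorFlow's implementation.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Return -log(softmax(logits)[label]) for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

one_hot = [0.0, 0.0, 1.0]
label = one_hot.index(max(one_hot))  # argmax of the one-hot vector
uniform_loss = sparse_softmax_cross_entropy([0.0, 0.0, 0.0], label)
confident_loss = sparse_softmax_cross_entropy([0.0, 0.0, 10.0], label)
print(label, uniform_loss, confident_loss)
```

Equal logits give a loss of log(number of classes), and the loss approaches zero as the logit of the correct class grows relative to the others.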

