TensorFlow RNN example

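This example is based on the Basic LSTM model from the RNN tutorial on Google's official TensorFlow site. It focuses on code analysis, including the data preprocessing.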

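##reader.py
###_read_words function

Reads the PTB file as UTF-8, replaces every newline with `<eos>`, and splits the result into a list of words. It can be tested interactively as follows.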

with tf.gfile.GFile("/home/gsc/envtensorflow/deep_learn/models/tutorials/simple-examples/data/ptb.train.txt", "r") as f:
  data =f.read().decode("utf-8").replace("\n", "<eos>").split()
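The same tokenization can be tried on an in-memory string without the PTB files. This is only a sketch: the helper name `read_words_from_text` and the sample text are made up for illustration, and the sample assumes the PTB layout in which a space precedes every newline.

```python
def read_words_from_text(text):
    """Replace each newline with the <eos> marker, then split on whitespace."""
    return text.replace("\n", "<eos>").split()


# Two PTB-style lines: each line starts with a space and ends with " \n".
sample = " no it was n't black monday \n but while the new york stock exchange \n"
words = read_words_from_text(sample)
print(words)
```

Without the space before the newline, `<eos>` would be glued to the neighbouring words (`"a\nb"` becomes the single token `a<eos>b`), so the tokenization relies on that layout.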

Figure ptb_1

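###_build_vocab function
Counts how often each word occurs. `counter` is a dict-like `collections.Counter`: the key is the word and the value is the number of times that word occurs.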

counter = collections.Counter(data)

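This gives a list in which every element is a (word, count) tuple, sorted by count in descending order, with ties broken alphabetically. See Figure ptb_2.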

count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

Figure ptb_2
The words and their counts are unpacked into `words` and `_`. By convention the name `_` marks a value that is not used.

words, _ = list(zip(*count_pairs))

Each word is numbered by its rank, starting from 0, and the mapping is stored as a dict in `word_to_id`. See Figure ptb_3.

word_to_id = dict(zip(words, range(len(words))))

Figure ptb_3
Finally the function returns this mapping from each word to its id, for example {'the': 0, '<unk>': 1, ..., 'federal': 100}.
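The vocabulary steps above can be run end to end on a toy corpus. This is a minimal sketch; the function name `build_vocab` and the toy sentence are illustrative and not part of the tutorial.

```python
import collections


def build_vocab(words):
    """Map each word to its rank: most frequent first, ties in alphabetical order."""
    counter = collections.Counter(words)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    vocab_words, _ = list(zip(*count_pairs))
    return dict(zip(vocab_words, range(len(vocab_words))))


toy_words = "the cat sat on the mat <eos> the dog sat <eos>".split()
toy_word_to_id = build_vocab(toy_words)
print(toy_word_to_id)
```

Here `the` occurs three times and gets id 0; `<eos>` and `sat` both occur twice, and `<eos>` comes first because `<` sorts before lowercase letters.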

train_data = _file_to_word_ids(train_path, word_to_id)

Expanded inline, this function is equivalent to:

train_data = []
for word in data:
  if word in word_to_id:
    train_data.append(word_to_id[word])

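`train_data` holds, for every word of the file, its index in `word_to_id`. For example, `aer` has index 9970 in `word_to_id`, so the first element is `train_data[0] = 9970`, and so on.
The total number of distinct words is then obtained: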

vocabulary = len(word_to_id)
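One detail of the expanded loop is easy to miss: a word that is not in `word_to_id` is silently skipped. The sketch below shows this on a hand-written toy mapping; `words_to_ids` is an illustrative name.

```python
def words_to_ids(words, word_to_id):
    """Keep only words that are in the vocabulary and replace them by their ids."""
    return [word_to_id[word] for word in words if word in word_to_id]


toy_word_to_id = {"the": 0, "<eos>": 1, "sat": 2, "cat": 3}
ids = words_to_ids("the cat sat quietly <eos>".split(), toy_word_to_id)
print(ids)                  # "quietly" is not in the vocabulary and is dropped
print(len(toy_word_to_id))  # vocabulary size
```

For the training file nothing is dropped, because the vocabulary is built from that same file.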

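###ptb_producer function
Arguments: raw_data is train_data, batch_size is 20, and num_steps is 20; the analysis uses the small configuration.
The raw word ids are converted into the tensor that TensorFlow needs. See Figure ptb_5.

Figure ptb_5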

raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

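The original one-dimensional array is reshaped into 20 rows (batch_size) and batch_len columns, where batch_len is the total data length divided by batch_size (integer division). Leftover elements at the end are dropped.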

    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])
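The trimming and reshaping can be checked on plain Python lists. This is a small sketch with toy sizes; `to_batches` is an illustrative helper, not a function of the tutorial.

```python
def to_batches(raw_data, batch_size):
    """Trim raw_data to a multiple of batch_size and cut it into batch_size rows."""
    batch_len = len(raw_data) // batch_size
    trimmed = raw_data[0:batch_size * batch_len]
    return [trimmed[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]


rows = to_batches(list(range(11)), batch_size=3)
for row in rows:
    print(row)
```

With 11 elements and batch_size 3, batch_len is 3, so the last two elements (9 and 10) are discarded and each row holds a contiguous stretch of the original sequence.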

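The `i` created here plays the role of the loop variable `i` in a C `for` loop. `shuffle=False` means the order is not shuffled, so `i` runs from 0 to epoch_size - 1.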

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

`data` is sliced along the column dimension: the slice has shape [batch_size - 0, (i + 1) * num_steps - i * num_steps] = [20, 20]. In effect, `data` is cut into consecutive windows of 20 columns.

    x = tf.strided_slice(data, [0, i * num_steps],
                         [batch_size, (i + 1) * num_steps])

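y is almost the same as x, but shifted by one position: y[n] = x[n+1], so every target is the word that follows the corresponding input word (the last target comes from the column just past the window). This makes sense: x is the training input and y is the label, and the label is the next word the model should output.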
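The windowing can be imitated without TensorFlow. The sketch below assumes that y is the same slice as x moved one column to the right, and that epoch_size is (batch_len - 1) // num_steps; the name `produce_windows` and the toy sizes are illustrative.

```python
def produce_windows(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs; y is x shifted one column to the right."""
    batch_len = len(raw_data) // batch_size
    data = [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    # One column is reserved so that the last window still has a target.
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = [row[i * num_steps:(i + 1) * num_steps] for row in data]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in data]
        yield x, y


windows = list(produce_windows(list(range(20)), batch_size=2, num_steps=3))
for x, y in windows:
    print(x, y)
```

With 20 elements, batch_size 2 and num_steps 3, each row has 10 columns and there are 3 windows per epoch.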

##ptb_word_lm.py
###Getting the configuration

config = get_config()

The configuration parameters are selected according to the configured mode; the Small mode is assumed here.

    """Small config."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 20
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 20
    vocab_size = 10000

PTBInput class:
self.input_data and self.targets are both tensors of shape [20, 20]. See Figure ptb_6.

Figure ptb_6

train_input = PTBInput(config=config, data=train_data, name="TrainInput")

train_input is an instance of the class; later code accesses its attributes, for example:

train_input.targets
train_input.input_data

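###class PTBModel(object): the core of the training model
####basic Cell
This is the other core module, after the data preprocessing.
TensorFlow source code from around March 2017 adds a `reuse` argument to tf.contrib.rnn.BasicLSTMCell, which TensorFlow 1.0 does not define. The check below keeps the code compatible with both the old and the new version.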

      if 'reuse' in inspect.getargspec(
          tf.contrib.rnn.BasicLSTMCell.__init__).args:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True,
            reuse=tf.get_variable_scope().reuse)
      else:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True)

attn_cell is simply another name for the lstm_cell factory function; each call to attn_cell() creates a new BasicLSTMCell instance.

attn_cell = lstm_cell

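The LSTM implemented here is the most basic LSTM; the paper is at http://arxiv.org/abs/1409.2329. The formulas are pasted here directly:

Figure ptb_8

Figure ptb_9 (with dropout)
Now look at the structure:

Figure ptb_10
To make the relationship between the tensors in ptb_10 and their dimensions easier to follow, the formulas are listed again here in the order of the figure above (this partly repeats the previous post, see http://blog.csdn.net/shichaog/article/details/72853665 ):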

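$h_t^j=o_t^j\odot \tanh(c_t^j)$
$c_t^j=f_t^j\odot c_{t-1}^j+i_t^j\odot j_t^j$
$o_t^j=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)^j$
$f_t^j=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)^j$
$j_t^j=\tanh(W_{xj}x_t+W_{hj}h_{t-1}+b_j)^j$
$i_t^j=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)^j$
These formulas can be regrouped by combining the weights into one large matrix and concatenating the inputs, with one batch element per row:
$[x_t\ h_{t-1}]\begin{bmatrix}W_{xi} & W_{xj} & W_{xf} & W_{xo}\\ W_{hi} & W_{hj} & W_{hf} & W_{ho}\end{bmatrix}$
With the inputs regrouped in the same way, the combined computation is an ordinary matrix product (not an element-wise one):
$[i_t^{'}\ j_t^{'}\ f_t^{'}\ o_t^{'}]_{(20\times 800)}=[x_{t(20\times 200)}\ h_{t-1(20\times 200)}]\begin{bmatrix}W_{xi(200\times 200)} & W_{xj(200\times 200)} & W_{xf(200\times 200)} & W_{xo(200\times 200)}\\ W_{hi(200\times 200)} & W_{hj(200\times 200)} & W_{hf(200\times 200)} & W_{ho(200\times 200)}\end{bmatrix}+[b_i\ b_j\ b_f\ b_o]$
Here a 20×400 input is multiplied by a 400×800 weight matrix, and the primes denote the values before $\sigma$ or $\tanh$ is applied.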
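The derivation above gives the dimension relations of the data in basic_lstm_cell_1. The product is then split into four 20×200 matrices, to which basic_lstm_cell applies the $\sigma$ and $\tanh$ operations; these computations are exactly the operations defined by the basic LSTM.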
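The combined multiply, the split, and the gate activations can be followed in a plain-Python sketch of one LSTM step. The helper names (`lstm_step`, `matmul`), the toy sizes, and the constant weights are illustrative assumptions; the gate order i, j, f, o follows the BasicLSTMCell source in the appendix. In the Small configuration the sizes would be batch 20, inputs 200, and units 200.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def lstm_step(x, h, c, weights, bias, forget_bias=0.0):
    """One basic LSTM step; x, h and c are lists of rows, one row per batch element."""
    units = len(h[0])
    xh = [x_row + h_row for x_row, h_row in zip(x, h)]  # batch x (inputs + units)
    concat = matmul(xh, weights)                        # batch x (4 * units)
    new_c, new_h = [], []
    for row, c_row in zip(concat, c):
        row = [v + b for v, b in zip(row, bias)]
        # Split the row into the four pre-activations i', j', f', o'.
        i, j, f, o = (row[k * units:(k + 1) * units] for k in range(4))
        c_new = [c_prev * sigmoid(fv + forget_bias) + sigmoid(iv) * math.tanh(jv)
                 for c_prev, iv, jv, fv in zip(c_row, i, j, f)]
        new_c.append(c_new)
        new_h.append([math.tanh(cv) * sigmoid(ov) for cv, ov in zip(c_new, o)])
    return new_h, new_c


batch, inputs, units = 2, 3, 3
x = [[0.1] * inputs for _ in range(batch)]
h = [[0.0] * units for _ in range(batch)]
c = [[0.0] * units for _ in range(batch)]
weights = [[0.05] * (4 * units) for _ in range(inputs + units)]
bias = [0.0] * (4 * units)
new_h, new_c = lstm_step(x, h, c, weights, bias)
print(len(new_h), len(new_h[0]))  # batch x units
```

One multiply against the (inputs + units) x (4 * units) matrix replaces eight smaller ones, which is the efficiency argument given in the source comments.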
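Opening TensorBoard shows the following details:

Figure ptb_11
The edges from split to the nodes in the layer above it correspond, from left to right, to $f_t, i_t, j_t, o_t$. In Figure ptb_10, $i_t$ and $j_t$ are multiplied to give the $mul_1$ node, and $mul$ is the product of $f_t$ and $c_{t-1}$. Afterwards the two tensors $c_t$ and $h_t$ are passed to multi_rnn_cell_1, and $h_t$ together with the input $x_t$ is passed to basic_lstm_cell_1.
In summary, $c_t$ and $h_t$ of cell_0 in multi_rnn_cell are passed on as the state of cell_0 in multi_rnn_cell_1, and $c_t$ and $h_t$ of cell_1 in multi_rnn_cell are passed on as the state of cell_1 in multi_rnn_cell_1.
Because config.num_layers equals 2, the list comprehension below runs twice and creates two cells.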

    cell = tf.contrib.rnn.MultiRNNCell(
        [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)
    self._initial_state = cell.zero_state(batch_size, data_type())
The resulting structure is shown in Figure ptb_12.

Figure ptb_12

    with tf.device("/cpu:0"):
      embedding = tf.get_variable(
          "embedding", [vocab_size, size], dtype=data_type())
      inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

See Figures ptb_13 and ptb_14.

Figure ptb_13

Figure ptb_14
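The embedding lookup replaces every word id by one row of the embedding matrix. A plain-Python sketch with toy sizes follows; the helper name and the numbers are illustrative. In the Small configuration the [20, 20] ids would become a [20, 20, 200] tensor.

```python
def embedding_lookup(embedding, ids):
    """Replace every id in a [batch, steps] grid by its embedding row."""
    return [[embedding[word_id] for word_id in row] for row in ids]


# vocab_size = 4, size = 2
toy_embedding = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
# batch_size = 2, num_steps = 3
input_ids = [[0, 2, 3], [1, 1, 0]]
inputs = embedding_lookup(toy_embedding, input_ids)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # batch, steps, size
```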
####Stacking the RNN
Next, a variable scope named RNN is created.

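    outputs = []
    state = self._initial_state
    with tf.variable_scope("RNN"):
      for time_step in range(num_steps):
        # Reuse the same LSTM weights at every time step after the first.
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)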

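Figure ptb_15
The loop runs 20 times, once per time_step; the ptb_15 screenshot shows only a few of them. To make the details clearer: four tensors flow between each pair of consecutive multi_rnn_cell nodes, namely $c_t$ and $h_t$ of cell0 and of cell1. In total 20 tensors are output, each of dimension 20×200.

Figure ptb_16
The stack and reshape operations then produce a 400×200 matrix.

    output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])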
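The row order after stack and reshape matters, because the targets are flattened the same way later. The sketch below imitates stacking on axis 1 followed by a reshape to [-1, size] with plain lists; `stack_and_reshape` and the toy labels are illustrative.

```python
def stack_and_reshape(outputs):
    """outputs holds num_steps items of shape [batch][size]; return batch-major rows."""
    num_steps = len(outputs)
    batch_size = len(outputs[0])
    # Stacking on axis 1 gives [batch, num_steps, size]; flattening the first
    # two axes puts all time steps of batch element 0 first, then element 1, ...
    return [outputs[t][b] for b in range(batch_size) for t in range(num_steps)]


# Label each toy output vector with [time_step, batch_index], so size = 2.
num_steps, batch_size = 3, 2
outputs = [[[t, b] for b in range(batch_size)] for t in range(num_steps)]
flat = stack_and_reshape(outputs)
print(flat)
```

Row `b * num_steps + t` of the result holds the output of batch element b at time step t, which is the same order as a [batch_size, num_steps] target tensor flattened with reshape [-1].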

Next, the weights and the bias are defined.

    softmax_w = tf.get_variable(
        "softmax_w", [size, vocab_size], dtype=data_type())
    softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())

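Figure ptb_17
Their dimensions are shown in the figure above.
At this point we can look at how the embedding, the RNN, w, and b relate to each other.

Figure ptb_18
###Loss function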

    logits = tf.matmul(output, softmax_w) + softmax_b
    loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
        [logits],
        [tf.reshape(input_.targets, [-1])],
        [tf.ones([batch_size * num_steps], dtype=data_type())])
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = state

The seq2seq model is not explained here; it is left for the seq2seq post.
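Even without the seq2seq details, the cost itself can be sketched. The code below assumes that the per-example loss is the softmax cross-entropy of each logits row against its target id, with all weights equal to 1 as in the call above; `cross_entropy_per_position` is an illustrative helper, not the library function.

```python
import math


def cross_entropy_per_position(logits, targets):
    """Softmax cross-entropy of each logits row against its target id."""
    losses = []
    for row, target in zip(logits, targets):
        max_logit = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in row))
        losses.append(log_sum_exp - row[target])
    return losses


batch_size, num_steps, vocab_size = 2, 2, 4
# An untrained model that scores every word equally.
logits = [[0.0] * vocab_size for _ in range(batch_size * num_steps)]
targets = [1, 3, 0, 2]
loss = cross_entropy_per_position(logits, targets)
cost = sum(loss) / batch_size
print(cost)
```

Because the sum is divided by batch_size only, the cost is a total over the num_steps positions of one sequence; for uniform predictions it equals num_steps * log(vocab_size).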
####Learning rate update

    self._lr = tf.Variable(0.0, trainable=False)
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                      config.max_grad_norm)

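To deal with gradient explosion, the gradients are clipped by their global norm, which keeps them in a reasonable range (clipping does not address vanishing gradients).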
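Clipping by global norm rescales all gradients by the same factor, so their direction is preserved. A minimal plain-Python sketch, assuming the scale factor clip_norm / max(global_norm, clip_norm); `clip_grads_by_global_norm` is an illustrative name.

```python
import math


def clip_grads_by_global_norm(grads, clip_norm):
    """Scale all gradients together so that their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_grads_by_global_norm(grads, clip_norm=5.0)
print(norm, clipped)
```

Gradients whose global norm is already below clip_norm get a scale of 1 and pass through unchanged.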

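##Training procedure

for i in range(config.max_max_epoch):

The outer loop passes over all the training data 13 (max_max_epoch) times.
At the start of each epoch the learning rate of the model is set.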

  lr_decay = config.lr_decay ** max(i + 1 - config.max_epoch, 0.0)
  m.assign_lr(session, config.learning_rate * lr_decay)

  print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
  train_perplexity = run_epoch(session, m, eval_op=m.train_op,
                               verbose=True)
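The schedule produced by this formula can be printed without a session. The sketch below uses the Small configuration values listed earlier; the variable `schedule` is illustrative.

```python
learning_rate, lr_decay, max_epoch, max_max_epoch = 1.0, 0.5, 4, 13

# The rate stays at learning_rate for the first max_epoch epochs,
# then it is halved after every further epoch.
schedule = [learning_rate * lr_decay ** max(i + 1 - max_epoch, 0.0)
            for i in range(max_max_epoch)]
for epoch, lr in enumerate(schedule, start=1):
    print("Epoch: %d Learning rate: %.3f" % (epoch, lr))
```

Epochs 1 to 4 train at 1.0, epoch 5 at 0.5, epoch 6 at 0.25, and so on down to 0.5 ** 9 in epoch 13.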

###run_epoch
This function first evaluates the model's initial state.

  state = session.run(model.initial_state)

The model's cost and final state are then stored in the fetches dict.

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

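Training loop

  # epoch_size is the number of mini-batches in one epoch,
  # ((len(data) // batch_size) - 1) // num_steps. It is not the 13 of max_max_epoch.
  for step in range(model.input.epoch_size):
    feed_dict = {}
    # Feed the LSTM state (c_t and h_t) from the previous step back in as the
    # initial state. enumerate yields i = 0, 1, ... and (c, h) is the state
    # tuple of layer i.
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h
    # Run the graph to evaluate the model.cost and model.final_state nodes.
    vals = session.run(fetches, feed_dict)
    # cost and state are extracted in order to compute the perplexity, which can
    # be read as the effective number of candidate words, so lower is better.
    cost = vals["cost"]
    state = vals["final_state"]

    costs += cost
    iters += model.input.num_steps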
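The accumulated `costs` and `iters` turn into a perplexity through an exponential. The sketch below assumes the perplexity is exp(costs / iters), which the excerpt above does not show, and feeds it the cost of a model that guesses uniformly over the vocabulary.

```python
import math


def perplexity(costs, iters):
    """Exponential of the average per-word cross-entropy."""
    return math.exp(costs / iters)


vocab_size, num_steps, steps = 10000, 20, 5
# Cost of one mini-batch step when every word gets probability 1 / vocab_size.
cost_per_step = num_steps * math.log(vocab_size)
costs, iters = 0.0, 0
for _ in range(steps):
    costs += cost_per_step
    iters += num_steps
print(perplexity(costs, iters))
```

A uniform guess gives a perplexity of about vocab_size, which matches the reading of perplexity as the number of candidate words; training should push it far below that.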

##Appendix: BasicLSTMCell source code
This is not difficult; ptb_10 and ptb_11 make it clear, so the code is not analysed in detail here.

class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.
  The implementation is based on: http://arxiv.org/abs/1409.2329.
  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.
  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.
  For advanced models, please use the full LSTMCell that follows.
  """

  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh, reuse=None):
    """Initialize the basic LSTM cell.
    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`.  If False, they are concatenated
        along the column axis.  The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope.  If not `True`, and the existing scope already has
        the given variables, an error is raised.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated.  Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation
    self._reuse = reuse

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
      concat = _linear([inputs, h], 4 * self._num_units, True)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)

      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat([new_c, new_h], 1)
      return new_h, new_state
