Predicting Time Series with TensorFlow: The TFTS Library

TensorFlow 1.3 introduced the TensorFlow Time Series module (TFTS for short), a set of APIs designed specifically for time series prediction. It provides three prediction models: AR, anomaly mixture AR, and LSTM.

# Project repository
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/timeseries
# Example code
https://github.com/hzy46/TensorFlow-Time-Series-Examples

1. Reading Time Series Data

The TFTS library provides two readers: NumpyReader (reads data from a NumPy array) and CSVReader (reads data from a CSV file).
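Only the NumPy path is walked through below. For reference, the CSV path is a one-liner; a minimal sketch, assuming a headerless file whose rows are "time,value" pairs (the filename here is illustrative):

import tensorflow as tf

# Hypothetical CSV file: each row is "time,value", with no header line.
csv_file_name = './period_trend.csv'  # illustrative path
reader = tf.contrib.timeseries.CSVReader(csv_file_name)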

1. Reading time series data from a NumPy array

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function
import numpy as np
import matplotlib
import os
matplotlib.use('agg')
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader


print(os.getcwd())
x = np.array(range(1000))
noise = np.random.uniform(-0.2, 0.2, 1000)
# a sine wave plus an upward trend plus uniform noise
y = np.sin(np.pi * x / 100) + x / 200. + noise
plt.plot(x, y)
plt.savefig('gg.jpg')

TFTS reads in x and y as follows:

data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
}
reader = NumpyReader(data)

First, x and y are packed into a Python dictionary (the variable data). The key tf.contrib.timeseries.TrainEvalFeatures.TIMES is in fact just the string "times", and TrainEvalFeatures.VALUES is just the string "values", so the definition above could be written directly as data = {'times': x, 'values': y}. The more verbose form is used here only to stay consistent with how the library's source code is written.
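Both constants are indeed plain strings in the TFTS source, which a quick check confirms:

assert tf.contrib.timeseries.TrainEvalFeatures.TIMES == 'times'
assert tf.contrib.timeseries.TrainEvalFeatures.VALUES == 'values'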

The resulting reader has a read_full() method, whose return value is the Tensor corresponding to the full time series.

with tf.Session() as sess:
    full_data = reader.read_full()
    # Calling read_full() creates a read queue; the queue runners must be
    # started with tf.train.start_queue_runners before values can be read.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(full_data))
    coord.request_stop()
    coord.join(threads)  # wait for the queue threads to shut down

You cannot simply call sess.run(reader.read_full()) to read all the data from the reader: read_full() creates a read queue, and at that point the queue threads have not yet been started, so the call would block.

During training we usually do not feed the entire dataset at once but instead use mini-batches. Batches are built from the reader as follows:

train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
    reader, batch_size=2, window_size=10)

tf.contrib.timeseries.RandomWindowInputFn randomly selects windows of length window_size from the reader's data and packs them into batches of size batch_size. That is, each batch contains batch_size sequences, each of length window_size.

Printing the data of one batch:

with tf.Session() as sess:
    batch_data = train_input_fn.create_batch()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # batch_data[0] is the features dictionary containing 'times' and 'values'
    one_batch = sess.run(batch_data[0])
    coord.request_stop()
    coord.join(threads)
print(one_batch)
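As a rough sanity check of the batch structure described above (a sketch assuming the batch_size=2, window_size=10 configuration; NumpyReader adds a trailing feature dimension to the values):

# Illustrative shape assertions for the batch printed above.
assert one_batch['times'].shape == (2, 10)      # (batch_size, window_size)
assert one_batch['values'].shape == (2, 10, 1)  # (batch_size, window_size, num_features)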

2. The AR Model

The autoregressive model (AR model) is one of the basic statistical approaches to time series modeling. The example program below trains, evaluates, and predicts with an AR model. The model defined here takes 30 observed values as input each time and outputs a prediction sequence of length 10. The whole training set is a series of length 1000: the first 30 numbers are fed to the model as the "initial observation sequence", from which the next 10 steps can be predicted. The model then takes another 30 numbers for the next prediction; 10 of those 30 numbers are the predictions from the previous step, and the newly obtained predictions in turn become part of the next input, and so on.

This eventually yields 1000 - 30 = 970 predicted values, which are recorded in evaluation['mean']. Other keys include evaluation['loss'], the total loss, and evaluation['times'], the time points corresponding to evaluation['mean'].

evaluation['start_tuple'] is used in the subsequent prediction step: it corresponds to the last 30 output values and their time points, and serves as the starting point for predicting values beyond step 1000.

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader


def main(_):
    x = np.array(range(1000))
    noise = np.random.uniform(-0.2, 0.2, 1000)
    y = np.sin(np.pi * x / 100) + x / 200. + noise
    plt.plot(x, y)
    plt.savefig('timeseries_y.jpg')

    data = {
        tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
        tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
    }

    reader = NumpyReader(data)

    train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
        reader, batch_size=16, window_size=40)

    # periodicities: the regular period of the series; from the np.sin term,
    #   sin(pi * x / 100) repeats every 200 steps, so the period is 200
    # input_window_size: the number of values fed into the model at each step
    # output_window_size: the number of values the model outputs at each step;
    #   input_window_size + output_window_size = window_size (30 + 10 = 40),
    #   i.e. the first 30 numbers of each window serve as the input and the
    #   last 10 are the corresponding target outputs
    # loss: which loss to use; the two options are NORMAL_LIKELIHOOD_LOSS
    #   and SQUARED_LOSS
    # num_features: the dimensionality of the observation at each time point
    ar = tf.contrib.timeseries.ARRegressor(
        periodicities=200, input_window_size=30, output_window_size=10,
        num_features=1,
        loss=tf.contrib.timeseries.ARModel.NORMAL_LIKELIHOOD_LOSS)

    ar.train(input_fn=train_input_fn, steps=6000)

    # "Evaluation" in TFTS means running the trained model on the original
    # training set, which shows how well the model fits it
    evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
    # keys of evaluation: ['covariance', 'loss', 'mean', 'observed', 'start_tuple', 'times', 'global_step']
    evaluation = ar.evaluate(input_fn=evaluation_input_fn, steps=1)

    # Predict 250 steps beyond the end of the training series
    (predictions,) = tuple(ar.predict(
        input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
            evaluation, steps=250)))

    plt.figure(figsize=(15, 5))
    plt.plot(data['times'].reshape(-1), data['values'].reshape(-1), label='origin')
    plt.plot(evaluation['times'].reshape(-1), evaluation['mean'].reshape(-1), label='evaluation')
    plt.plot(predictions['times'].reshape(-1), predictions['mean'].reshape(-1), label='prediction')
    plt.xlabel('time_step')
    plt.ylabel('values')
    plt.legend(loc=4)
    plt.savefig('predict_result.jpg')


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run()
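To tie the numbers above back to the code, a hedged sanity check that could be added right after ar.evaluate() and ar.predict() (the expected shapes assume this script's 1000-point series and input_window_size=30):

# 1000 observations minus the 30-step input window leave 970 fitted values.
print(evaluation['mean'].shape)   # expected: (1, 970, 1)
print(evaluation['times'][0, 0])  # expected: 30, the first fitted time step
print(predictions['times'][-1])   # expected: 1249, i.e. 250 steps past step 999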

3. The LSTM Model (Univariate and Multivariate)

#!/anaconda3/bin/python
# -*- coding: utf-8 -*-

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from os import path

import numpy as np
import tensorflow as tf

from tensorflow.contrib.timeseries.python.timeseries import estimators as ts_estimators
from tensorflow.contrib.timeseries.python.timeseries import model as ts_model
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader

import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt


class _LSTMModel(ts_model.SequentialTimeSeriesModel):
  """A time series model-building example using an RNNCell."""

  def __init__(self, num_units, num_features, dtype=tf.float32):
    """Initialize/configure the model object.
    Note that we do not start graph building here. Rather, this object is a
    configurable factory for TensorFlow graphs which are run by an Estimator.
    Args:
      num_units: The number of units in the model's LSTMCell.
      num_features: The dimensionality of the time series (features per
        timestep).
      dtype: The floating point data type to use.
    """
    super(_LSTMModel, self).__init__(
        # Pre-register the metrics we'll be outputting (just a mean here).
        train_output_names=["mean"],
        predict_output_names=["mean"],
        num_features=num_features,
        dtype=dtype)
    self._num_units = num_units
    # Filled in by initialize_graph()
    self._lstm_cell = None
    self._lstm_cell_run = None
    self._predict_from_lstm_output = None

  def initialize_graph(self, input_statistics):
    """Save templates for components, which can then be used repeatedly.
    This method is called every time a new graph is created. It's safe to start
    adding ops to the current default graph here, but the graph should be
    constructed from scratch.
    Args:
      input_statistics: A math_utils.InputStatistics object.
    """
    super(_LSTMModel, self).initialize_graph(input_statistics=input_statistics)
    self._lstm_cell = tf.nn.rnn_cell.LSTMCell(num_units=self._num_units)
    # Create templates so we don't have to worry about variable reuse.
    self._lstm_cell_run = tf.make_template(
        name_="lstm_cell",
        func_=self._lstm_cell,
        create_scope_now_=True)
    # Transforms LSTM output into mean predictions.
    self._predict_from_lstm_output = tf.make_template(
        name_="predict_from_lstm_output",
        func_=lambda inputs: tf.layers.dense(inputs=inputs, units=self.num_features),
        create_scope_now_=True)

  def get_start_state(self):
    """Return initial state for the time series model."""
    return (
        # Keeps track of the time associated with this state for error checking.
        tf.zeros([], dtype=tf.int64),
        # The previous observation or prediction.
        tf.zeros([self.num_features], dtype=self.dtype),
        # The state of the RNNCell (batch dimension removed since this parent
        # class will broadcast).
        [tf.squeeze(state_element, axis=0)
         for state_element
         in self._lstm_cell.zero_state(batch_size=1, dtype=self.dtype)])

  def _transform(self, data):
    """Normalize data based on input statistics to encourage stable training."""
    mean, variance = self._input_statistics.overall_feature_moments
    return (data - mean) / variance

  def _de_transform(self, data):
    """Transform data back to the input scale."""
    mean, variance = self._input_statistics.overall_feature_moments
    return data * variance + mean

  def _filtering_step(self, current_times, current_values, state, predictions):
    """Update model state based on observations.
    Note that we don't do much here aside from computing a loss. In this case
    it's easier to update the RNN state in _prediction_step, since that covers
    running the RNN both on observations (from this method) and our own
    predictions. This distinction can be important for probabilistic models,
    where repeatedly predicting without filtering should lead to low-confidence
    predictions.
    Args:
      current_times: A [batch size] integer Tensor.
      current_values: A [batch size, self.num_features] floating point Tensor
        with new observations.
      state: The model's state tuple.
      predictions: The output of the previous `_prediction_step`.
    Returns:
      A tuple of new state and a predictions dictionary updated to include a
      loss (note that we could also return other measures of goodness of fit,
      although only "loss" will be optimized).
    """
    state_from_time, prediction, lstm_state = state
    with tf.control_dependencies(
            [tf.assert_equal(current_times, state_from_time)]):
      transformed_values = self._transform(current_values)
      # Use mean squared error across features for the loss.
      predictions["loss"] = tf.reduce_mean(
          (prediction - transformed_values) ** 2, axis=-1)
      # Keep track of the new observation in model state. It won't be run
      # through the LSTM until the next _imputation_step.
      new_state_tuple = (current_times, transformed_values, lstm_state)
    return (new_state_tuple, predictions)

  def _prediction_step(self, current_times, state):
    """Advance the RNN state using a previous observation or prediction."""
    _, previous_observation_or_prediction, lstm_state = state
    lstm_output, new_lstm_state = self._lstm_cell_run(
        inputs=previous_observation_or_prediction, state=lstm_state)
    next_prediction = self._predict_from_lstm_output(lstm_output)
    new_state_tuple = (current_times, next_prediction, new_lstm_state)
    return new_state_tuple, {"mean": self._de_transform(next_prediction)}

  def _imputation_step(self, current_times, state):
    """Advance model state across a gap."""
    # Does not do anything special if we're jumping across a gap. More advanced
    # models, especially probabilistic ones, would want a special case that
    # depends on the gap size.
    return state

  def _exogenous_input_step(
          self, current_times, current_exogenous_regressors, state):
    """Update model state based on exogenous regressors."""
    raise NotImplementedError(
        "Exogenous inputs are not implemented for this example.")


if __name__ == '__main__':
  tf.logging.set_verbosity(tf.logging.INFO)
  # The functional relationship between y and x here is complex, which makes
  # it well suited to a model like an LSTM that can discover the pattern
  x = np.array(range(1000))
  noise = np.random.uniform(-0.2, 0.2, 1000)
  y = np.sin(np.pi * x / 50 ) + np.cos(np.pi * x / 50) + np.sin(np.pi * x / 25) + noise

  data = {
      tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
      tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
  }

  reader = NumpyReader(data)

  # each batch holds 4 randomly selected windows, each of length 100
  train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
      reader, batch_size=4, window_size=100)

  # num_features=1: the observation at each time point is a single scalar
  # num_units=128: use an LSTM whose hidden state has size 128
  estimator = ts_estimators.TimeSeriesRegressor(
      model=_LSTMModel(num_features=1, num_units=128),
      optimizer=tf.train.AdamOptimizer(0.001))

  estimator.train(input_fn=train_input_fn, steps=2000)
  evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
  evaluation = estimator.evaluate(input_fn=evaluation_input_fn, steps=1)
  # Predict starting after the evaluation
  (predictions,) = tuple(estimator.predict(
      input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
          evaluation, steps=200)))

  observed_times = evaluation["times"][0]
  observed = evaluation["observed"][0, :, :]
  evaluated_times = evaluation["times"][0]
  evaluated = evaluation["mean"][0]
  predicted_times = predictions['times']
  predicted = predictions["mean"]

  plt.figure(figsize=(15, 5))
  plt.axvline(999, linestyle="dotted", linewidth=4, color='r')
  observed_lines = plt.plot(observed_times, observed, label="observation", color="k")
  evaluated_lines = plt.plot(evaluated_times, evaluated, label="evaluation", color="g")
  predicted_lines = plt.plot(predicted_times, predicted, label="prediction", color="r")
  plt.legend(handles=[observed_lines[0], evaluated_lines[0], predicted_lines[0]],
             loc="upper left")
  plt.savefig('predict_result.jpg')
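The section title also covers the multivariate case, and only a few lines of the script above change: the reader gains extra value columns and num_features grows to match. A minimal sketch modeled on the official TFTS multivariate example, reusing _LSTMModel and ts_estimators from above (the CSV path is illustrative and assumes one time column followed by five value columns):

# Hypothetical multivariate setup: 5 observed values per time step.
csv_file_name = './multivariate_periods.csv'  # illustrative path
reader = tf.contrib.timeseries.CSVReader(
    csv_file_name,
    column_names=((tf.contrib.timeseries.TrainEvalFeatures.TIMES,)
                  + (tf.contrib.timeseries.TrainEvalFeatures.VALUES,) * 5))
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
    reader, batch_size=4, window_size=32)
estimator = ts_estimators.TimeSeriesRegressor(
    model=_LSTMModel(num_features=5, num_units=128),
    optimizer=tf.train.AdamOptimizer(0.001))
estimator.train(input_fn=train_input_fn, steps=200)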
