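I previously wrote about a sliding window implemented with Python's yield. Because I use TensorFlow a lot and the tf.data API processes data more efficiently, implementing the sliding window with that API is a better choice than plain Python when the data volume is large. This article shows how to implement sliding windows with TensorFlow's tf.data API.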

Environment:

Python 3.7.6
TensorFlow 2.1.0

Import the required package:

import tensorflow as tf

Contents

1. Single-variable sliding windows with batch

2. Single-variable sliding windows with window

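Time series modeling usually works on segments of a series, and in most cases the data has several feature dimensions. The raw series therefore has to be partitioned into windows, much like cropping patches from an image. Each window becomes one sample, the samples form a dataset, and the dataset is fed to a neural network for training.

A simple example first illustrates the problem: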

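1. Single-variable sliding windows with batch

The tf.data.Dataset.batch method:

batch(batch_size, drop_remainder=False)

batch_size: a tf.int64 scalar, the number of consecutive elements combined into a single batch.
drop_remainder: (optional) a tf.bool scalar that controls whether the last batch is dropped when it contains fewer than batch_size elements; by default the smaller batch is kept.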
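To make the effect of drop_remainder concrete, here is a minimal pure-Python sketch of the batching rule described above. The helper batch_model is hypothetical and only mimics the grouping on a small list; it is not part of tf.data.

```python
def batch_model(items, batch_size, drop_remainder=False):
    """Group items into consecutive lists of batch_size elements (illustrative helper)."""
    items = list(items)
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Drop the trailing batch only if it is short and the caller asked for it.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


kept = batch_model(range(25), 10)                          # default: short batch kept
dropped = batch_model(range(25), 10, drop_remainder=True)  # short batch removed

print(len(kept), kept[-1])
print(len(dropped), dropped[-1])
```

With 25 elements and a batch size of 10, the default keeps a third batch of five elements, while drop_remainder=True leaves only the two full batches. Fixed-size batches are what the slicing in the following sections relies on.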

Build dummy single-variable data:

range_ds = tf.data.Dataset.range(100000)

Use batch to build non-overlapping sliding windows of width 10:

# Group the data into batches of 10 elements. drop_remainder=True drops the
# last batch if it has fewer than 10 elements; by default it is kept.
batches = range_ds.batch(10, drop_remainder=True)

# Take five batches and print them
for batch in batches.take(5):
    print(batch.numpy())

Output:

[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]

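1.1 Non-overlapping sampling, offset prediction

def dense_1_step(batch):
    # Pair each single-variable window with its prediction labels:
    # the first 9 samples are the input, and the last 9 samples
    # (shifted by one step) are the output
    return batch[:-1], batch[1:]

# map applies the feature/label pairing to every batch
predict_dense_1_step = batches.map(dense_1_step)

# Print three paired samples
for features, label in predict_dense_1_step.take(3):
    print(features.numpy(), " => ", label.numpy())

Output: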

[0 1 2 3 4 5 6 7 8]  =>  [1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18]  =>  [11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28]  =>  [21 22 23 24 25 26 27 28 29]

1.2 Non-overlapping sampling, no-offset prediction

To predict a whole window rather than a fixed offset, split each batch into two parts:

batches = range_ds.batch(15, drop_remainder=True)

def label_next_5_steps(batch):
    return (batch[:-5],   # the first ten samples of the batch are the input
            batch[-5:])   # the last five samples of the batch are the label

predict_5_steps = batches.map(label_next_5_steps)

for features, label in predict_5_steps.take(3):
    print(features.numpy(), " => ", label.numpy())

Output:

[0 1 2 3 4 5 6 7 8 9]  =>  [10 11 12 13 14]
[15 16 17 18 19 20 21 22 23 24]  =>  [25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]  =>  [40 41 42 43 44]

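1.3 Overlapping sampling, no-offset prediction

If you want the samples to overlap, so that the label of one sample reuses data from the input of the next, you can use tf.data.Dataset.zip:

feature_length = 10  # window width
label_length = 5     # length of the predicted output

features = range_ds.batch(feature_length, drop_remainder=True)
# skip(1) drops the first batch, so each label batch is the one after its features
# labels[:label_length] keeps the first five samples of that batch
labels = range_ds.batch(feature_length).skip(1).map(lambda labels: labels[:label_length])

# zip pairs each feature batch with its label batch
predict_5_steps = tf.data.Dataset.zip((features, labels))

for features, label in predict_5_steps.take(3):
    print(features.numpy(), " => ", label.numpy())

Output: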

[0 1 2 3 4 5 6 7 8 9]  =>  [10 11 12 13 14]
[10 11 12 13 14 15 16 17 18 19]  =>  [20 21 22 23 24]
[20 21 22 23 24 25 26 27 28 29]  =>  [30 31 32 33 34]

If skip(1) is changed to skip(2), the output becomes:

[0 1 2 3 4 5 6 7 8 9]  =>  [20 21 22 23 24]
[10 11 12 13 14 15 16 17 18 19]  =>  [30 31 32 33 34]
[20 21 22 23 24 25 26 27 28 29]  =>  [40 41 42 43 44]

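As you can see, a whole batch now lies between each sample's input and its label. This has little practical value; it only serves to illustrate the skip() method.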
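The batch, skip and zip pairing can also be checked on a small list without TensorFlow. The sketch below is a pure-Python model under the assumption that the label is the first label_length values of the batch that lies skip batches after the features; pair_model is a hypothetical helper, not a tf.data function.

```python
def pair_model(values, feature_length=10, label_length=5, skip=1):
    """Pair each full feature batch with the start of a later batch (illustrative helper)."""
    values = list(values)
    # All consecutive batches, including a possibly shorter last one.
    batches = [values[i:i + feature_length]
               for i in range(0, len(values), feature_length)]
    # Features: only full batches, mirroring drop_remainder=True.
    features = [b for b in batches if len(b) == feature_length]
    # Labels: skip the first `skip` batches, keep the first label_length values.
    labels = [b[:label_length] for b in batches[skip:]]
    # zip stops at the shorter of the two sequences.
    return list(zip(features, labels))


for skip in (1, 2):
    feats, label = pair_model(range(50), skip=skip)[0]
    print(skip, feats, " => ", label)
```

On range(50) the first pair is 0..9 with 10..14 for skip=1 and 0..9 with 20..24 for skip=2, the same pattern as the outputs above. Because zip stops at the shorter sequence, a larger skip also reduces the number of samples.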

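2. Single-variable sliding windows with window

The tf.data.Dataset.window() method:

window(size, shift=None, stride=1, drop_remainder=False)

Parameters:

size: the number of samples in each window, i.e. the window width.
shift: the number of input elements by which the window moves between consecutive windows, i.e. the sliding step; optional, defaults to None, in which case it equals size.
stride: the spacing between the samples inside a window; optional, defaults to 1.
drop_remainder: (optional) whether to drop windows that contain fewer than size elements; defaults to False.

The following example makes the usage easier to understand: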

dataset = tf.data.Dataset.range(7).window(3, None, 1, True) 
for window in dataset: 
    print(list(window.as_numpy_iterator()))

Output:

[0, 1, 2]
[3, 4, 5]

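This example samples without overlap, because shift=None moves the window by size elements. drop_remainder=True discards the trailing data that cannot fill a whole window.

For readability and easy comparison, only the key code is kept: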

range(7).window(3, 1, 1, True) 
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
-------------------------------
range(7).window(3, 2, 1, True) 
[0, 1, 2]
[2, 3, 4]
[4, 5, 6]
-------------------------------
range(7).window(3, 3, 1, True)
[0, 1, 2]
[3, 4, 5]
-------------------------------
range(7).window(3, None, 1, True) 
[0, 1, 2]
[3, 4, 5]
-------------------------------
range(7).window(3, None, 2, True) 
[0, 2, 4]
-------------------------------
range(7).window(3, None, 3, True) 
[0, 3, 6]
-------------------------------
range(7).window(3, 1, 1, True) 
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
-------------------------------
range(7).window(3, 1, 2, True) 
[0, 2, 4]
[1, 3, 5]
[2, 4, 6]
-------------------------------
range(7).window(3, 1, 3, True)
[0, 3, 6]
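The results in the listing above follow from simple index arithmetic: window k starts at position k * shift and takes up to size elements that are stride apart. The sketch below models that rule in pure Python so it can be checked on small inputs. window_model is a hypothetical helper, and its drop_remainder=False branch reflects my reading of the documented behaviour rather than anything shown in this article.

```python
def window_model(values, size, shift=None, stride=1, drop_remainder=False):
    """Model the index arithmetic of a size/shift/stride window (illustrative helper)."""
    values = list(values)
    shift = size if shift is None else shift  # shift=None means "move by size"
    windows = []
    for start in range(0, len(values), shift):
        # Up to `size` elements, `stride` apart, beginning at `start`.
        window = values[start::stride][:size]
        if drop_remainder and len(window) < size:
            continue  # incomplete window is discarded
        windows.append(window)
    return windows


print(window_model(range(7), 3, None, 1, True))
print(window_model(range(7), 3, 1, 2, True))
print(window_model(range(7), 3, None, 3, True))
```

The three calls reproduce the corresponding rows of the listing: two non-overlapping windows, three strided windows with shift 1, and the single window [0, 3, 6].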

The Dataset.flat_map method takes a dataset of datasets and flattens it into a single dataset:

window_size = 5
windows = range_ds.window(window_size, shift=1)

for x in windows.flat_map(lambda x: x).take(30):
    print(x.numpy(), end=' ')

Output (the warning messages are omitted to keep the focus on the method):

0 1 2 3 4 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
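Flattening with an identity function behaves much like chaining nested iterables end to end. As a rough analogy only, the sketch below rebuilds the same 30 values with itertools; the generator of ranges stands in for the dataset of windows and is an assumption made for illustration (it ignores the shorter windows that appear at the very end of the real dataset).

```python
from itertools import chain, islice

window_size = 5

# Stand-in for the dataset of windows: one window per start position (shift=1).
windows = (range(start, start + window_size) for start in range(100))

# chain.from_iterable concatenates the windows in order, like flat_map(lambda x: x);
# islice plays the role of take(30).
flat = list(islice(chain.from_iterable(windows), 30))

print(*flat)
```

The flattened sequence loses the window boundaries, which is why the next step batches each window before flattening.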

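Wrapped in a function:

def make_window_dataset(ds, window_size=5, shift=1, stride=1):
    # Split ds into windows; each window is itself a small dataset
    windows = ds.window(window_size, shift=shift, stride=stride)

    def sub_to_batch(sub):
        # Turn one window into a single tensor, dropping incomplete windows
        return sub.batch(window_size, drop_remainder=True)

    # Flatten the dataset of windows into a dataset of tensors
    windows = windows.flat_map(sub_to_batch)
    return windows

Test: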

ds = make_window_dataset(range_ds, window_size=10, shift=1, stride=2)

for example in ds.take(10):
    print(example.numpy())

Output:

[ 0  2  4  6  8 10 12 14 16 18]
[ 1  3  5  7  9 11 13 15 17 19]
[ 2  4  6  8 10 12 14 16 18 20]
[ 3  5  7  9 11 13 15 17 19 21]
[ 4  6  8 10 12 14 16 18 20 22]
[ 5  7  9 11 13 15 17 19 21 23]
[ 6  8 10 12 14 16 18 20 22 24]
[ 7  9 11 13 15 17 19 21 23 25]
[ 8 10 12 14 16 18 20 22 24 26]
[ 9 11 13 15 17 19 21 23 25 27]
