[Deep Learning TF2] GradientTape

1. Background

GradientTape is one of the most frequently used features of TensorFlow 2.x: whenever you need to compute gradients, this new API is what you reach for.

2. Parameters of tf.GradientTape

persistent: Boolean controlling whether a persistent gradient tape is created. False by default, which means at most one call can be made to the gradient() method on this object.
watch_accessed_variables: Boolean controlling whether the tape will automatically watch any (trainable) variables accessed while the tape is active. Defaults to True meaning gradients can be requested from any result computed in the tape derived from reading a trainable Variable. If False users must explicitly watch any Variables they want to request gradients from.

persistent: if False, gradient() can be called at most once on the tape; if True, it can be called multiple times. Defaults to False.
watch_accessed_variables: defaults to True, which means the tape automatically watches any trainable tf.Variable accessed while it is active, so gradients can be requested for any result derived from such a variable. If False, you must explicitly call watch() on every tensor or variable you want to differentiate with respect to (a minimal sketch of this case follows below).
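As a quick illustration of the second parameter, here is a minimal sketch (the variable names v and w are purely illustrative) of what happens with watch_accessed_variables=False: even a tf.Variable is ignored by the tape unless it is explicitly watched.

import tensorflow as tf

v = tf.Variable(3.0)
w = tf.Variable(2.0)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(v)        # only v is tracked; automatic watching is disabled
    y = v * v + w * w    # w is read here but never watched
grads = tape.gradient(y, [v, w])
print(grads)  # [<tf.Tensor: 6.0>, None] - no gradient for the unwatched w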

Example 1 - persistent=False and watch_accessed_variables=True (the default values)

import tensorflow as tf
x = tf.Variable(initial_value=3.0)
with tf.GradientTape() as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)
print(dy_dx)

Result: the first call to gradient() returns 6.0, while the second call throws an error, because persistent defaults to False (GradientTape.gradient can only be called once on non-persistent tapes).

tf.Tensor(6.0, shape=(), dtype=float32)
Traceback (most recent call last):
  File "**/GradientTape_test.py", line 70, in <module>
    test1()
  File "**/test/GradientTape_test.py", line 11, in test1
    dy_dx = g.gradient(y, x)  # Will compute to 6.0
  File "**\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 980, in gradient
    raise RuntimeError("GradientTape.gradient can only be called once on "
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

Example 2 - persistent=True and watch_accessed_variables=True

import tensorflow as tf
x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)
print(dy_dx)

Result:

tf.Tensor(6.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)

Example 3 - persistent=True and watch_accessed_variables=True, comparing a constant created with tf.constant against a tf.Variable

import tensorflow as tf
x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g1:
    y = x * x
dy_dx = g1.gradient(y, x)  # Returns None: the constant was never watched
print(dy_dx)

with tf.GradientTape(persistent=True) as g1:
    g1.watch(x)
    y = x * x
dy_dx = g1.gradient(y, x)  # Will compute to 6.0
print(dy_dx)

Result: if you define a constant with tf.constant and want gradients with respect to it, you must call the watch() method on it; otherwise gradient() returns None.

tf.Tensor(6.0, shape=(), dtype=float32)
None
tf.Tensor(6.0, shape=(), dtype=float32)

Example 4 - taking the gradient of a gradient (second-order derivatives)

import tensorflow as tf
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y = x * x
    dy_dx = gg.gradient(y, x)  # Will compute to 6.0
    print(dy_dx)
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.
print(d2y_dx2)
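The inner tape gg records y = x * x and produces dy_dx = 2x = 6.0; because the outer tape g is still active when gg.gradient(y, x) is evaluated, it records that computation as well, so g can differentiate dy_dx once more and obtain the second derivative d2y_dx2 = 2.0.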

Example 5 - computing gradients of several expressions recorded on the same tape

import tensorflow as tf
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
print(dz_dx)
dy_dx = g.gradient(y, x)  # 6.0
print(dy_dx)
del g  # Drop the reference to the tape
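Because the tape was created with persistent=True, it keeps its resources alive so that gradient() can be called repeatedly; once all gradients have been computed, the reference is dropped with del g so those resources can be released.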

Example 6 - gradients can be taken through TensorFlow ops such as tf.reduce_sum and tf.multiply

import tensorflow as tf
x = tf.ones((2, 2))
print(x)
y = tf.reduce_sum(x)
print(y)
z = tf.multiply(y, y)
print(z)
# Operations whose gradient we want to compute
with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)
# Compute the gradient of z with respect to x
dz_dx = t.gradient(z, x)
print(dz_dx)
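Here y = sum(x) = 4.0 and z = y * y, so dz/dx equals 2 * y = 8.0 for every element of x, and dz_dx is printed as a 2x2 tensor filled with 8.0.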

Example 7 - gradient of a function of two variables

import tensorflow as tf
x = tf.constant(value=3.0)
y = tf.constant(value=2.0)
with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch([x, y])
    z1 = x * x * y + x * y
# First-order derivatives
dz1_dx = tape.gradient(target=z1, sources=x)
dz1_dy = tape.gradient(target=z1, sources=y)
dz1_d = tape.gradient(target=z1, sources=[x, y])
print("dz1_dx:", dz1_dx)
print("dz1_dy:", dz1_dy)
print("dz1_d:",dz1_d)
print("type of dz1_d:",type(dz1_d))

Result:

dz1_dx: tf.Tensor(14.0, shape=(), dtype=float32)
dz1_dy: tf.Tensor(12.0, shape=(), dtype=float32)
dz1_d: [<tf.Tensor: shape=(), dtype=float32, numpy=14.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>]
type of dz1_d: <class 'list'>
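These values follow directly from z1 = x*x*y + x*y: dz1/dx = 2*x*y + y = 2*3*2 + 2 = 14 and dz1/dy = x*x + x = 9 + 3 = 12. Note that passing sources=[x, y] returns the gradients as a Python list, in the same order as the sources.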

3. apply_gradients(grads_and_vars, name=None)

Purpose: apply the computed gradients to the corresponding variables.
Parameters:

grads_and_vars: a list of (gradient, variable) pairs.
name: an optional name for the operation.
This is the second part of minimize(). It returns an Operation that applies gradients.
Args:
    grads_and_vars: List of (gradient, variable) pairs.
    name: Optional name for the returned operation. Default to the name passed to the Optimizer constructor.
Returns:
    An Operation that applies the specified gradients. The iterations will be automatically increased by 1.
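Before the full example below, here is a minimal sketch of calling apply_gradients directly with a hand-built list of (gradient, variable) pairs; the variable v, the toy loss, and the learning rate are just illustrative choices.

import tensorflow as tf

v = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = (v - 3.0) ** 2
grad = tape.gradient(loss, v)           # d(loss)/dv = 2 * (v - 3) = -4.0
optimizer.apply_gradients([(grad, v)])  # SGD step: v <- v - 0.1 * (-4.0) = 1.4
print(v.numpy())  # 1.4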

Example 8 - a simple end-to-end linear regression that ties the optimizer and GradientTape together

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

TRAIN_STEPS = 20

# Prepare train data
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10

print(train_X.shape)

w = tf.Variable(initial_value=1.0)
b = tf.Variable(initial_value=1.0)

optimizer = tf.keras.optimizers.SGD(0.1)
mse = tf.keras.losses.MeanSquaredError()

for i in range(TRAIN_STEPS):
    print("epoch:", i)
    print("w:", w.numpy())
    print("b:", b.numpy())
    # Compute and apply the gradients
    with tf.GradientTape() as tape:
        logit = w * train_X + b
        loss = mse(train_Y, logit)
    gradients = tape.gradient(target=loss, sources=[w, b])  # Compute the gradients
    # print("gradients:",gradients)
    # print("zip:\n",list(zip(gradients,[w,b])))
    optimizer.apply_gradients(zip(gradients, [w, b]))  # Apply the gradients to update w and b

# draw
plt.plot(train_X, train_Y, "+")
plt.plot(train_X, w * train_X + b)
plt.show()
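For reference, with loss = mean((w * train_X + b - train_Y)**2) the tape computes d(loss)/dw = 2 * mean(train_X * (w * train_X + b - train_Y)) and d(loss)/db = 2 * mean(w * train_X + b - train_Y), and apply_gradients then performs one SGD step on w and b with those values.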

Result: as the epochs progress, the values of w and b gradually approach 2 and 10.

epoch: 0
w: 1.0
b: 1.0
epoch: 1
w: 1.0676092
b: 2.7953496
epoch: 2
w: 1.13062
b: 4.231629
epoch: 3
w: 1.1893452
b: 5.3806524
epoch: 4
w: 1.2440765
b: 6.2998714
epoch: 5
w: 1.2950852
b: 7.035247
epoch: 6
w: 1.3426247
b: 7.623547
epoch: 7
w: 1.3869308
b: 8.094187
epoch: 8
w: 1.4282235
b: 8.470699
epoch: 9
w: 1.4667077
b: 8.771909
epoch: 10
w: 1.5025746
b: 9.0128765
epoch: 11
w: 1.5360019
b: 9.20565
epoch: 12
w: 1.5671558
b: 9.35987
epoch: 13
w: 1.5961908
b: 9.483246
epoch: 14
w: 1.6232511
b: 9.581946
epoch: 15
w: 1.6484709
b: 9.660907
epoch: 16
w: 1.6719754
b: 9.724075
epoch: 17
w: 1.6938813
b: 9.77461
epoch: 18
w: 1.7142972
b: 9.815037
epoch: 19
w: 1.7333245
b: 9.847379

(Figure: scatter plot of the training data with the fitted line w * train_X + b.)

Example 9 - recording control flow

Because the tape records every operation as it executes, gradient computation works correctly even when Python control flow (such as if or while) appears in the middle of the computation.

import tensorflow as tf

def f(x, y):
    output = 1.0
    # Loop y times
    for i in range(y):
        # Multiply only on selected iterations
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
        # Return the gradient of out with respect to x
        return t.gradient(out, x)

# x is a fixed value
x = tf.convert_to_tensor(2.0)

print(grad(x, 6))
print(grad(x, 5))
print(grad(x, 4))

Result:

tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)
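For y = 6 or y = 5, the condition i > 1 and i < 5 holds three times (i = 2, 3, 4), so out = x**3 and the gradient is 3 * x**2 = 12.0 at x = 2; for y = 4 it holds only twice, so out = x**2 and the gradient is 2 * x = 4.0.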

4. References

[0] https://www.tensorflow.org/api_docs/python/tf/GradientTape
[1] https://blog.csdn.net/xierhacker/article/details/53174558
[2] https://blog.csdn.net/qq_36758914/article/details/104456736
