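TensorFlow 2.0 is simple and easy to use, but many details still need attention in day-to-day work. This article is mainly reposted from https://www.zybuluo.com/Team/note/1479565, with additions and clarifications on important TensorFlow 2.0 details.
Almost the whole text is dense with useful material and is worth a careful read: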

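Although TensorFlow is the most widely used framework in deep learning, its static-graph (Graph mode) design makes it noticeably harder to use than PyTorch, a dynamic-graph framework. Fortunately, TensorFlow recently added an eager mode that matches PyTorch's dynamic execution. Going further, Google has released a brand-new version, TensorFlow 2.0, which is not an incremental update over 1.0 but a major upgrade (although only a preview version has been released so far). In short, TensorFlow 2.0 uses eager execution by default and reorganizes many previously messy modules. Version 2.0 will gradually replace 1.x, so it is worth getting started early. This article gives a concise introduction to TensorFlow 2.0 for a quick start.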

Eager execution

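TensorFlow's eager execution is a form of imperative programming, consistent with native Python: when you run an operation, the result is returned immediately. TensorFlow 1.x instead used Graph mode: you first build a computation graph, then open a Session and feed in real data before anything actually runs. Eager execution is clearly more concise and makes code easier to debug, which is also why PyTorch is simpler to use. A simple example: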

import tensorflow as tf

x = tf.ones((2, 2), dtype=tf.dtypes.float32)
y = tf.constant([[1, 2],
                 [3, 4]], dtype=tf.dtypes.float32)
z = tf.matmul(x, y)
print(z)
# tf.Tensor(
# [[4. 6.]
#  [4. 6.]], shape=(2, 2), dtype=float32)
print(z.numpy())
# [[4. 6.]
# [4. 6.]]

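As shown, under eager execution each operation returns a tf.Tensor that holds a concrete value, rather than a symbolic handle to a graph node as in Graph mode. Seeing results immediately helps a lot with debugging. In addition, calling tf.Tensor.numpy() returns the NumPy array corresponding to the Tensor.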

Another benefit of eager execution is that native Python features can be used (though not in every case; there are limitations, see AutoGraph Capabilities and Limitations), such as the conditional below:

random_value = tf.random.uniform([], 0, 1)
x = tf.reshape(tf.range(0, 4), [2, 2])
print(random_value)
if random_value.numpy() > 0.5:
    y = tf.matmul(x, x)
else:
    y = tf.add(x, x)

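This dynamic control flow works because a Tensor produced under eager execution can yield its NumPy value, which avoids Graph-mode operators such as tf.cond and tf.while_loop.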
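For comparison, here is a minimal sketch of how the same branch could be written with tf.cond, the operator that Graph mode relies on. The variable names mirror the example above; nothing else is assumed.

```python
import tensorflow as tf

random_value = tf.random.uniform([], 0, 1)
x = tf.reshape(tf.range(0, 4), [2, 2])

# tf.cond takes the predicate and one callable per branch;
# both branches must return tensors of matching structure.
y = tf.cond(random_value > 0.5,
            lambda: tf.matmul(x, x),
            lambda: tf.add(x, x))
print(y.shape)  # (2, 2)
```

The eager version reads like ordinary Python, while the tf.cond version has to wrap each branch in a callable.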

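Another important question is how to compute gradients in eager mode. In Graph mode, the gradient graph is built along with the forward graph, so gradients are easy to compute once data is fed in. Eager execution is dynamic, so the operations must be recorded on every run in order to compute gradients. This is done with tf.GradientTape, which tracks the executed operations. An example: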

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w + 2. * w + 5.
grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 4.]], shape=(1, 1), dtype=float32)

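Under eager execution, each tape records the operations executed within its context; the tape is valid only for that computation and is used to compute the corresponding gradients. PyTorch also uses a dynamic-graph model, but unlike TensorFlow, each Tensor that requires gradients carries a grad_fn that tracks the history of operations.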
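Because a tape is tied to the computation it recorded, a default tape releases its resources after a single gradient call. The following is a minimal sketch that uses the persistent=True argument of tf.GradientTape to query the same tape twice; the variable names are illustrative.

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    y = w * w      # y = w^2
    z = y * y      # z = w^4

dy_dw = tape.gradient(y, w)   # 2 * w   = 6.0
dz_dw = tape.gradient(z, w)   # 4 * w^3 = 108.0
del tape  # release the resources held by the persistent tape
print(dy_dw.numpy(), dz_dw.numpy())
```

Without persistent=True, the second tape.gradient call would raise an error, which is what "valid only for that computation" means in practice.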

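Eager execution in TensorFlow 2.0 makes code more concise and easier to debug, but it costs some performance relative to Graph mode. This is not hard to understand: Graph mode builds a static graph first and only then executes it, which is an advantage for distributed training, performance optimization, and production deployment. Fortunately, TensorFlow 2.0 introduces tf.function and AutoGraph to narrow the gap; the core idea is to convert a range of Python syntax into high-performance graph operations.
In other words, decorating a function with @tf.function turns eager code into Graph mode and improves performance.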

AutoGraph

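AutoGraph was already introduced in TensorFlow 1.x. It converts common Python code into Graph code that TensorFlow supports. A typical case: in TensorFlow we previously had to use complex operators such as tf.while_loop and tf.cond for dynamic control flow, but now we can write native Python for and if statements and let AutoGraph convert them, as in the example below: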

def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x
# Eager mode
print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)),
                                       square_if_positive(tf.constant(-9.0))))
# Graph mode
tf_square_if_positive = tf.autograph.to_graph(square_if_positive)
with tf.Graph().as_default():
    # The result works like a regular op: takes tensors in, returns tensors.
    # You can inspect the graph using tf.compat.v1.get_default_graph().as_graph_def()
    g_out1 = tf_square_if_positive(tf.constant( 9.0))
    g_out2 = tf_square_if_positive(tf.constant(-9.0))
    with tf.compat.v1.Session() as sess:
        print('Graph results: %2.2f, %2.2f\n' % (sess.run(g_out1), sess.run(g_out2)))

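Above we defined square_if_positive, which uses Python's native if statement. Under TensorFlow 2.0 eager execution this works fine. TensorFlow 1.x Graph mode does not support it directly, but AutoGraph can convert the function into a graph function that you can treat like a regular TensorFlow op and run in Graph mode (TF 2 has no Session; that is a TF 1.x feature, available through tf.compat.v1). Note the difference between eager mode and Graph mode: the results are the same, but Graph mode is more efficient.
In essence, AutoGraph converts Python code into native TensorFlow code. We can inspect the converted code: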

print(tf.autograph.to_code(square_if_positive))
# Output (the generated code varies across TensorFlow versions):
from __future__ import print_function
def tf__square_if_positive(x):
  try:
    with ag__.function_scope('square_if_positive'):
      do_return = False
      retval_ = None
      cond = ag__.gt(x, 0)
      def if_true():
        with ag__.function_scope('if_true'):
          x_1, = x,
          x_1 = x_1 * x_1
          return x_1
      def if_false():
        with ag__.function_scope('if_false'):
          x = 0.0
          return x
      x = ag__.if_stmt(cond, if_true, if_false)
      do_return = True
      retval_ = x
      return retval_
  except:
    ag__.rewrite_graph_construction_error(ag_source_map__)
tf__square_if_positive.autograph_info__ = {}

The converted code defines two branch functions and then calls the if_stmt op, which presumably works like tf.cond.
AutoGraph supports many Python features, such as loops:

def sum_even(items):
    s = 0
    for c in items:
        if c % 2 > 0:
            continue
        s += c
    return s
print('Eager result: %d' % sum_even(tf.constant([10,12,15,20])))
tf_sum_even = tf.autograph.to_graph(sum_even)
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    print('Graph result: %d\n\n' % sess.run(tf_sum_even(tf.constant([10,12,15,20]))))

AutoGraph supports most Python features, but it still has limitations; see Capabilities and Limitations for details.

Also note that a function converted by AutoGraph can still run in eager mode, but it will not be faster than the original. You can compare:

import timeit

x = tf.constant([10, 12, 15, 20])
print("Eager at original code:", timeit.timeit(lambda: sum_even(x), number=100))
print("Eager at autograph code:", timeit.timeit(lambda: tf_sum_even(x), number=100))
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    graph_op = tf_sum_even(tf.constant([10, 12, 15, 20]))
    sess.run(graph_op)  # warm-up call, excluded from timing
    print("Graph at autograph code:", timeit.timeit(lambda: sess.run(graph_op), number=100))
# Output from the original run (timings depend on hardware):
# Eager at original code: 0.05176109499999981
# Eager at autograph code: 0.11203173799999977
# Graph at autograph code: 0.03418808900000059

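So in TensorFlow 2.0 we generally do not use tf.autograph directly, because it brings no speedup under eager execution. To actually reach Graph-mode efficiency, we rely on the more powerful tf.function.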

Performance optimization: tf.function

Eager execution is more concise, but Graph mode is faster. To narrow this performance gap, TensorFlow 2.0 introduces tf.function. Here is the official description:

function constructs a callable that executes a TensorFlow graph (tf.Graph) created by tracing the TensorFlow operations in func. This allows the TensorFlow runtime to apply optimizations and exploit parallelism in the computation defined by func.

Put simply, tf.function builds a Graph from the TensorFlow operations inside a function, so that calling it executes the Graph, which gives better performance. For example:

def f(x, y):
    print(x, y)
    return tf.reduce_mean(tf.multiply(x ** 2, 3) + y)
g = tf.function(f)
x = tf.constant([[2.0, 3.0]])
y = tf.constant([[3.0, -2.0]])
# `f` and `g` will return the same value, but `g` will be executed as a
# TensorFlow graph.
assert f(x, y).numpy() == g(x, y).numpy()
# tf.Tensor([[2. 3.]], shape=(1, 2), dtype=float32) tf.Tensor([[ 3. -2.]], shape=(1, 2), dtype=float32)
# Tensor("x:0", shape=(1, 2), dtype=float32) Tensor("y:0", shape=(1, 2), dtype=float32)

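As the example shows, a function wrapped with tf.function runs in Graph mode. You can think of it as a TF op that encapsulates a Graph: calling it directly still returns a Tensor result right away, but internally it executes efficiently. Note that when printing Tensors inside the function, eager execution prints the Tensor's value, while Graph mode prints a symbolic Tensor handle on which numpy() cannot be called, just as in TF 1.x Graph mode. So use eager mode for debugging, and Graph mode for training and heavy computation.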

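Because a tf.function-decorated function runs as a Graph, it is generally faster than eager mode, and the gap is larger when the Graph contains many small operations. Compare a convolution with an LSTM cell: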

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)
@tf.function
def conv_fn(image):
  return conv_layer(image)
image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
# A single convolution shows little difference
# Eager conv: 0.44013839924952197
# Function conv: 0.3700763391782858
lstm_cell = tf.keras.layers.LSTMCell(10)
@tf.function
def lstm_fn(input, state):
  return lstm_cell(input, state)
input = tf.zeros([10, 10])
state = [tf.zeros([10, 10])] * 2
# warm up
lstm_cell(input, state); lstm_fn(input, state)
print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
# For the LSTM cell (many small ops), Graph execution is much faster
# eager lstm: 0.025562446062237565
# function lstm: 0.0035498656569271647

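To use tf.function flexibly, you need to understand the mechanism behind it, so here is a brief look. In TF 1.x, you first create a static computation graph, then create a Session to actually run different computations: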

import tensorflow as tf  # TF 1.x code; in TF 2 these APIs live under tf.compat.v1
x = tf.placeholder(tf.float32)
y = tf.square(x)
z = tf.add(x, y)
sess = tf.Session()
z0 = sess.run([z], feed_dict={x: 2.})        # 6.0
z1 = sess.run([z], feed_dict={x: 2., y: 2.}) # 4.0

Although only one graph is defined above, the two sess.run calls (at run time) actually execute two different programs, or subgraphs:

def compute_z0(x):
  return tf.add(x, tf.square(x))
def compute_z1(x, y):
  return tf.add(x,  y)

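In TensorFlow 2.0 terms, we have wrapped the two subgraphs in two Python functions. Going further, the Session is no longer needed: calling these functions directly runs the corresponding computation graph. That is what tf.function does: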

import tensorflow as tf
@tf.function
def compute_z1(x, y):
  return tf.add(x, y)
@tf.function
def compute_z0(x):
  return compute_z1(x, tf.square(x))
z0 = compute_z0(2.)
z1 = compute_z1(2., 2.)

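Polymorphism: tf.function internally manages a set of Graphs and controls their execution. A further issue is that although the function body defines a fixed sequence of operations, different inputs need different computation graphs. If the shape or dtype of an input Tensor differs, the graph differs too. Fortunately, tf.function supports this polymorphism.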

# Functions are polymorphic
@tf.function
def double(a):
  print("Tracing with", a)
  return a + a
print(double(tf.constant(1)))
print(double(tf.constant(1.1)))
print(double(tf.constant([1, 2])))
# Tracing with Tensor("a:0", shape=(), dtype=int32)
# tf.Tensor(2, shape=(), dtype=int32)
# Tracing with Tensor("a:0", shape=(), dtype=float32)
# tf.Tensor(2.2, shape=(), dtype=float32)
# Tracing with Tensor("a:0", shape=(2,), dtype=int32)
# tf.Tensor([2 4], shape=(2,), dtype=int32)

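Look at the print inside the function: when the shape or dtype of the input tensor changes, the printed output changes accordingly, so the (static) computation graphs are not the same. This polymorphism comes from tf.function tracing different graphs behind the scenes. Specifically, a function f decorated with tf.function accepts some Tensors and returns zero or more Tensors. When the decorated function F is called: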

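A "trace_cache_key" is determined from the shapes and dtypes of the input Tensors;
Each "trace_cache_key" maps to a Graph. When a new "trace_cache_key" is needed, f is traced to build a new Graph; if the "trace_cache_key" already exists, the existing Graph is simply looked up in the cache;
The input Tensors are fed into that Graph, which is executed to produce the output Tensors.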
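The caching steps above can be observed with a small experiment. This is a sketch that relies on the fact that Python side effects inside a tf.function run only during tracing; the traces list is a hypothetical helper, not part of TensorFlow.

```python
import tensorflow as tf

traces = []  # hypothetical helper: records one entry per trace

@tf.function
def double(a):
    traces.append((a.shape, a.dtype))  # Python code: runs only while tracing
    return a + a

double(tf.constant(1))
double(tf.constant(2))       # same shape and dtype: the cached Graph is reused
double(tf.constant(1.5))     # new dtype: a new Graph is traced
double(tf.constant([1, 2]))  # new shape: a new Graph is traced
print(len(traces))
```

Four calls lead to three traces here, one per distinct combination of shape and dtype.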

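This polymorphism is useful, because we sometimes want to pass Tensors of different shapes or dtypes. But as the number of "trace_cache_key" entries grows, a large number of Graphs gets cached, which is something to watch out for. In addition, tf.function provides input_signature, which uses tf.TensorSpec to specify the shape and dtype of the input Tensors, as in the example below: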

@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def f(x):
    return tf.add(x, 1.)
print(f(tf.constant(1.0)))  # tf.Tensor(2.0, shape=(), dtype=float32)
print(f(tf.constant([1.0,]))) # tf.Tensor([2.], shape=(1,), dtype=float32)
print(f(tf.constant([1])))  # ValueError: Python inputs incompatible with input_signature

Here the input Tensor's dtype must be float32 while the shape is unrestricted; a dtype mismatch raises an error.
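One practical effect of input_signature is that it bounds the number of cached Graphs. A minimal sketch, assuming a one-dimensional float32 input of unknown length (shape=[None]); the traces list is a hypothetical helper used only to count traces.

```python
import tensorflow as tf

traces = []  # hypothetical helper: one entry per trace

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def add_one(x):
    traces.append(x.shape)  # runs only while tracing
    return x + 1.0

add_one(tf.constant([1.0]))
add_one(tf.constant([1.0, 2.0, 3.0]))  # different length, same Graph
print(len(traces))
```

With the signature in place, inputs of different lengths share a single Graph instead of each triggering a new trace.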

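Another tf.function parameter is autograph, which defaults to True, meaning AutoGraph is applied automatically when building the Graph. This lets you use native Python conditionals and loops inside the function, since they are converted into Graph code such as tf.cond and tf.while_loop. Note that branches and loops are converted only when they depend on Tensors. With autograph set to False, any branch or loop that depends on Tensors raises an error, as in the example below: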

def sum_even(items):
  s = 0
  for c in items:
    if c % 2 > 0:
      continue
    s += c
  return s
sum_even_autograph_on = tf.function(sum_even, autograph=True)
sum_even_autograph_off = tf.function(sum_even, autograph=False)
x = tf.constant([10, 12, 15, 20])
sum_even(x) # OK 
sum_even_autograph_on(x) # OK
sum_even_autograph_off(x) # TypeError: Tensor objects are only iterable when eager execution is enabled

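This is easy to understand: after tf.function is applied the code runs in Graph mode, where Tensors cannot be iterated over, but AutoGraph converts the loop into Graph code, so it succeeds. In most cases we leave autograph enabled.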

Importantly, tf.function can be applied to class methods and can reference tf.Variable, as the example below shows:

class ScalarModel(object):
  def __init__(self):
    self.v = tf.Variable(0)
  @tf.function
  def increment(self, amount):
    self.v.assign_add(amount)
model1 = ScalarModel()
model1.increment(tf.constant(3))
assert int(model1.v) == 3
model1.increment(tf.constant(4))
assert int(model1.v) == 7
model2 = ScalarModel()  # model1 and model2 own different variables
model2.increment(tf.constant(5))
assert int(model2.v) == 5

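As discussed later, this feature is used when building models with tf.keras. The example also shows that operations with side effects (changing a Variable's value), such as assign_add, can be used inside a function, which matters for model training.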

As mentioned earlier, Python's native print only prints the Tensor handle once, when the Graph is being built. To print a Tensor's actual value, use tf.print:

@tf.function
def print_element(items):
    for c in items:
      tf.print(c)
x = tf.constant([1, 5, 6, 8, 3])
print_element(x)

That concludes this introduction to tf.function. There are more subtleties to using it in practice; see TensorFlow 2.0: Functions, not Sessions for details.

Model building: tf.keras

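TensorFlow 2.0 is fully built around Keras: if you want high-level layers, Keras is the only choice. TensorFlow 1.x offered tf.layers, tf.contrib.slim, and other high-level APIs for building models, but 2.0 supports only tf.keras.layers. Either way, this saves everyone from reinventing the wheel, unifies how models are built, and makes code more reusable (recall how wildly TensorFlow model-building styles used to vary). Note that the tf.nn module still exists and contains common neural-network operators, but most people will not build models from them directly, since keras.layers covers the commonly used layers. If you want to build a new layer, you can subclass tf.keras.layers.Layer: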

class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer='random_normal',
                             trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                             initializer='random_normal',
                             trainable=True)
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
layer = Linear(32)
print(layer.weights)  # [] the weights have not been created yet
x = tf.ones((8, 16))
y = layer(x)  # shape [8, 32]
print(layer.weights)

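Here we subclass Layer to implement a custom layer. The first thing to note is the build method (very commonly used), whose main job is to create the layer's Variables based on input_shape. Note that we do not create the Variables in the constructor (that is, not in __init__) but in a separate method. The reason is that the constructor receives no information about the input Tensor, and here we need the input's feature dimension, so the Variables cannot be created yet. The build method runs the first time the layer is actually called (layer(input)), and it runs only once (Layer keeps a boolean attribute, self.built, for this). This is a lazy-execution mechanism. By contrast, for those familiar with PyTorch, a PyTorch layer needs the input Tensor's dimensions when it is created, which means its parameters are created immediately.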

Second, Layer itself has many attributes and methods. Some important ones:

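add_weight method: creates the layer's weights (no need to call tf.Variable directly);
add_loss method: as the name suggests, adds a loss; added losses are available through the layer.losses attribute, and you can call it inside the call method to add any loss you want;
add_metric method: adds a metric to the layer;
losses attribute: the list of losses added via add_loss; for example, the regularization losses of some layers can be obtained from it;
trainable_weights attribute: the list of trainable Variables, needed during model training;
non_trainable_weights attribute: the list of non-trainable Variables;
weights attribute: the concatenation of trainable_weights and non_trainable_weights;
trainable attribute: a mutable bool that determines whether the layer can be trained.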
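Several of the attributes listed above can be seen together in a short sketch. PenalizedDense is a hypothetical layer written for illustration only; it adds a penalty on its outputs through add_loss, and the rate value is an arbitrary assumption.

```python
import tensorflow as tf

class PenalizedDense(tf.keras.layers.Layer):
    """Hypothetical layer: a dense layer that penalizes large outputs."""
    def __init__(self, units, rate=1e-2, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.rate = rate

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        outputs = tf.matmul(inputs, self.w) + self.b
        # Register an extra loss term; it shows up in layer.losses
        self.add_loss(self.rate * tf.reduce_sum(tf.square(outputs)))
        return outputs

layer = PenalizedDense(4)
_ = layer(tf.ones((2, 3)))
print(len(layer.weights), len(layer.trainable_weights), len(layer.losses))
layer.trainable = False  # freeze the layer
print(len(layer.trainable_weights), len(layer.non_trainable_weights))
```

After freezing, the two weights move from trainable_weights to non_trainable_weights, while weights still lists both.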

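Layer is the most basic class in Keras, so a thorough understanding of it is important; see the source code for details. In most cases we just reuse existing Keras layers, and the class we use most for building models is keras.Model. Model subclasses Layer but provides more APIs, such as model.compile(), model.fit(), model.evaluate(), and model.predict(), which anyone familiar with Keras will recognize as the methods for training, evaluation, and prediction. Another important point is that we can subclass Model to create modules or models containing multiple layers: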

class ConvBlock(tf.keras.Model):
    """Convolutional Block consisting of (conv->bn->relu).
    Arguments:
      num_filters: number of filters passed to a convolutional layer.
      kernel_size: the size of convolution kernel
      weight_decay: weight decay
      dropout_rate: dropout rate.
    """
    def __init__(self, num_filters, kernel_size,
                 weight_decay=1e-4, dropout_rate=0.):
        super(ConvBlock, self).__init__()
        self.conv = tf.keras.layers.Conv2D(num_filters,
                                          kernel_size,
                                          padding="same",
                                          use_bias=False,
                                          kernel_initializer="he_normal",
                                          kernel_regularizer=tf.keras.regularizers.l2(weight_decay))
        self.bn = tf.keras.layers.BatchNormalization()
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
    def call(self, x, training=True):
        output = self.conv(x)
        output = self.bn(output, training=training)
        output = tf.nn.relu(output)
        output = self.dropout(output, training=training)
        return output
model = ConvBlock(32, 3, 1e-4, 0.5)
x = tf.ones((4, 224, 224, 3))
y = model(x)
print(model.layers)

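Here we built a block of Conv2D -> BatchNorm -> ReLU (followed by Dropout); printing model.layers lists all the layers it contains. Going further, we can reuse such blocks just like tf.keras.layers to build more complex modules: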

class SimpleCNN(tf.keras.Model):
    def __init__(self, num_classes):
        super(SimpleCNN, self).__init__()
        self.block1 = ConvBlock(16, 3)
        self.block2 = ConvBlock(32, 3)
        self.block3 = ConvBlock(64, 3)
        self.global_pool = tf.keras.layers.GlobalAveragePooling2D()
        self.classifier = tf.keras.layers.Dense(num_classes)
    def call(self, x, training=True):
        output = self.block1(x, training=training)
        output = self.block2(output, training=training)
        output = self.block3(output, training=training)
        output = self.global_pool(output)
        logits = self.classifier(output)
        return logits
model = SimpleCNN(10)
print(model.layers)
x = tf.ones((4, 32, 32, 3))
y = model(x) # [4, 10]

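This usage is similar to PyTorch's Module, and most Model attributes recursively collect the attributes of the inner layers; for example, model.weights contains the weights defined in all layers of the model.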
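The recursive collection of weights can be checked with a small sketch. TwoLayer and Wrapper are hypothetical classes made up for this illustration; each Dense layer contributes a kernel and a bias.

```python
import tensorflow as tf

class TwoLayer(tf.keras.Model):
    """Hypothetical inner module with two Dense layers."""
    def __init__(self):
        super().__init__()
        self.d1 = tf.keras.layers.Dense(4)
        self.d2 = tf.keras.layers.Dense(2)

    def call(self, x):
        return self.d2(self.d1(x))

class Wrapper(tf.keras.Model):
    """Hypothetical outer model that nests TwoLayer and adds a head."""
    def __init__(self):
        super().__init__()
        self.inner = TwoLayer()
        self.head = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.head(self.inner(x))

model = Wrapper()
_ = model(tf.ones((1, 3)))  # first call builds all nested layers
print(len(model.weights))   # 3 Dense layers x (kernel + bias)
```

The outer model reports the weights of the nested module without any manual registration.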

Models can also be built the classic Keras way, for example with tf.keras.Sequential:

from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Adds a densely-connected layer with 64 units to the model:
    layers.Dense(64, activation='relu', input_shape=(32,)),
    # Add another:
    layers.Dense(64, activation='relu'),
    # Add a softmax layer with 10 output units:
    layers.Dense(10, activation='softmax')])

Or with the Keras functional API:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='img')
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')

Both work, but I personally prefer the first, modular way of building models. You can also apply tf.function to the call method so that the model runs in Graph mode.

Model training

Before training, an important piece is data loading. TensorFlow 2.0 still uses tf.data for this, but in eager mode a tf.data.Dataset becomes a Python iterable that we can read values from directly:

dataset = tf.data.Dataset.range(10)
for i, elem in enumerate(dataset):
    print(elem)  # prints 0, 1, ..., 9

This is only a simple example, but it is enough to show how tf.data changes in TensorFlow 2.0; other tf.data techniques are the same as in TensorFlow 1.x.
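As a slightly fuller sketch of the same idea, the usual tf.data transformations such as map and batch can be chained and then iterated directly in eager mode. The doubling function and the batch size of 4 are arbitrary choices for illustration.

```python
import tensorflow as tf

dataset = (tf.data.Dataset.range(10)
           .map(lambda x: x * 2)   # element-wise transformation
           .batch(4))              # group into batches of up to 4

batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
# [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

No Session or iterator initializer is involved; a plain Python loop over the Dataset is enough.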

In addition, tf.keras provides two important modules for training: losses and metrics. The losses module wraps various loss functions, as in this case:

bce = tf.keras.losses.BinaryCrossentropy()
loss = bce([0., 0., 1., 1.], [1., 1., 1., 0.])
print('Loss: ', loss.numpy())  # Loss: 11.522857

The metrics module contains common evaluation metrics. Its design matches the TensorFlow 1.x metrics module: a metric is stateful, usually tracking its state in Variables. Basic usage:

m = tf.keras.metrics.Accuracy()
m.update_state([1, 2, 3, 4], [0, 2, 3, 4])
print('result: ', m.result().numpy())  # result: 0.75
m.update_state([0, 2, 3], [1, 2, 3])
print('result: ', m.result().numpy())  #  result: 0.714
m.reset_states()  # reset the accumulated state
m.update_state([0, 2, 3], [1, 2, 3])
print('result: ', m.result().numpy())  #  result: 0.667

When you need a custom metric, subclass tf.keras.metrics.Metric and implement a few methods. The example below counts true positives (TP) in a multi-class problem:

class CategoricalTruePositives(tf.keras.metrics.Metric):
    def __init__(self, name='categorical_true_positives', **kwargs):
      super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
      self.true_positives = self.add_weight(name='tp', initializer='zeros')
    def update_state(self, y_true, y_pred, sample_weight=None):
      # Argmax over the class axis; assumes y_pred is [batch, num_classes] and y_true is [batch]
      y_pred = tf.argmax(y_pred, axis=-1)
      values = tf.equal(tf.cast(y_true, 'int32'), tf.cast(y_pred, 'int32'))
      values = tf.cast(values, 'float32')
      if sample_weight is not None:
        sample_weight = tf.cast(sample_weight, 'float32')
        values = tf.multiply(values, sample_weight)
      self.true_positives.assign_add(tf.reduce_sum(values))
    def result(self):
      return self.true_positives
    def reset_states(self):
      # The state of the metric will be reset at the start of each epoch.
      self.true_positives.assign(0.)

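All three methods above must be implemented: update_state updates the state with new data, reset_states resets it to the initial value, and result returns the current state, that is, the metric value. Note that this metric creates a Variable to hold the TP count. More complex metrics can be implemented along the same lines.
For model training, the complete example below covers the whole process: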

import numpy as np
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Adding a dimension to the array -> new shape == (28, 28, 1)
train_images = train_images[..., None]
test_images = test_images[..., None]
# Getting the images in [0, 1] range.
train_images = train_images / np.float32(255)
test_images = test_images / np.float32(255)
train_labels = train_labels.astype('int64')
test_labels = test_labels.astype('int64')
# dataset
train_ds = tf.data.Dataset.from_tensor_slices(
    (train_images, train_labels)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices(
    (test_images, test_labels)).batch(32)
# Model
class MyModel(tf.keras.Sequential):
    def __init__(self):
        super(MyModel, self).__init__([
          tf.keras.layers.Conv2D(32, 3, activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Conv2D(64, 3, activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(64, activation='relu'),
          tf.keras.layers.Dense(10, activation=None)
        ])
model = MyModel()
# optimizer
initial_learning_rate = 1e-4
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)
# checkpoint
checkpoint = tf.train.Checkpoint(step=tf.Variable(0), optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(checkpoint, './tf_ckpts', max_to_keep=3)
# loss function
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# metric
train_loss_metric = tf.keras.metrics.Mean(name='train_loss')
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss_metric = tf.keras.metrics.Mean(name='test_loss')
test_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
# define a train step
@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_object(targets, predictions)
        loss += sum(model.losses)  # add other losses
    # compute gradients and update variables
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss_metric(loss)
    train_acc_metric(targets, predictions)
# define a test step
@tf.function
def test_step(inputs, targets):
    predictions = model(inputs, training=False)
    loss = loss_object(targets, predictions)
    test_loss_metric(loss)
    test_acc_metric(targets, predictions)
# train loop
epochs = 10
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    # Iterate over the batches of the dataset
    for step, (inputs, targets) in enumerate(train_ds):
        train_step(inputs, targets)
        checkpoint.step.assign_add(1)
        # log every 20 step
        if step % 20 == 0:
            manager.save() # save checkpoint
            print('Epoch: {}, Step: {}, Train Loss: {}, Train Accuracy: {}'.format(
                epoch, step, train_loss_metric.result().numpy(),
                train_acc_metric.result().numpy())
            )
            train_loss_metric.reset_states()
            train_acc_metric.reset_states()
# do test
for inputs, targets in test_ds:
    test_step(inputs, targets)
print('Test Loss: {}, Test Accuracy: {}'.format(
    test_loss_metric.result().numpy(),
    test_acc_metric.result().numpy()))

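Small as it is, this example has all the essential parts: data loading, model creation, and model training and testing. Note in particular that a single train step and a single test step are each converted to Graph mode with tf.function, which speeds up training; this is a recommended practice. The training above uses custom training loops, which offer a lot of freedom; the alternative is the more conventional Keras compile and fit workflow.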
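For contrast with the custom loop, here is a minimal sketch of the compile and fit workflow. The data is synthetic and the layer sizes, learning rate, and epoch count are arbitrary assumptions chosen only to keep the example small.

```python
import numpy as np
import tensorflow as tf

# Synthetic data, purely illustrative
x = np.random.rand(64, 8).astype('float32')
y = np.random.randint(0, 3, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)  # logits for 3 classes
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# fit runs the training loop; history.history holds per-epoch values
history = model.fit(x, y, batch_size=16, epochs=2, verbose=0)
loss, acc = model.evaluate(x, y, verbose=0)
print(sorted(history.history.keys()))
```

The gradient computation, optimizer step, and metric updates that were written by hand in the custom loop are handled inside fit.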

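Another feature of TensorFlow 2.0 is tf.distribute.Strategy, which supports distributed training better and has a simpler interface. The most commonly used strategy is synchronous training on multiple GPUs of a single machine, which tf.distribute.MirroredStrategy supports well. This strategy creates a model replica on each GPU; the model's parameters are mirrored across all replicas as MirroredVariables and are kept in sync across devices by applying identical updates. Communication uses all-reduce, which aggregates Tensors from multiple devices onto every device and is fairly efficient. All-reduce has several implementations; the default here is NVIDIA NCCL's. Creating the strategy takes just: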

mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.NcclAllReduce())
# Trains synchronously on GPU 0 and GPU 1

Once the strategy is created, subsequent steps only need to run inside strategy.scope. Below we create a simple model and optimizer:

with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

For the dataset, call tf.distribute.Strategy.experimental_distribute_dataset to distribute the data:

global_batch_size = 16  # illustrative value; not specified in the original
with mirrored_strategy.scope():
    dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(1000).batch(
      global_batch_size)
    # Note: this is the global batch size
    dist_dataset = mirrored_strategy.experimental_distribute_dataset(dataset)

Then we define the train step and run it with strategy.experimental_run_v2 (renamed to strategy.run in TensorFlow 2.2):

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        features, labels = inputs
        with tf.GradientTape() as tape:
            logits = model(features)
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            logits=logits, labels=labels)
            loss = tf.reduce_sum(cross_entropy) * (1.0 / global_batch_size)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
        return cross_entropy
    per_example_losses = mirrored_strategy.experimental_run_v2(step_fn, args=(dist_inputs,))
    mean_loss = mirrored_strategy.reduce(tf.distribute.ReduceOp.MEAN,
                    per_example_losses, axis=0)
    return mean_loss

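Note that the loss is divided by the global batch size, because in distributed training the gradients from all replicas are summed via all-reduce onto each device before the update. Also, strategy.experimental_run_v2 returns per-replica results, so a reduce is needed to get the final value.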
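The scaling rule can be verified with plain arithmetic. The sketch below uses NumPy only, with made-up per-example losses and two simulated replicas; it is not TensorFlow's implementation, just the bookkeeping behind it.

```python
import numpy as np

per_example_losses = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
global_batch_size = len(per_example_losses)
num_replicas = 2

# Each replica sees an equal shard of the global batch
shards = np.split(per_example_losses, num_replicas)

# Per-replica loss: local sum divided by the GLOBAL batch size
replica_losses = [shard.sum() / global_batch_size for shard in shards]

# All-reduce sums the contributions across replicas
combined = sum(replica_losses)
print(combined, per_example_losses.mean())  # 4.5 4.5
```

Dividing by the per-replica batch size instead would make the summed result num_replicas times too large. TensorFlow also provides tf.nn.compute_average_loss, which takes a global_batch_size argument for this purpose.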
Finally, run training in a loop:

with mirrored_strategy.scope():
    for inputs in dist_dataset:
        print(train_step(inputs))

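Note that MirroredStrategy supports only single-machine multi-GPU synchronous training; for multiple machines, use MultiWorkerMirroredStrategy. Other distributed strategies include CentralStorageStrategy, TPUStrategy, and ParameterServerStrategy. To go deeper, see the distribute_strategy guide and the distribute_strategy tutorial.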

Conclusion

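This has been a concise introduction to the core new features of TensorFlow 2.0; mastering them should get you up and running quickly. There may be more unexpected advances to come. Keep it up, TensorFlow coders!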
