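TensorFlow computation model: the computation graph

As the previous section explained, every computation in TensorFlow is converted into a node on a computation graph. If the Tensor in TensorFlow describes the graph's data structure, then Flow reflects its computation model. This section looks at how computation graphs are used.

A simple computation graph example

Implementing neural network forward propagation with variables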

    # coding:utf8

    import tensorflow as tf

    # Declare the variables w1 and w2. The seed argument fixes the random seed
    # so that every run gives the same result.
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    # Define x as a placeholder
    x = tf.placeholder(tf.float32, shape=(None, 2), name='input')

    # Matrix multiplications
    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)

    # Create a session
    sess = tf.Session()

    # Initialize w1 and w2
    sess.run(w1.initializer)
    sess.run(w2.initializer)

    # Use the session to compute the forward propagation output
    print(sess.run(y, feed_dict={x: [[0.7, 0.9]]}))

    sess.close()

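As the program above shows, using a graph takes two steps:

1. Build the graph structure (define variables and computation nodes)
2. Create a session and run the graph

Using computation graphs

A TensorFlow program generally has two phases. The first phase defines all the computations in the graph. The second phase executes them.

While you write the program, TensorFlow automatically converts each defined computation into a node on the graph. TensorFlow maintains a default computation graph for you, and tf.get_default_graph returns the current default graph.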

    import tensorflow as tf
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([2.0, 3.0], name='b')
    result = a + b

    # a.graph gives the graph a tensor belongs to. No graph was specified
    # explicitly, so it should be the default graph.
    print(a.graph is tf.get_default_graph())
    # Output: True

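Creating a new computation graph

Besides the default graph, TensorFlow can create new graphs with tf.Graph. The tf.Graph.as_default() method makes a graph the default graph and returns a context manager. Combined with a with statement, this ensures the resources involved are acquired and released correctly. Tensors and operations on different graphs are not shared. The code below defines and uses variables on different graphs: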

    # coding:utf8

    import tensorflow as tf

    g1 = tf.Graph()
    with g1.as_default():
        # Define v in g1, initialized to 0
        v = tf.get_variable("v", shape=[1], initializer=tf.zeros_initializer())


    g2 = tf.Graph()
    with g2.as_default():
        # Define v in g2, initialized to 1
        v = tf.get_variable("v", shape=[1], initializer=tf.ones_initializer())


    with tf.Session(graph=g1) as sess:
        tf.global_variables_initializer().run()
        with tf.variable_scope("", reuse=True):
            # In g1 the variable v should be 0, so this should print [0.]
            print(sess.run(tf.get_variable('v')))


    with tf.Session(graph=g2) as sess:
        tf.global_variables_initializer().run()
        with tf.variable_scope("", reuse=True):
            # In g2 the variable v should be 1, so this should print [1.]
            print(sess.run(tf.get_variable('v')))


    [ 0.]  # v in g1 is initialized to 0
    [ 1.]  # v in g2 is initialized to 1

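Managing computation graphs and other resources

TensorFlow also provides mechanisms for managing tensors and computations. A graph can specify the device that runs a computation through tf.Graph.device. The program below places an addition on the GPU.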

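    g = tf.Graph()
    # g.device only affects ops created in g, so the ops are built inside g
    with g.as_default(), g.device('/gpu:0'):
        a = tf.constant([1.0, 2.0], name='a')
        b = tf.constant([2.0, 3.0], name='b')
        result = a + b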

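TensorFlow can manage different kinds of resources through collections. For example, tf.add_to_collection adds a resource to one or more collections, and tf.get_collection returns all the resources in a collection. These resources can be tensors, variables, or anything else a TensorFlow program needs to run. (Collections are used heavily when training neural networks.)

Collection name	Contents	Typical use
tf.GraphKeys.GLOBAL_VARIABLES	All variables	Persisting a TensorFlow model
tf.GraphKeys.TRAINABLE_VARIABLES	Trainable variables (neural network parameters)	Model training, generating model visualizations
tf.GraphKeys.SUMMARIES	Tensors related to log generation	Visualizing TensorFlow computations
tf.GraphKeys.QUEUE_RUNNERS	QueueRunner objects that process input	Input processing
tf.GraphKeys.MOVING_AVERAGE_VARIABLES	All variables for which a moving average is computed	Computing moving averages of variables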

Graph-related API functions

For details, see the official TensorFlow API documentation.
### Core graph data structures (members of class tf.Graph)

Operation	Description
tf.Graph.__init__()	Creates a new, empty graph.
tf.Graph.as_default()	Makes this graph the default graph and returns a context manager (used with the with keyword). Example: g = tf.Graph(); with g.as_default(): c = tf.constant(5.0)
tf.Graph.as_graph_def(from_version=None)	Returns a serialized GraphDef. The serialized GraphDef can be imported into another Graph (using import_graph_def()) or used with the C++ Session API.
tf.Graph.finalize()	Finishes building the graph and makes it read-only (no further ops can be added after this call).
tf.Graph.finalized	True if this graph has been finalized.
tf.Graph.control_dependencies(control_inputs)	Returns a context manager that specifies control dependencies (used with the with keyword to make ops depend on control_inputs). Example: with g.control_dependencies([a, b, c]): d = ...; e = ... Here d and e will only run after a, b, and c have executed.
tf.Graph.device(device_name_or_function)	Returns a context manager that specifies the device to use. device_name_or_function may be a device name string, a device function, or None. Example: with g.device('/gpu:0'): ... runs the enclosed ops on the GPU.
tf.Graph.name_scope(name)	Returns a context manager that creates hierarchical names for ops (commonly used to organize the weight variables of a neural network, which makes operations such as iterative updates easier to manage).
tf.Graph.add_to_collection(name, value)	Stores value in the collection called name (the collection mechanism is used to manage variables and similar resources).
tf.Graph.get_collection(name, scope=None)	Returns the list of values in the collection called name.
tf.Graph.as_graph_element(obj, allow_tensor=True, allow_operation=True)	Returns the object referred to by obj, as an Operation or Tensor.
tf.Graph.get_operation_by_name(name)	Returns the Operation with the given name.
tf.Graph.get_tensor_by_name(name)	Returns the Tensor with the given name.
tf.Graph.get_operations()	Returns the list of ops in the graph.
tf.Graph.get_default_device()	Returns the default device.
tf.Graph.seed
tf.Graph.unique_name(name)	Returns a unique Operation name for "name".
tf.Graph.version	Returns a version number that increases as ops are added to the graph.
tf.Graph.create_op(op_type, inputs, dtypes, input_types=None, name=None, attrs=None, op_def=None, compute_shapes=True)	Creates an Operation in this graph. This is a low-level interface for creating ops; most programs use Python op constructors such as tf.constant() instead, which add an op to the default graph.
tf.Graph.gradient_override_map(op_type_map)	Experimental: returns a context manager for overriding the gradient functions used for ops.

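TensorFlow data model: tensors

The Tensor is how TensorFlow manages data. Functionally, a Tensor can be thought of as a multidimensional array, where a zero-order Tensor is a scalar, that is, a single number. In TensorFlow, however, a Tensor is not implemented directly as an array. It is a reference to the result of an operation. A Tensor stores the computation that produces the numbers, not the numbers themselves.
Tensors serve two purposes. First, they are references to intermediate results, which makes those results easy to access and makes the code more readable. Second, they can be used to obtain the result of a computation, which requires a session.
Consider the following example program: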

    # coding:utf8

    import tensorflow as tf

    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([2.0, 3.0], name='b')

    result = tf.add(a, b, name='add')

    print(result)

    # Output:
    # Tensor("add:0", shape=(2,), dtype=float32)
    # This is only the tensor structure; a session is needed to compute the value

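Tensor structure

In the program above, the structure of a Tensor is

Tensor("add:0", shape=(2,), dtype=float32)

It has three main attributes: name, shape, and type (identifier, dimensions, and data type).

The name attribute

The name is the unique identifier of a Tensor, and it also tells you how the Tensor was computed. Each node on the computation graph corresponds to one computation.
The result of a computation is stored in a Tensor, so a Tensor's name has the form "node:src_output".

node is the name of the node.
src_output says which output of that node the Tensor comes from.

For example, in the program above

Tensor("add:0", shape=(2,), dtype=float32)
# The name "add:0" means the tensor is output 0 of the node named add.

The shape attribute

The shape attribute describes the dimensions of a Tensor. Dimensions are an extremely important property of a Tensor, and many of the computations later in this series operate on them.
In the program:

Tensor("add:0", shape=(2,), dtype=float32)
# "shape=(2,)" says that result is a one-dimensional vector of length 2.

The type attribute

Every Tensor has exactly one type. TensorFlow type-checks all Tensors that take part in a computation and raises an error when the types do not match.
For example: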

    import tensorflow as tf

    a = tf.constant([1, 2], name='a')        # inferred as int32
    b = tf.constant([2.0, 3.0], name='b')    # inferred as float32
    result = a + b

    # The line "result = a + b" raises:
    # ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("b:0", shape=(2,), dtype=float32)'

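The constant a defaults to int32, so adding it to the float32 constant b is a type mismatch and raises an error.

The type of a constant can be set explicitly. If a is declared as float32, as in the program below, no error is raised: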

    import tensorflow as tf

    # dtype='float32' also works, but tf.float32 is the safer choice for compatibility
    a = tf.constant([1, 2], name='a', dtype=tf.float32)
    b = tf.constant([2.0, 3.0], name='b')

    result = a + b

Type-related attributes in TensorFlow

TensorFlow supports 14 different types. The main ones are listed below.

Category	Types
Real	tf.float32: 32-bit single-precision floating-point; tf.float64: 64-bit double-precision floating-point; tf.bfloat16: 16-bit truncated floating-point
Integer	tf.int8: 8-bit signed integer; tf.uint8: 8-bit unsigned integer; tf.int32: 32-bit signed integer; tf.int64: 64-bit signed integer; tf.qint8: quantized 8-bit signed integer; tf.quint8: quantized 8-bit unsigned integer; tf.qint32: quantized 32-bit signed integer
Boolean	tf.bool: Boolean
Complex	tf.complex64: 64-bit single-precision complex

Type API (class tf.DType)

Operation	Description
tf.DType.is_compatible_with(other)	Returns True if the other type can be converted to this type.
tf.DType.name	Returns the string name for this DType.
tf.DType.base_dtype	Returns a non-reference DType based on this DType.
tf.DType.is_ref_dtype	Returns True if this DType represents a reference type.
tf.DType.as_ref	Returns a reference DType based on this DType.
tf.DType.is_integer	Returns whether this is a (non-quantized) integer type.
tf.DType.is_quantized	Returns whether this is a quantized data type.
tf.DType.as_numpy_dtype	Returns a numpy.dtype based on this DType
tf.DType.as_datatype_enum	Returns a types_pb2.DataType enum value based on this DType.
tf.DType.init(type_enum)	Creates a new DataType.
tf.DType.max	Returns the maximum representable value in this data type.
tf.DType.min	Returns the minimum representable value in this data type.
tf.as_dtype(type_value)	Converts the given type_value to a DType.

Tensor-related API

class tf.Tensor
Represents a value produced by an Operation.

A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation’s output, but instead provides a means of computing those values in a TensorFlow Session.

This class has two primary purposes:

A Tensor can be passed as an input to another Operation. This builds a dataflow connection between operations, which enables TensorFlow to execute an entire Graph that represents a large, multi-step computation.
After the graph has been launched in a session, the value of the Tensor can be computed by passing it to Session.run(). t.eval() is a shortcut for calling tf.get_default_session().run(t).

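Operation	Description
tf.Tensor.dtype	The DType of the Tensor.
tf.Tensor.name	The name of the Tensor.
tf.Tensor.value_index	The index of this Tensor among the outputs of the op that created it.
tf.Tensor.graph	The graph that contains this Tensor.
tf.Tensor.op	The op that created this Tensor.
tf.Tensor.consumers()	Returns the list of ops that use this Tensor.
tf.Tensor.eval(feed_dict=None, session=None)	Computes the value of the Tensor in a session. It must be used with a session, either inside with sess.as_default() or by passing eval(session=sess).
tf.Tensor.get_shape()	Returns the shape of the Tensor as a TensorShape.
tf.Tensor.set_shape(shape)	Updates the shape of the Tensor.
tf.Tensor.__init__(op, value_index, dtype)	Creates a new Tensor.
tf.Tensor.device	The name of the device on which this Tensor is computed.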

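Session

A Session owns and manages all the resources of the TensorFlow runtime, and each session has its own resources, such as tf.Variable, tf.QueueBase, and tf.ReaderBase. Releasing these resources promptly once you are done with them matters. Call Session.close when all computations have finished so that the system can reclaim the resources and avoid leaks.

Session patterns

Sessions are generally used in one of two ways: by explicitly calling the session constructor, or by managing the session through a context manager.

1. Session pattern: explicitly creating and closing the session

With this pattern you must call session.close to release resources once all computations are done. If the program raises an exception, the session is not closed properly. For example: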

    # Create a session
    sess = tf.Session()

    # Use the session to get the value of a tensor, e.g. sess.run(result)
    sess.run(...)

    # Close the session
    sess.close()

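2. Session pattern: using a Python context manager

In Python, context managers are commonly used to work with files, for example

    with open('...') as fp:

The benefit is that the context manager simplifies the code and guarantees that resources are used and released properly. A Session can be used with a with statement in the same way.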

    # Create a session and let the context manager manage it
    with tf.Session() as sess:
        # do what you want
        sess.run(...)

    # The session is closed automatically when the block exits

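Default session

TensorFlow creates a default computation graph automatically. Sessions have a similar mechanism, but the default session must be set by hand. Once a default session is set, tf.Tensor.eval can compute the value of a tensor. For example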

    sess = tf.Session()
    with sess.as_default():
        print(result.eval())  # compute the value of the tensor

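The code can also be written like this

    sess = tf.Session()

    # The next two lines do the same thing
    print(sess.run(result))
    print(result.eval(session=sess))

In an interactive environment it is easier to get tensor values through a default session. TensorFlow provides tf.InteractiveSession, which builds a default session directly. It is used as follows

    sess = tf.InteractiveSession()
    print(result.eval())  # tf.InteractiveSession removes the need to register a default session
    sess.close()

tf.InteractiveSession API (class tf.InteractiveSession)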

A TensorFlow Session for use in interactive contexts, such as a shell.
The only difference with a regular Session is that an InteractiveSession installs itself as the default session on construction. The methods tf.Tensor.eval and tf.Operation.run will use that session to run ops.

Operation	Description
graph	The graph that was launched in this session.
graph_def	A serializable version of the underlying TensorFlow graph.
__init__(target='', graph=None, config=None)	Creates a new interactive TensorFlow session.
as_default()	Makes this session the default session and returns a context manager.
close()	Closes an InteractiveSession.
make_callable(fetches, feed_list=None)	Returns a Python callable that runs a particular step.
partial_run(handle, fetches, feed_dict=None)	Continues the execution with more feeds and fetches.
partial_run_setup(fetches, feeds=None)	Sets up a graph with feeds and fetches for partial run.
run(fetches, feed_dict=None, options=None, run_metadata=None)	Runs the ops and evaluates the tensors in fetches.

Configuring session properties

However a session is created, it can be configured through the ConfigProto protocol buffer, as follows:

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    sess1 = tf.InteractiveSession(config=config)
    sess2 = tf.Session(config=config)

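ConfigProto can set parameters such as the number of parallel threads, the GPU allocation policy, and the operation timeout.
Two parameters are used most often:

First parameter: allow_soft_placement

This is a Boolean parameter. When it is True, an operation assigned to the GPU may be moved to the CPU if any of the following holds:
- 1. The operation cannot run on a GPU
- 2. The requested GPU does not exist (for example, the third GPU is requested but only one GPU is present)
- 3. The inputs of the operation contain references to results computed on the CPU
The default is False. Setting it to True makes code more portable, because operations that are not supported on the GPU are moved to the CPU instead of raising an error.

Second parameter: log_device_placement

This is a Boolean parameter. When it is True, the log records which device each node is placed on, which helps with debugging.

Session API (tf.Session)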

A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.

Operation	Description
graph	The graph that was launched in this session.
graph_def	A serializable version of the underlying TensorFlow graph.
sess_str
__init__(target='', graph=None, config=None)	Creates a new TensorFlow session.
as_default()	Returns a context manager that makes this object the default session.
close()	Closes this session.
make_callable( fetches, feed_list=None)	Returns a Python callable that runs a particular step.
partial_run(handle,fetches,feed_dict=None)	Continues the execution with more feeds and fetches.
partial_run_setup( fetches,feeds=None)	Sets up a graph with feeds and fetches for partial run.
reset(target, containers=None,config=None)	Resets resource containers on target, and close all connected sessions.
run(fetches, feed_dict=None,options=None,run_metadata=None)	Runs operations and evaluates tensors in fetches.

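TensorFlow variables

In TensorFlow, a variable (tf.Variable) stores and updates the parameters of a model. Creating a Variable requires an initial value, which can be random numbers, constants, or a value computed from the initial value of another variable. The initial value fixes the type and shape of the Variable. After initialization the type and shape cannot change, while the value can be changed with assign. If the shape needs to change dynamically, declare the variable with validate_shape=False.

The code below shows one way to initialize a TensorFlow variable.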

import tensorflow as tf

# Create a variable.
w = tf.Variable(<initial-value>, name=<optional-name>)

# For example:
weights = tf.Variable(tf.random_normal([2, 3], stddev=2))
# tf.random_normal([2, 3], stddev=2) produces a 2x3 matrix of random numbers
# with mean 0 and standard deviation 2. The mean parameter of tf.random_normal
# sets the mean; it defaults to 0 when omitted.

# Use the variable in the graph like any Tensor.
y = tf.matmul(w, ...another variable or tensor...)

# The overloaded operators are available too.
z = tf.sigmoid(w + y)

# Assign a new value to the variable with `assign()` or a related method.
w.assign(w + 1.0)
w.assign_add(1.0)

Before the graph is run, all variables must be initialized explicitly. This can be done by running their initializer ops, as sketched below:

# Launch the graph in a session.
with tf.Session() as sess:
    # Run the variable initializer.
    sess.run(w.initializer)
    # ...you now can run ops that use the value of 'w'...



    # Initialize from another variable:
    # w2 starts with the same value as weights, w3 with twice that value
    w2 = tf.Variable(weights.initialized_value())
    w3 = tf.Variable(weights.initialized_value() * 2.0)

In most cases initialization is done by calling global_variables_initializer(), which adds a single initialization op. That op is run first, and the other computations are run afterwards. It is used as follows:

# Add an Op to initialize global variables.
init_op = tf.global_variables_initializer()

# Launch the graph in a session.
with tf.Session() as sess:
    # Run the Op that initializes global variables.
    sess.run(init_op)
    # ...you can now run any Op that uses variable values...

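Every variable is automatically added to a collection when it is created, normally GraphKeys.GLOBAL_VARIABLES. The global_variables() function returns the contents of this collection.

When building a machine learning model, it is useful to distinguish the variables that hold trainable model parameters from other variables, such as a global variable that counts training steps. To make this easy, the Variable constructor accepts a trainable= parameter. If it is True, the new variable is also added to GraphKeys.TRAINABLE_VARIABLES, and trainable_variables() returns that collection. The various Optimizer classes use this collection as the default list of parameters to optimize.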

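TensorFlow random number functions

A TensorFlow Variable can be initialized with a random function. The commonly used random functions are:

Function	Distribution	Main parameters
tf.random_normal	Normal distribution	Mean, standard deviation, dtype
tf.truncated_normal	Normal distribution, but any value more than 2 standard deviations from the mean is redrawn	Mean, standard deviation, dtype
tf.random_uniform	Uniform distribution	Minimum, maximum, dtype
tf.random_gamma	Gamma distribution	Shape parameter alpha, scale parameter beta, dtype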
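
The redraw rule described for tf.truncated_normal can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not TensorFlow's implementation; the helper name truncated_normal and the fixed seed are assumptions made for illustration.

```python
import random

def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw one sample, redrawing while it lies more than 2 stddev from the mean."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value

rng = random.Random(1)  # fixed seed so repeated runs give the same samples
samples = [truncated_normal(0.0, 1.0, rng) for _ in range(1000)]

# Every sample stays within 2 standard deviations of the mean.
max_deviation = max(abs(s) for s in samples)
print(max_deviation <= 2.0)
```

Bounding the initial weights this way avoids the occasional very large value that an untruncated normal distribution can produce.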

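TensorFlow can also initialize a variable from a constant. The table below lists the commonly used ways to declare constants.

Function	Purpose	Example
tf.zeros	Array of all zeros	tf.zeros([2,3], tf.int32) -> [[0,0,0],[0,0,0]]
tf.ones	Array of all ones	tf.ones([2,3], tf.int32) -> [[1,1,1],[1,1,1]]
tf.fill	Array filled with a given number	tf.fill([2,3], 9) -> [[9,9,9],[9,9,9]]
tf.constant	Constant with a given value	tf.constant([1,2,3]) -> [1,2,3]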

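Variable management

TensorFlow can create or fetch a variable by name. With this mechanism, different functions can use a variable directly through its name instead of receiving it as an argument.
The mechanism is implemented mainly by tf.get_variable and tf.variable_scope.

A Variable can be created with either Variable() or get_variable().
The code below creates the same variable with each of the two functions:

v = tf.get_variable("v", shape=[1], initializer=tf.constant_initializer(1.0))
v = tf.Variable(tf.constant(1.0, shape=[1]), name='v')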

Variable API (tf.Variable)

Operation	Description
device	The device of this variable.
dtype	The DType of this variable.
graph	The Graph of this variable.
initial_value	Returns the Tensor used as the initial value for the variable.
initializer	The initializer operation for this variable.
name	The name of this variable.
op	The Operation of this variable.
shape	The TensorShape of this variable.
__init__(initial_value=None, trainable=True, collections=None, validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None, expected_shape=None, import_scope=None)	Creates a new Variable from initial_value. trainable: if True, the variable is also added to the GraphKeys.TRAINABLE_VARIABLES collection. collections: the collections the new variable is added to; defaults to [GraphKeys.GLOBAL_VARIABLES]. validate_shape: if False, the variable may be initialized with a value of unknown shape. caching_device: where the Variable should be cached for reading; defaults to the Variable's device. name: defaults to 'Variable' and gets uniquified automatically. variable_def: VariableDef protocol buffer. dtype: if set, initial_value will be converted to the given type. expected_shape: a TensorShape; if set, initial_value is expected to have this shape. import_scope: name scope to add to the Variable; only used when initializing from a protocol buffer.
__abs__(a, *args)	Absolute value
__add__(a, *args)	x + y, elementwise addition
__and__(a, *args)	Elementwise logical AND
__div__(a, *args)	Division
__floordiv__(a, *args)	Divides x / y elementwise, rounding down
__ge__(a, *args)	Returns the Boolean matrix of x >= y
__getitem__(var, slice_spec)	Creates a slice helper object given a variable.
__gt__(a, *args)	Returns the Boolean matrix of x > y
__invert__(a, *args)	Returns the Boolean matrix of the logical NOT
__le__(a, *args)	Returns the Boolean matrix of x <= y
__lt__(a, *args)	Returns the Boolean matrix of x < y
__matmul__(a, *args)	Matrix multiplication of x and y
__mod__(a, *args)	Modulo
__mul__(a, *args)	Dispatches cwise mul for "Dense*Dense" and "Dense*Sparse".
__neg__(a, *args)	Negation
__pow__(a, *args)	Computes x raised to the power y
__sub__(a, *args)	x - y, elementwise
__xor__(a, *args)	x ^ y = (x | y) & ~(x & y)
assign(value, use_locking=False)	Assigns a new value to the Variable
assign_add(value,use_locking=False)	Adds a value to this variable.
assign_sub(value,use_locking=False)	Subtracts a value from this variable.
count_up_to(limit)	increments this variable until it reaches limit.
eval(session=None)	Computes the value of the Variable in a session.
from_proto(variable_def, import_scope=None)	Returns a Variable object created from variable_def.
get_shape()	Alias of Variable.shape.
initialized_value()	Returns the value of the initialized Variable.
load(value, session=None)	Loads a new value into this variable. Writes the new value to the variable's memory. Doesn't add ops to the graph.
read_value()	Returns the value of the Variable.
scatter_sub(sparse_delta,use_locking=False)	Subtracts IndexedSlices from this variable.
set_shape(shape)	Overrides the shape for this variable.
to_proto(export_scope=None)	Converts a Variable to a VariableDef protocol buffer.
value()	Returns the last snapshot of this variable

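Deep neural networks

Introduction to neural networks in TensorFlow

This part uses what a neural network does to explain how to implement one in TensorFlow. It first uses TensorFlow Playground (a TensorFlow tool) to give a brief picture of what a neural network does and how its computation flows. It then implements the forward propagation (FP) and backpropagation (BP) algorithms of a neural network in TensorFlow.

TensorFlow Playground

TensorFlow Playground (http://playground.tensorflow.org) is a web application that trains simple neural networks and visualizes the training process.

Solving a classification problem with a neural network takes the following 4 steps:

1. Extract feature vectors of the entities in the problem as the input of the network
2. Define the structure of the neural network and how its output is computed from its input
3. Adjust the parameter values of the network using training data
4. Use the trained model to predict unseen data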

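Introduction to forward propagation

Different network structures propagate forward in different ways. This section covers forward propagation for the simplest structure, the fully connected network. It is called fully connected because every pair of nodes in two adjacent layers is connected.

The first part is the input of the network: the feature vector extracted from an entity, shown as x1 and x2 in the figure.
The second part is the connection structure of the network: the nodes a11, a12, a13 and the weight matrix W. The superscript of w gives the layer of the network and the subscript gives the connected nodes, so w11 is the weight that connects x1 to a11. (The superscript determines which layer a weight belongs to.)

The forward propagation process of the whole network

Forward propagation can be written as matrix multiplication. The inputs x1 and x2 are arranged as a 1x2 matrix X=[x1,x2], and W is arranged as a 2x3 matrix (the number of rows is the number of inputs and the number of columns is the number of nodes in the current layer):

The forward pass is now expressed with matrices, and matrix multiplication is easy to implement in TensorFlow.

     a = tf.matmul(x,w1)
     y = tf.matmul(a,w2)  # matmul performs matrix multiplication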
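
To make the two matrix products concrete, here is a minimal sketch in plain Python that performs the same forward pass by hand. The helper matmul is written for illustration only. The weights are the initial values of w1 and w2 printed in the training output later in this article (tf.random_normal with seed=1), and the input is the sample [0.7, 0.9] used throughout.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Initial weights copied from the output printed later in this article.
w1 = [[-0.81131822, 1.48459876, 0.06532937],
      [-2.4427042, 0.09924842, 0.59122437]]
w2 = [[-0.81131822], [1.48459876], [0.06532937]]

x = [[0.7, 0.9]]     # one sample as a 1x2 matrix
a = matmul(x, w1)    # hidden layer, shape 1x3
y = matmul(a, w2)    # output layer, shape 1x1
print(y)             # a 1x1 result close to 3.9576
```

The result agrees with the value [[ 3.95757794]] that the TensorFlow programs in this article print for the same input.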

Implementing neural network forward propagation with variables

The code is as follows


    # coding:utf8

    import tensorflow as tf

    # Declare the variables w1 and w2. The seed argument fixes the random seed
    # so that every run gives the same result.
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    # For now the input feature vector is a constant
    x = tf.constant([[0.7, 0.9]])  # note that this declares a 1x2 matrix

    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)

    sess = tf.Session()

    sess.run(w1.initializer)
    sess.run(w2.initializer)

    # tf.global_variables_initializer() initializes all variables at once
    # sess.run(tf.global_variables_initializer())

    print(sess.run(y))

    sess.close()

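The output is

[[ 3.95757794]]

Points to note:

After declaring variables, initialize them in a session, either with run(w.initializer) for each variable or with run(tf.global_variables_initializer()) for all of them.
The input x is declared as a matrix constant, so it is written with two pairs of brackets.
All variables are automatically added to the GraphKeys.GLOBAL_VARIABLES collection. tf.global_variables() returns all variables of the current graph.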

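Training a neural network model with TensorFlow

The usual way to train a neural network is the BP algorithm. The figure below is a flowchart of how the BP algorithm runs.

BP is an iterative process. At the start of each iteration a small part of the training data, called a batch, is selected. The BP optimization then uses the difference between the forward propagation output and the label values.
Note that the code in the previous section declared the input as x=tf.constant([[0.7,0.9]]). Training usually needs many iterations, and the data chosen in each iteration cannot be expressed as constants in the graph. TensorFlow provides the placeholder mechanism for input data. A placeholder defines a position whose data is supplied when the program runs. The data type of that position must be given when the placeholder is defined and cannot change afterwards.

Using a placeholder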

    # coding:utf8

    import tensorflow as tf

    # Declare the variables w1 and w2. The seed argument fixes the random seed
    # so that every run gives the same result.
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    # Define the input feature vector as a placeholder holding one sample
    x = tf.placeholder(tf.float32, shape=(1, 2), name='input')

    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)

    sess = tf.Session()

    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[0.7, 0.9]]}))
    sess.close()

Output

[[ 3.95757794]]

Changing the input matrix gives the forward propagation results for n samples. For example, with 3 input samples:

    # coding:utf8

    import tensorflow as tf

    # Declare the variables w1 and w2. The seed argument fixes the random seed
    # so that every run gives the same result.
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    # Define the input as a placeholder holding three samples
    x = tf.placeholder(tf.float32, shape=(3, 2), name='input')


    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)

    sess = tf.Session()

    sess.run(tf.global_variables_initializer())

    print(sess.run(y, feed_dict={x: [[0.7, 0.9], [0.1, 0.4], [0.5, 0.8]]}))

    sess.close()

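Output

    [[ 3.95757794]
     [ 1.15376532]
     [ 3.16749239]]

Once the forward propagation result of a batch is available, a loss function is defined to measure the gap between the output and the label values, and BP then adjusts the network parameters.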

    # Define the loss function
    cross_entropy = -tf.reduce_mean(y_*tf.log(tf.clip_by_value(y,1e-10,1.0)))

    # Define the learning rate
    learning_rate = 0.001

    # Define the BP optimizer that updates the network parameters
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

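cross_entropy defines the cross entropy between the output and the label values, a commonly used loss function for classification problems.
train_step defines the optimization method of the BP algorithm. TensorFlow currently supports 7 different optimizers, and the three most common are tf.train.GradientDescentOptimizer, tf.train.AdamOptimizer, and tf.train.MomentumOptimizer. Once the BP step is defined, running sess.run(train_step) optimizes all variables in the GraphKeys.TRAINABLE_VARIABLES collection.

A complete neural network example program

Training a neural network takes 3 steps:

1. Define the structure of the network and the output of forward propagation
2. Define the loss function and choose the BP optimization algorithm
3. Create a session (tf.Session) and run the BP optimization repeatedly on the training data
Example code: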

    # coding:utf8

    import tensorflow as tf

    # Use NumPy to generate a simulated data set
    from numpy.random import RandomState

    # Size of a training batch
    batch_size = 8

    # Declare the variables w1 and w2. The seed argument fixes the random seed
    # so that every run gives the same result.
    w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

    # Using None for one dimension of the shape allows different batch sizes:
    # small batches for training, and all the data at once for testing
    # (as long as it fits in memory)
    x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
    y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')


    # Define the forward propagation of the network
    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)


    # Define the loss function and the BP algorithm
    cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))

    # Define the learning rate
    learning_rate = 0.001

    # Define the BP optimizer that updates the network parameters
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)


    # Generate a data set from random numbers
    rdm = RandomState(1)
    dataset_size = 128
    X = rdm.rand(dataset_size, 2)

    # Labeling rule: every sample with x1+x2<1 is a positive sample.
    # 0 marks a negative sample and 1 marks a positive sample.
    Y = [[int(x1 + x2 < 1)] for (x1, x2) in X]

    # Create a session and run the program
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(w1))
        print(sess.run(w2))

        STEPS = 5000
        for i in range(STEPS):
            # Select batch_size samples for each training step
            start = (i * batch_size) % dataset_size
            end = min(start + batch_size, dataset_size)

            sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
            if i % 1000 == 0:
                total_cross_entropy = sess.run(cross_entropy, feed_dict={x: X, y_: Y})
                print("After %d training step(s), cross entropy on all data is %g" % (i, total_cross_entropy))

        print(sess.run(w1))
        print(sess.run(w2))

Output

    # Network parameters before training
    [[-0.81131822  1.48459876  0.06532937]
     [-2.4427042   0.09924842  0.59122437]]
    [[-0.81131822]
     [ 1.48459876]
     [ 0.06532937]]

    # The smaller the cross entropy, the closer the output is to the labels
    After 0 training step(s), cross entropy on all data is 0.0674925
    After 1000 training step(s), cross entropy on all data is 0.0163385
    After 2000 training step(s), cross entropy on all data is 0.00907547
    After 3000 training step(s), cross entropy on all data is 0.00714436
    After 4000 training step(s), cross entropy on all data is 0.00578471

    # Parameters after training
    [[-1.9618274   2.58235407  1.68203783]
     [-3.46817183  1.06982327  2.11789012]]
    [[-1.82471502]
     [ 2.68546653]
     [ 1.41819513]]

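Deep neural networks

Wikipedia defines deep learning as "a class of algorithms that model highly complex data through multiple layers of nonlinear transformations". Deep learning has two very important properties: multiple layers and nonlinearity.

Removing linearity

A linear model can only solve linearly separable problems, so for the many problems that are not linearly separable the model has to be made nonlinear. This is the job of the activation function. When the output of each ordinary neuron is passed through a nonlinear function, the model of the whole network changes from linear to nonlinear.

Commonly used activation functions:

Adding bias terms and activation functions to the network described above gives the following structure:

The forward propagation of the new network model is computed as follows: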
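
As a rough illustration of a layer with a bias term and an activation function, here is a minimal sketch in plain Python. It assumes the three activation functions most often meant in this context (ReLU, sigmoid, tanh); the weights and biases are hypothetical values chosen only for the example, and the helper dense is not a TensorFlow function.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(inputs . weights + biases)."""
    outputs = []
    for j in range(len(biases)):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        outputs.append(activation(z))
    return outputs

# Hypothetical weights and biases, chosen only for illustration.
weights = [[0.2, -0.5, 0.1],
           [0.4, 0.3, -0.6]]
biases = [0.1, 0.1, 0.1]

hidden = dense([0.7, 0.9], weights, biases, relu)
print(hidden)  # the third node has a negative weighted sum, so ReLU outputs 0.0
```

Without the activation function, stacking such layers would still compute a linear function of the input, however many layers are used.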

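Multiple layers

A multilayer network can extract combined features, which helps greatly with problems whose feature vectors are hard to extract by hand. This is one reason deep learning has achieved breakthroughs on many kinds of problems.

Loss functions

The quality of a neural network model and the objective of its optimization are both defined through the loss function.

Classic loss functions

Classification and regression are the two main kinds of supervised learning. For classification, the usual approach with a neural network is to use n output nodes, where n is the number of classes. This raises the question of how to measure how close an output vector is to the expected vector. Cross entropy is the loss function used here.

Cross entropy

Cross entropy is one of the most common measures for classification problems.

Entropy: in essence, entropy is the expectation of Shannon information content.

Classification problems: cross entropy

Cross entropy describes the distance between two probability distributions, that is, how hard it is to express the probability distribution p through the probability distribution q.

A concrete example shows how cross entropy measures the distance between a prediction and the label values: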
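
For a label distribution p and a predicted distribution q, cross entropy is H(p, q) = -Σ p(x) log q(x). The following minimal sketch in plain Python evaluates it for one label and two predictions. The two predicted distributions are hypothetical values picked for illustration.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log(q(x)); terms with p(x) == 0 contribute nothing."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

label = [1.0, 0.0, 0.0]       # the correct class is the first one
pred_far = [0.5, 0.4, 0.1]    # hypothetical prediction, less confident
pred_near = [0.8, 0.1, 0.1]   # hypothetical prediction, closer to the label

h_far = cross_entropy(label, pred_far)     # -log(0.5), about 0.69
h_near = cross_entropy(label, pred_near)   # -log(0.8), about 0.22
print(h_far, h_near)
```

The prediction that is closer to the label has the smaller cross entropy, which is why minimizing cross entropy pushes the output toward the labels.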

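Implementing cross entropy in TensorFlow

The cross entropy code used above is:

cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
Here y_ stands for the label values and y for the predicted values.

First, tf.clip_by_value limits the values of a tensor to a given range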

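    # .eval() needs a default session, e.g. one created with tf.InteractiveSession()
    v = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(tf.clip_by_value(v, 2.5, 4.5).eval())
    # Values of v below 2.5 become 2.5 and values above 4.5 become 4.5

    tf.clip_by_value(y, 1e-10, 1.0)
    # Guarantees that the log in the next step never receives 0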

tf.log computes the natural logarithm of every element of a tensor

    v = tf.constant([1.0, 2.0, 3.0])
    print(tf.log(v).eval())
    # Output: [ 0.        ,  0.69314718,  1.09861231]

    tf.log(tf.clip_by_value(y, 1e-10, 1.0))
    # Takes the logarithm of the output y

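Multiplication: the cross entropy code multiplies two matrices with the * operator, which multiplies element by element (matrix multiplication uses tf.matmul)

    v1 = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    v2 = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    print((v1 * v2).eval())
    # Output: [[  5.  12.]  [ 21.  32.]]

    print(tf.matmul(v1, v2).eval())
    # Output: [[ 19.  22.] [ 43.  50.]]

    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))
    # Computes the term p(x)log q(x) for every class of every sample.
    # The result is an n x m matrix, where n is the number of samples
    # in a batch and m is the number of classes.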

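Taking the mean: according to the cross entropy formula, the m results in each row should be summed to give the cross entropy of each sample, and the n rows should then be averaged to give the mean cross entropy of the batch. Because the number of classes in a classification problem is fixed, the whole matrix can be averaged directly, which only scales the result by the constant factor 1/m.
```
    v = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(tf.reduce_mean(v).eval())
    # The mean is 3.5

    -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
```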
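
The relationship between averaging the whole matrix and averaging per-sample sums can be checked numerically. The following is a minimal sketch in plain Python that mirrors the TensorFlow expression; the function names and the label and prediction values are assumptions made for illustration.

```python
import math

def clip(value, low, high):
    return max(low, min(high, value))

def whole_matrix_mean(labels, preds):
    """Mirror of -reduce_mean(y_ * log(clip(y, 1e-10, 1.0))) over an n x m matrix."""
    terms = [yl * math.log(clip(yp, 1e-10, 1.0))
             for row_l, row_p in zip(labels, preds)
             for yl, yp in zip(row_l, row_p)]
    return -sum(terms) / len(terms)

def per_sample_mean(labels, preds):
    """Sum over the m classes of each row, then average over the n rows."""
    rows = [-sum(yl * math.log(clip(yp, 1e-10, 1.0)) for yl, yp in zip(rl, rp))
            for rl, rp in zip(labels, preds)]
    return sum(rows) / len(rows)

labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
preds = [[0.5, 0.4, 0.1], [0.0, 0.9, 0.1]]  # the 0.0 is clipped to 1e-10 before the log

whole = whole_matrix_mean(labels, preds)
per_sample = per_sample_mean(labels, preds)
m = len(labels[0])
print(whole * m, per_sample)  # the two values agree up to rounding
```

Since the two quantities differ only by the constant factor m, minimizing either one leads to the same parameters.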

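Because cross entropy is usually used together with softmax regression, TensorFlow wraps the two in a single function,
tf.nn.softmax_cross_entropy_with_logits. The line below computes the cross entropy loss after softmax regression:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)

For classification problems with exactly one correct answer, TensorFlow provides tf.nn.sparse_softmax_cross_entropy_with_logits() to speed up the computation further.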
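
What the combined function computes can be sketched in plain Python: softmax turns raw outputs (logits) into a probability distribution, and cross entropy is then taken against the labels. This is an illustration of the idea, not TensorFlow's implementation, and the logits below are hypothetical values.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(labels, logits):
    """Cross entropy between the labels and softmax(logits)."""
    probs = softmax(logits)
    return -sum(l * math.log(p) for l, p in zip(labels, probs) if l > 0)

logits = [2.0, 1.0, 0.1]   # hypothetical raw network outputs
probs = softmax(logits)
loss = softmax_cross_entropy([1.0, 0.0, 0.0], logits)
print(probs, loss)
```

Because softmax always yields values strictly between 0 and 1, this form needs no explicit clipping before the logarithm.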

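Regression problems: mean squared error (MSE)

Regression predicts a concrete numeric value. The target is not one of a set of predefined classes but an arbitrary real number. A neural network for regression generally has a single output node, whose output is the predicted value.

In TensorFlow it is written as:

mse = tf.reduce_mean(tf.square(y_-y))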
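
The same quantity, MSE = (1/n) Σ (y_i - y'_i)², can be computed by hand. The following minimal sketch uses plain Python with hypothetical label and prediction values.

```python
def mean_squared_error(labels, preds):
    """Average of the squared differences between labels and predictions."""
    return sum((l - p) ** 2 for l, p in zip(labels, preds)) / len(labels)

labels = [1.0, 2.0, 3.0]
preds = [1.5, 2.0, 2.0]   # hypothetical predictions
mse = mean_squared_error(labels, preds)
print(mse)  # (0.25 + 0.0 + 1.0) / 3
```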

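Custom loss functions

TensorFlow supports custom loss functions. For example

loss = tf.reduce_sum(tf.where(tf.greater(v1,v2),(v1-v2)*a,(v2-v1)*b))

This line uses two functions

Comparison function tf.greater(v1,v2)
tf.greater(v1,v2) takes two tensors, compares them element by element, and returns the result of the comparison.

Selection function tf.where (named tf.select before TensorFlow 1.0, where it was removed)
tf.where takes three arguments. The first is the selection condition (similar to the ?: operator). Where it is True the element of the second argument is chosen, otherwise the element of the third argument.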

# coding:utf8
import tensorflow as tf

v1 = tf.constant([1.0, 2.0, 3.0, 4.0])
v2 = tf.constant([4.0, 3.0, 2.0, 1.0])

with tf.Session() as sess:
    print(sess.run(tf.greater(v1, v2)))
    # tf.select is no longer available, so tf.where is used instead
    print(sess.run(tf.where(tf.greater(v1, v2), v1, v2)))

Output:
[False False  True  True]
[ 4.  3.  3.  4.]
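
To see what such a custom loss does, here is a minimal sketch in plain Python of an asymmetric loss in which predicting too little costs ten times as much as predicting too much. The cost values 1 and 10 match the full example that follows; the function name and the sample numbers are assumptions made for illustration.

```python
def asymmetric_loss(preds, labels, loss_more=1.0, loss_less=10.0):
    """Sum of per-element costs: loss_more per unit over, loss_less per unit under."""
    total = 0.0
    for y, y_ in zip(preds, labels):
        if y > y_:
            total += (y - y_) * loss_more   # predicted too much
        else:
            total += (y_ - y) * loss_less   # predicted too little
    return total

over = asymmetric_loss([1.2], [1.0])    # over-prediction by 0.2
under = asymmetric_loss([0.8], [1.0])   # under-prediction by 0.2
print(over, under)
```

The same absolute error is penalized ten times more when the prediction is too low, so a model trained with this loss tends to predict slightly high.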

Complete example code using the custom loss function:

    # coding:utf8

    import tensorflow as tf

    # Use NumPy to generate a simulated data set
    from numpy.random import RandomState

    # Size of a training batch
    batch_size = 8


    # Two input features; a regression network generally has a single output node
    x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
    y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

    # Forward propagation of a single-layer network, here just a weighted sum
    w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
    y = tf.matmul(x, w1)


    # Cost of predicting too little (loss_less) and too much (loss_more)
    loss_less = 10
    loss_more = 1
    loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * loss_more, (y_ - y) * loss_less))

    # Define the learning rate
    learning_rate = 0.001

    # Define the BP optimizer that updates the network parameters
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    # Generate a data set from random numbers
    rdm = RandomState(1)
    dataset_size = 128
    X = rdm.rand(dataset_size, 2)

    # The regression target is the sum of the two inputs plus a noise term
    Y = [[x1 + x2 + rdm.rand() / 10.0 - 0.05] for (x1, x2) in X]

    # Create a session and run the program
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        STEPS = 5000
        for i in range(STEPS):
            # Select batch_size samples for each training step
            start = (i * batch_size) % dataset_size
            end = min(start + batch_size, dataset_size)
            sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        print(sess.run(w1))


Output

    [[ 1.01934707]
     [ 1.04280913]]

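Neural network optimization algorithms

This section describes in more detail how the BP algorithm and gradient descent adjust the parameters of a neural network. Gradient descent optimizes the value of a single parameter, while BP provides an efficient way to apply gradient descent to all parameters, so that the loss of the network on the training data becomes as small as possible.

Note that gradient descent does not guarantee that the optimized function reaches its global optimum.
In the figure, the optimization ends in a local optimum rather than the global one. The initial values of the parameters therefore strongly affect the final result of training.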
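
The dependence on the starting point can be reproduced with a one-dimensional example. The following minimal sketch in plain Python runs gradient descent on a hypothetical function with two minima, f(x) = x⁴ - 3x² + x, from two different initial values. The function, learning rate, and step count are assumptions chosen for illustration.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(x0, learning_rate=0.01, steps=1000):
    """Repeatedly move against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

x_left = gradient_descent(-2.0)   # ends in the minimum at negative x
x_right = gradient_descent(2.0)   # ends in the minimum at positive x
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs converge, but only the run that starts on the left reaches the lower of the two minima. The run that starts on the right stops in a local minimum.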

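Gradient descent also takes too long to compute. Because the loss is minimized over all the training data, the loss function J(θ) is the sum of the losses of all training samples. With massive data sets, computing the loss over all training data is very time-consuming.

Stochastic gradient descent speeds up training. In each iteration it optimizes the loss on one randomly chosen training sample, which is much faster. The drawback is just as clear: stochastic gradient descent may not even reach a local optimum.

A compromise is used in practice: each iteration computes the loss on a small part of the training data (a batch), using matrix operations. Optimizing the parameters on one batch at a time is not much slower than on a single sample, so convergence stays fast and the result stays close to that of full gradient descent.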
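
How consecutive batches are cut from the data set can be sketched in plain Python. The indexing below is the same wrap-around scheme used in the training loops of this article; the helper name batch_bounds is an assumption made for illustration.

```python
batch_size = 8
dataset_size = 128

def batch_bounds(step):
    """Start and end index of the batch used at a given training step."""
    start = (step * batch_size) % dataset_size
    end = min(start + batch_size, dataset_size)
    return start, end

bounds = [batch_bounds(i) for i in (0, 1, 15, 16)]
print(bounds)  # [(0, 8), (8, 16), (120, 128), (0, 8)]
```

After 16 steps every sample has been used once and the indices wrap around to the beginning of the data set.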

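The code below outlines how the training process of a neural network is implemented in TensorFlow:

    # Size of a training batch (n stands for the value you choose)
    batch_size = n


    # Read a small part of the data each time
    x = tf.placeholder(tf.float32, shape=(batch_size, 2), name='x-input')
    y_ = tf.placeholder(tf.float32, shape=(batch_size, 1), name='y-input')


    # Define the network structure and the optimization algorithm
    loss = ...

    # Define the learning rate
    learning_rate = 0.001

    # Define the BP optimizer that updates the network parameters
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    # Train the network
    with tf.Session() as sess:
        # Initialize the parameters
        sess.run(tf.global_variables_initializer())

        # Update the parameters iteratively
        STEPS = 5000
        for i in range(STEPS):
            # Select batch_size samples for each step, usually after
            # shuffling all the training data
            current_X, current_Y = ...
            sess.run(train_step, feed_dict={x: current_X, y_: current_Y})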

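Setting the learning rate

The learning rate determines how far the parameters move on each update. If the step is too large, the parameters may bounce back and forth around the optimum. If it is too small, optimization becomes much slower. To address this, TensorFlow provides a more flexible way to set the learning rate, exponential decay, through the following function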

tf.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None
)

Parameters:

learning_rate: A scalar float32 or float64 Tensor or a Python number. The initial learning rate.
global_step: A scalar int32 or int64 Tensor or a Python number. Global step to use for the decay computation. Must not be negative.
decay_steps: A scalar int32 or int64 Tensor or a Python number. Must be positive. See the decay computation below. (Controls the decay speed.)
decay_rate: A scalar float32 or float64 Tensor or a Python number. The decay rate.
staircase: Boolean. If True, decay the learning rate at discrete intervals.
name: String. Optional name of the operation. Defaults to 'ExponentialDecay'.

What the function does:
The function returns the decayed learning rate. It is computed as:

decayed_learning_rate = learning_rate *
                    decay_rate ^ (global_step / decay_steps)

If staircase is True, global_step / decay_steps is rounded down to an integer, and the learning rate becomes a staircase function.
In the figure below, the continuous curve is the learning rate with staircase set to False and the stepped curve is with staircase set to True.
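
The decay formula can be evaluated directly. The following minimal sketch in plain Python reimplements the formula given above for illustration; it is not the TensorFlow function itself, and the step value 150000 is a hypothetical example.

```python
import math

def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """learning_rate * decay_rate ** (global_step / decay_steps), optionally in discrete steps."""
    exponent = global_step / float(decay_steps)
    if staircase:
        exponent = math.floor(exponent)  # decay only at whole multiples of decay_steps
    return learning_rate * decay_rate ** exponent

smooth = exponential_decay(0.1, 150000, 100000, 0.96)                   # exponent 1.5
stepped = exponential_decay(0.1, 150000, 100000, 0.96, staircase=True)  # exponent 1
print(smooth, stepped)
```

With staircase=True the rate stays at 0.1 * 0.96 for every step from 100000 up to 199999, whereas the smooth version keeps shrinking at every step.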

Usage example:
Example: decay every 100000 steps with a base of 0.96:

    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.1
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               100000, 0.96, staircase=True)
    # Passing global_step to minimize() will increment it at each step.
    learning_step = (
        tf.train.GradientDescentOptimizer(learning_rate)
        .minimize(...my loss..., global_step=global_step)
    )

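Overfitting

Fitting the random noise in the training data too closely can drive the loss very low, but the resulting model may not make reliable predictions on unseen data, as the figure below shows:

A common remedy is regularization, which adds a term to the loss that penalizes large weights. TensorFlow can optimize a loss function of any form; the following code defines a simple loss with L2 regularization: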

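    import tensorflow as tf
    # The tensorflow.contrib.layers module has to be imported explicitly
    import tensorflow.contrib.layers as tflayers

    w = tf.Variable(tf.random_normal([2,1],stddev=1,seed=1))
    y = tf.matmul(x,w)

    # lamada is the regularization weight; the name `lambda` cannot be used
    # because it is a Python keyword
    loss = tf.reduce_mean(tf.square(y_ - y)) + tflayers.l2_regularizer(lamada)(w)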

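Here loss is the loss function, made up of two parts. The first part is the mean squared error, which measures how well the model fits the training data. The second part is the regularization term, which keeps the model from fitting the random noise in the training data too closely.

Similarly, tensorflow.contrib.layers.l1_regularizer computes the value of the L1 regularization term.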
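To make the two penalties concrete, here is a small plain-Python sketch of what they evaluate to. It assumes the usual conventions, namely that the L2 term is scale * sum(w^2) / 2 (the halving used by tf.nn.l2_loss) and the L1 term is scale * sum(|w|); the helper names l2_penalty and l1_penalty and the sample weights are made up for this illustration.

```python
def l2_penalty(scale, weights):
    # Assumed convention: scale * sum(w^2) / 2, as in tf.nn.l2_loss
    return scale * sum(w * w for row in weights for w in row) / 2.0


def l1_penalty(scale, weights):
    # scale * sum(|w|)
    return scale * sum(abs(w) for row in weights for w in row)


# Illustrative 2x2 weight matrix
weights = [[1.0, -2.0], [-3.0, 4.0]]
print(l2_penalty(0.5, weights))  # 0.5 * (1 + 4 + 9 + 16) / 2 = 7.5
print(l1_penalty(0.5, weights))  # 0.5 * (1 + 2 + 3 + 4) = 5.0
```

L2 punishes large weights much more strongly than small ones, while L1 grows linearly and tends to push weights all the way to zero.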

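For a simple network, the code above computes the regularized loss well enough. As the number of parameters grows, however, a loss defined this way becomes hard to read. More importantly, once the network is complex, the code that defines the network structure and the code that computes the loss may no longer live in the same function, so computing the loss by referring to the weight variables directly becomes inconvenient.

The following code uses the collections provided by TensorFlow to compute the L2-regularized loss of a 5-layer neural network: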

    # coding=utf-8
    # Using TensorFlow collections: a collection of losses
    # Computes the L2-regularized loss of a 5-layer neural network
    import tensorflow as tf
    import tensorflow.contrib.layers as tflayers
    from numpy.random import RandomState

    # Create the weights of one layer and add their L2 regularization loss
    # to the collection named 'losses'
    def get_weight(shape, lamada):
        # Create the weight variable for this layer
        var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
        tf.add_to_collection('losses', tflayers.l2_regularizer(lamada)(var))
        return var

    x = tf.placeholder(tf.float32, shape=(None, 2), name='x_input')
    y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y_input')

    batch_size = 8

    # Number of nodes in each layer
    layer_dimension = [2, 10, 10, 10, 1]
    # Number of layers
    n_layers = len(layer_dimension)
    # The deepest layer reached so far during forward propagation; starts as the input layer
    cur_layer = x
    # Number of nodes in the current layer
    in_dimension = layer_dimension[0]

    # Build the fully connected 5-layer network in a loop
    for i in range(1, n_layers):
        # Number of nodes in the next layer
        out_dimension = layer_dimension[i]
        # Weights of the current layer; their L2 loss is added to the collection
        weight = get_weight([in_dimension, out_dimension], 0.001)
        # Bias, initialized to the constant 0.1
        bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
        # Forward propagation with the ReLU activation function
        cur_layer = tf.nn.relu(tf.matmul(cur_layer, weight) + bias)
        # Before moving on, the output size becomes the next layer's input size
        in_dimension = layer_dimension[i]

    # Mean squared error on the data, also added to the loss collection
    mse_loss = tf.reduce_mean(tf.square(y_ - cur_layer))
    tf.add_to_collection('losses', mse_loss)

    # get_collection returns a list of all elements in the collection.
    # Here the elements are the individual loss terms; their sum is the total loss
    loss = tf.add_n(tf.get_collection('losses'))

    global_step = tf.Variable(0, trainable=False)
    # Exponentially decaying learning rate. Arguments: initial rate, global step, and
    # (because staircase=True) multiply by the decay rate 0.96 once every 100 steps
    learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    rdm = RandomState(1)
    dataset_size = 128
    X = rdm.rand(dataset_size, 2)
    # Add noise between -0.05 and 0.05
    Y = [[x1 + x2 + rdm.rand() / 10.0 - 0.05] for (x1, x2) in X]

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        steps = 5000
        for i in range(steps):
            # Walk through the dataset one batch at a time, wrapping around at the end
            start = (i * batch_size) % dataset_size
            end = min(start + batch_size, dataset_size)

            sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
            if i % 1000 == 0:
                total_loss = sess.run(
                    loss, feed_dict={x: X, y_: Y})
                print("After %d training_step(s) ,loss on all data is %g" % (i, total_loss))
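The collection mechanism used above is essentially a named list that any function can append to. The following plain-Python sketch mimics the pattern without TensorFlow. Everything in it is a stand-in invented for illustration: the module-level dictionary, the two helper functions, the fixed weight value 0.1 and the pretend data loss 0.25.

```python
from collections import defaultdict

# Hypothetical stand-in for the graph's collections: name -> list of values
_collections = defaultdict(list)


def add_to_collection(name, value):
    _collections[name].append(value)


def get_collection(name):
    return list(_collections[name])


def get_weight(shape, lamada):
    rows, cols = shape
    # Fixed values instead of random ones, so the result is reproducible
    weight = [[0.1] * cols for _ in range(rows)]
    # Register this layer's L2 penalty where the loss code can find it later
    penalty = lamada * sum(w * w for row in weight for w in row) / 2.0
    add_to_collection('losses', penalty)
    return weight


layer_dimension = [2, 10, 10, 10, 1]
for in_dim, out_dim in zip(layer_dimension[:-1], layer_dimension[1:]):
    get_weight([in_dim, out_dim], 0.001)

# The data loss joins the same collection (0.25 is a made-up value)
add_to_collection('losses', 0.25)
total_loss = sum(get_collection('losses'))
print(len(get_collection('losses')), total_loss)
```

The code that builds the layers never has to hand its weights to the code that sums the loss; both sides only agree on the collection name.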

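Moving-average model

When a neural network is trained with stochastic gradient descent, a moving-average model can improve the model's robustness. TensorFlow provides tf.train.ExponentialMovingAverage to implement it.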

Maintains moving averages of variables by employing an exponential decay.

When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.

The apply() method adds shadow copies of trained variables and add ops that maintain a moving average of the trained variables in their shadow copies. It is used when building the training model. The ops that maintain moving averages are typically run after each training step. The average() and average_name() methods give access to the shadow variables and their names. They are useful when building an evaluation model, or when restoring a model from a checkpoint file. They help use the moving averages in place of the last trained values for evaluations.


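The moving average is computed with exponential decay. When creating an ExponentialMovingAverage object you must specify a decay rate (decay), which controls how quickly the averaged model is updated. Each variable gets a shadow variable (shadow_variable) whose initial value is the variable's initial value. The shadow variable is updated as follows:

shadow_variable -= (1 - decay) * (shadow_variable - variable)

decay should be set close to 1.0 (for example 0.999 or 0.9999).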
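The update rule can be traced by hand with a small plain-Python sketch. The class MovingAverage below is a hypothetical stand-in, not the TensorFlow class. It also applies the dynamic rule that is used when a step counter is supplied, decay = min(decay, (1 + num_updates) / (10 + num_updates)), which makes the average follow the variable faster early in training.

```python
class MovingAverage:
    """Tracks the shadow value of a single scalar variable."""

    def __init__(self, decay, shadow=0.0):
        self.decay = decay
        self.shadow = shadow

    def update(self, variable, num_updates=None):
        decay = self.decay
        if num_updates is not None:
            # Smaller effective decay while num_updates is small
            decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
        # Same formula as in the text
        self.shadow -= (1.0 - decay) * (self.shadow - variable)
        return self.shadow


ema = MovingAverage(0.99)
print(ema.update(5.0, num_updates=0))       # effective decay 0.1
print(ema.update(10.0, num_updates=10000))  # effective decay 0.99
print(ema.update(10.0, num_updates=10000))
```

The three updates give approximately 4.5, 4.555 and 4.60945: the first update jumps most of the way to the new value, the later ones move only 1% of the remaining distance.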

Example usage when creating a training model:

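    #coding=utf-8
    # A small moving-average program
    # A moving-average model can make the model more robust on test data
    import tensorflow as tf

    # Variable whose moving average is tracked. Its initial value is 0 and its type is
    # set to float32 explicitly, because averaged variables must be floating point
    v1 = tf.Variable(0,dtype=tf.float32)

    # Simulates the training step count, which controls the decay rate dynamically
    step = tf.Variable(0,trainable=False)
    # Create the moving-average object with decay rate 0.99 and the step variable
    ema = tf.train.ExponentialMovingAverage(0.99,step)

    # Define the op that maintains the averages. It takes a list of variables,
    # and every run of the op updates the average of each variable in the list
    maintain_average_op = ema.apply([v1])

    with tf.Session() as sess:
        # Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        # Read v1 and its moving average
        print(sess.run([v1,ema.average(v1)]))

        # Set v1 to 5
        sess.run(tf.assign(v1,5))
        # Update the moving average of v1. The decay rate is
        # min{0.99, (1+step)/(10+step) = 0.1} = 0.1,
        # so the moving average becomes 0.1*0 + 0.9*5 = 4.5
        sess.run(maintain_average_op)
        print(sess.run([v1,ema.average(v1)]))

        # Update the step count
        sess.run(tf.assign(step,10000))
        sess.run(tf.assign(v1,10))
        # The decay rate is now 0.99, so the moving average of v1
        # becomes 0.99*4.5 + 0.01*10 = 4.555
        sess.run(maintain_average_op)
        print(sess.run([v1,ema.average(v1)]))

        # Update the moving average once more
        sess.run(maintain_average_op)
        print(sess.run([v1,ema.average(v1)]))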

Output

    [0.0, 0.0]
    [5.0, 4.5]
    [10.0, 4.5549998]
    [10.0, 4.6094499]

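Recognizing handwritten digits with Softmax Regression in TensorFlow

MNIST (Modified National Institute of Standards and Technology database) is a very well-known computer vision dataset made up of tens of thousands of handwritten digits, each a 28x28-pixel image containing only grayscale values. Our task is to classify these images into the digits 0 to 9.

Downloading and loading the data: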

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets('MNIST_data/',one_hot=True)

    #download...
    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

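Inspecting the dataset:

In the MNIST dataset as loaded here, mnist.train.images is a tensor of shape [55000, 784]. The first dimension indexes the images and the second indexes the pixels within each image. Every element of the tensor is the intensity of one pixel in one image, a value between 0 and 1.

Training set

    >>> print(mnist.train.images.shape,mnist.train.labels.shape)
    ((55000, 784), (55000, 10))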

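The training set has 55000 samples and is a 55000x784 tensor. The first dimension is the image index and the second is the pixel index within the image.

The training labels form a 55000x10 tensor that one-hot encodes the 10 classes: a 1 in position n means the digit is n.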
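As a quick illustration of one-hot encoding, here is a plain-Python sketch. The helper names one_hot and decode are made up for this example and are not part of the MNIST loader.

```python
def one_hot(label, num_classes=10):
    # All zeros except a single 1.0 at the position of the label
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    # The position of the largest entry recovers the digit
    return vec.index(max(vec))


encoded = one_hot(3)
print(encoded)
print(decode(encoded))
```

Decoding by taking the index of the largest entry is the same idea that tf.argmax applies to a whole batch of labels at once.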

The test and validation sets have the same layout.

Test and validation sets

    >>> print(mnist.test.images.shape,mnist.test.labels.shape)
    ((10000, 784), (10000, 10))
    >>> print(mnist.validation.images.shape,mnist.validation.labels.shape)
    ((5000, 784), (5000, 10))

Selecting data

The object returned by input_data.read_data_sets provides mnist.train.next_batch, which reads a small portion of the training data to use as one training batch.

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets('/path/to/MNIST_data',one_hot=True)

    batch_size = 100
    xs,ys = mnist.train.next_batch(batch_size)
    print(xs.shape)
    # (100, 784)
    print(ys.shape)
    # (100, 10)

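Designing the algorithm

Introduction to Softmax Regression

Softmax Regression is the usual model for multi-class classification tasks.
In a neural network that solves a classification problem (even a CNN or an RNN), the last layer is generally a Softmax Regression layer.
It works by adding up the features that count as evidence for a class, then converting that evidence into the probability that the input belongs to the class.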
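The conversion from evidence to probabilities is the softmax function, softmax(z)_i = exp(z_i) / sum_j exp(z_j). A minimal plain-Python sketch follows; subtracting the maximum before exponentiating is a common numerical-stability trick added here and does not change the result, and the input scores are arbitrary.

```python
import math


def softmax(logits):
    # Shift by the maximum so exp() cannot overflow; the ratios are unchanged
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative evidence scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The outputs are positive, sum to 1, and keep the ordering of the inputs, so the class with the most evidence gets the highest probability.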

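Implementing Softmax Regression

Building a neural network model takes the following steps:

Define the network structure (the forward pass)
Define the loss function and choose the optimizer
Train iteratively
Evaluate on the test/validation set

1. Define the network structure

    import tensorflow as tf

    sess = tf.InteractiveSession() # registers itself as the default Session

    # Placeholder for the input; None means the number of rows is not fixed (it depends on the training batch_size)
    x = tf.placeholder("float", [None, 784])

    W = tf.Variable(tf.zeros([784,10])) # weight tensor; there is no hidden layer
    b = tf.Variable(tf.zeros([10])) # biases

    # Softmax Regression: y = softmax(xW + b)
    y = tf.nn.softmax(tf.matmul(x,W) + b)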

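2. Define the loss function and choose the optimizer

    # y_ holds the true labels
    y_ = tf.placeholder("float", [None,10])

    # Cross-entropy loss. reduce_sum without an axis already sums over every example
    # and class in the batch, so wrapping it in reduce_mean would have no effect
    cross_entropy = -tf.reduce_sum(y_*tf.log(y))

    # Learning rate
    learn_rate = 0.001

    # Optimizer
    train_step = tf.train.GradientDescentOptimizer(learn_rate).minimize(cross_entropy)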
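For a single example with one-hot label y_ and predicted distribution y, the cross-entropy is -sum(y_ * log(y)), which reduces to minus the log of the probability assigned to the correct class. Here is a plain-Python sketch; the eps clamp is an assumption added to avoid log(0), and the sample distributions are invented.

```python
import math


def cross_entropy(label, probs, eps=1e-12):
    # eps guards against log(0) when a predicted probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(label, probs))


label = [0.0, 1.0, 0.0]                         # the true class is 1
good = cross_entropy(label, [0.1, 0.8, 0.1])    # confident and correct
bad = cross_entropy(label, [0.7, 0.2, 0.1])     # confident and wrong
print(good, bad)
```

The loss is small when the model puts most of its probability on the correct class and grows quickly as that probability approaches zero.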

3. Train iteratively

    # Initialize all variables, reusing the InteractiveSession created in step 1
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Number of training steps
    STEPS = 1000
    for i in range(STEPS):
        # mnist.train.next_batch picks a random batch
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

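4. Evaluate on the test/validation set

    # tf.argmax returns the index of the largest entry of a tensor along a given axis.
    # tf.argmax(y,1) is the label the model considers most likely for each input,
    # while tf.argmax(y_,1) is the correct label.
    # tf.equal checks whether the prediction matches the true label
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    # This line yields a list of booleans.


    # To get the fraction of correct predictions, cast the booleans to floats and take the mean. For example, [True, False, True, True] becomes [1,0,1,1], whose mean is 0.75
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Compute the accuracy of the trained model on the test set
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))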
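The accuracy computation can be reproduced in plain Python. In this sketch the helpers argmax and accuracy and the four sample predictions are made up so that the comparison yields the [True, False, True, True] pattern mentioned in the comments above.

```python
def argmax(values):
    # Index of the largest entry, like tf.argmax along one row
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    # Compare the predicted class with the true class, row by row
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    # True counts as 1 and False as 0, so the mean is the fraction correct
    return sum(correct) / float(len(correct))


predictions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
labels = [[0, 1], [0, 1], [0, 1], [0, 1]]
print(accuracy(predictions, labels))  # 3 of 4 correct: 0.75
```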

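A complete TensorFlow program that trains a neural network

This program uses an activation function to introduce non-linearity, a deeper network, a learning rate with exponential decay, regularization to avoid overfitting, and a moving-average model to make the final model more robust.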

    # coding=utf-8
    # A neural network trained on the MNIST dataset
    # with one hidden layer
    # Five techniques: activation function, hidden layers, exponentially decaying learning rate, regularization loss, moving-average model

    import tensorflow as tf
    import tensorflow.contrib.layers as tflayers
    from tensorflow.examples.tutorials.mnist import input_data


    # MNIST-related constants
    INPUT_NODE = 784  # number of input nodes
    OUTPUT_NODE = 10  # number of output nodes

    LAYER1_NODE = 500  # one hidden layer with 500 nodes
    BATCH_SIZE = 100  # size of one batch


    '''
    Exponentially decaying learning rate
    Signature: exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
        Formula: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
            learning_rate = LEARNING_RATE_BASE
            decay_rate = LEARNING_RATE_DECAY
            global_step = the current training step (it runs up to TRAINING_STEPS)
            decay_steps = mnist.train.num_examples / BATCH_SIZE
    '''
    LEARNING_RATE_BASE = 0.8  # base learning rate, decayed exponentially
    LEARNING_RATE_DECAY = 0.99  # decay rate of the learning rate


    # Coefficient of the regularization loss
    LAMADA = 0.0001
    # Number of training steps
    TRAINING_STEPS = 30000
    # Decay rate of the moving average
    MOVING_AVERAGE_DECAY = 0.99




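    def get_weight(shape, llamada):
        '''
        Creates a weight variable and adds its L2 regularization loss to the 'losses' collection.

        :param shape:   shape of the weight tensor
        :param llamada: regularization coefficient
        :return:    the weight tensor (every weight's penalty is added to 'losses', which simplifies computing the loss)

        tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
        Outputs random values from a truncated normal distribution.
        The values follow a normal distribution with the given mean and standard deviation, except that
        values more than 2 standard deviations from the mean are dropped and re-picked.
            :shape: a 1-D integer tensor giving the shape of the output tensor
            :mean: mean of the normal distribution
            :stddev: standard deviation of the normal distribution
            :dtype: type of the output
            :seed: an integer; when set, the same random values are generated on every run
            :name: name of the operation

        '''
        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
        if llamada is not None:
            tf.add_to_collection('losses', tflayers.l2_regularizer(llamada)(weights))
        return weights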


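    def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
        '''
        Forward pass of the network. If avg_class is None the ordinary forward pass is computed,
        otherwise the forward pass uses the moving averages of the parameters.
        The ReLU activation function introduces non-linearity.
        :param input_tensor: input tensor
        :param avg_class:   moving-average object
        :param weights1:    weights of the first layer
        :param biases1:     biases of the first layer
        :param weights2:    weights of the second layer
        :param biases2:     biases of the second layer
        :return:    result of the forward pass (there is one hidden layer, so the output layer has no ReLU)

        Computes the forward pass up to the output layer.
        softmax is applied as part of the loss computation, so it is not added here.
        Leaving it out does not affect the final result either, because prediction only depends on
        the relative sizes of the output values of the different classes.
        '''
        if avg_class is None:
            layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
            return tf.matmul(layer1, weights2) + biases2
        else:
            # Look up the moving average of each variable with avg_class.average, then compute the forward pass
            layer1 = tf.nn.relu(
                tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
            return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)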


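    # Training procedure
    def train(mnist):
        '''
        Steps of the training function:
            1. Define the network structure and compute the forward pass
            2. Define the loss and the optimizer
            3. Train iteratively
            4. Evaluate the trained model

        :param mnist: the dataset
        :return:
        '''

        x = tf.placeholder(tf.float32, shape=(None, INPUT_NODE), name='x_input')
        y_ = tf.placeholder(tf.float32, shape=(None, OUTPUT_NODE), name='y_input')

        # Hidden-layer parameters (get_weight adds the L2 regularization)
        weights1 = get_weight([INPUT_NODE, LAYER1_NODE], LAMADA)
        biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
        # Output-layer parameters
        weights2 = get_weight([LAYER1_NODE, OUTPUT_NODE], LAMADA)
        biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))
        # Forward pass of the network; note that the moving-average object is None here
        y = inference(x, None, weights1, biases1, weights2, biases2)
        # Variable that stores the number of training steps, marked as not trainable
        global_step = tf.Variable(0, trainable=False)

        '''
        Using the moving-average model
            1. Create the moving-average object; passing the step variable speeds up the updates early in training
            2. Apply the moving average to all trainable variables (a list); each run of this op updates every element of the list
            3. Compute the forward pass with the averaged parameters. The averages are kept in shadow variables,
               so average() has to be called explicitly whenever they are needed
        '''
        variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
        variable_averages_op = variable_averages.apply(tf.trainable_variables())
        average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)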

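        '''
        Define the loss
            When each example has exactly one correct class, sparse_softmax_cross_entropy_with_logits
            can be used to compute the loss, which speeds up the computation.
                Arguments: the forward-pass result without the softmax layer, and the correct answers of the training data
                The labels are one-dimensional arrays of length 10, while this function expects the class index,
                so tf.argmax is used to get the index of the correct class
        '''
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
        # Mean cross-entropy over all examples in the current batch, added to the loss collection
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)
        # get_collection returns a list of all elements in the collection (here the individual loss terms, whose sum is the total loss)
        loss = tf.add_n(tf.get_collection('losses'))

        '''
        Set up the exponentially decaying learning rate
            and minimize the loss with GradientDescentOptimizer()
        '''
        learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,  # base learning rate, decayed from here
                                                   global_step,  # current training step
                                                   mnist.train.num_examples / BATCH_SIZE,  # steps needed to go through all training data once
                                                   LEARNING_RATE_DECAY)  # decay rate of the learning rate
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)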


        '''
        While training, every pass over a batch has to update the parameters by backpropagation
        and also update their moving averages.
        To run several ops in one step, TensorFlow offers two mechanisms: tf.control_dependencies and tf.group

        The two lines below are equivalent to: train_op = tf.group(train_step, variable_averages_op)

        tf.group(*inputs, **kwargs )
            Create an op that groups multiple operations.
            When this op finishes, all ops in input have finished. This op has no output.


        control_dependencies(control_inputs)
            Use with the with keyword to specify that all operations constructed within 
            the context should have control dependencies on control_inputs. 
        For example:
            with g.control_dependencies([a, b, c]):
             # `d` and `e` will only run after `a`, `b`, and `c` have executed.
              d = ...
              e = ...

        '''
        with tf.control_dependencies([train_step, variable_averages_op]):
            train_op = tf.no_op(name='train')

        '''
            Compute the accuracy on the validation set, using the moving-average model.
            tf.equal compares the two tensors element by element and returns True where they match, False otherwise.
            The booleans are then cast to floats and averaged; the mean is the accuracy.
        '''
        correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))




        with tf.Session() as sess:
            # Equivalent, more explicit form: init_op = tf.global_variables_initializer(); sess.run(init_op)
            tf.global_variables_initializer().run()
            # Validation data; during training it is typically used to decide roughly when to stop and to judge progress
            validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
            # Test data; in practice it is not visible during training and only serves as the final measure of model quality
            test_feed = {x: mnist.test.images, y_: mnist.test.labels}
            # Train the network iteratively
            for i in range(TRAINING_STEPS):
                xs, ys = mnist.train.next_batch(BATCH_SIZE)
                _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})
                if i % 1000 == 0:
                    print("After %d training step(s), loss on training batch is %g." % (step, loss_value))

                    validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %d training step(s),validation accuracy using average model is %g " % (step, validate_acc))
                    test_acc = sess.run(accuracy, feed_dict=test_feed)
                    print("After %d training step(s) testing accuracy using average model is %g" % (step, test_acc))


    # Main entry point of the program
    def main(argv=None):
        mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
        train(mnist)


    # TensorFlow provides a program entry point: tf.app.run calls the main function defined above
    if __name__ == '__main__':
        tf.app.run()

Output

    After 1 training step(s), loss on training batch is 3.30849.
    After 1 training step(s),validation accuracy using average model is 0.1098 
    After 1 training step(s) testing accuracy using average model is 0.1132
    After 1001 training step(s), loss on training batch is 0.19373.
    After 1001 training step(s),validation accuracy using average model is 0.9772 
    After 1001 training step(s) testing accuracy using average model is 0.9738
    After 2001 training step(s), loss on training batch is 0.169665.
    After 2001 training step(s),validation accuracy using average model is 0.9794 
    After 2001 training step(s) testing accuracy using average model is 0.9796
    After 3001 training step(s), loss on training batch is 0.147636.
    After 3001 training step(s),validation accuracy using average model is 0.9818 
    After 3001 training step(s) testing accuracy using average model is 0.9813
    After 4001 training step(s), loss on training batch is 0.129015.
    After 4001 training step(s),validation accuracy using average model is 0.9808 
    After 4001 training step(s) testing accuracy using average model is 0.9825
    After 5001 training step(s), loss on training batch is 0.109033.
    After 5001 training step(s),validation accuracy using average model is 0.982 
    After 5001 training step(s) testing accuracy using average model is 0.982
    After 6001 training step(s), loss on training batch is 0.108935.
    After 6001 training step(s),validation accuracy using average model is 0.9818 
    After 6001 training step(s) testing accuracy using average model is 0.982
.......
.......
.......
    After 27001 training step(s), loss on training batch is 0.0393247.
    After 27001 training step(s),validation accuracy using average model is 0.9828 
    After 27001 training step(s) testing accuracy using average model is 0.9827
    After 28001 training step(s), loss on training batch is 0.0422536.
    After 28001 training step(s),validation accuracy using average model is 0.984 
    After 28001 training step(s) testing accuracy using average model is 0.9822
    After 29001 training step(s), loss on training batch is 0.0512684.
    After 29001 training step(s),validation accuracy using average model is 0.9832 
    After 29001 training step(s) testing accuracy using average model is 0.9831
