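TensorFlow Study Notes: The Deep Dream Model

0. Goals

Deep Dream is an interesting technique released by Google. Given a trained CNN, setting a few parameters is enough to generate an image. The goals of these notes are:

Understand the basic principle of Deep Dream
Learn how to implement Deep Dream image generation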

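1. Technical Principles

In a convolutional network the input is usually an image; after several layers of convolution the network outputs the class of the image. During training, gradients are computed from the images, and the network keeps adjusting its parameters according to those gradients. But what exactly has a convolutional layer learned, what do its parameters represent, and how does what shallow layers learn differ from what deep layers learn? Deep Dream helps answer these questions.
Suppose the image fed to the network is X and the network outputs the class probabilities t (t is a vector holding the probability of each class). Take t[N] as the optimization objective and repeatedly adjust the pixel values of the input image X, leaving the network weights fixed, so that t[N] becomes as large as possible. The result is an image that maximizes the probability of class N.
To find out what a convolutional layer has learned, it is enough to maximize the data of one of its channels. Let the input image be X and the output of some intermediate convolutional layer be Y, with shape h × w × c, where h is the height of Y, w its width, and c the number of channels. Each channel represents one kind of learned information. Using the mean of a single channel as the optimization objective reveals what that channel has learned, and this is the basic principle of Deep Dream.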
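
The idea of "optimize the input, not the weights" can be illustrated without any neural network. The sketch below is only a toy under stated assumptions: the "channel" is a hypothetical fixed linear filter `filt` applied to every pixel, and the helper names `channel_mean` and `channel_mean_grad` are made up for this illustration. It runs gradient ascent on the image so that the mean activation grows.

```python
import numpy as np

def channel_mean(img, filt):
    """Mean activation of a toy channel: each pixel's RGB vector dotted with filt."""
    return float(np.mean(img @ filt))

def channel_mean_grad(img, filt):
    """Analytic gradient of channel_mean with respect to every pixel of img."""
    h, w, _ = img.shape
    return np.broadcast_to(filt / (h * w), img.shape)

rng = np.random.default_rng(0)
filt = np.array([1.0, -0.5, 0.25])          # hypothetical, fixed "learned" filter
img = rng.uniform(size=(8, 8, 3)) + 100.0   # noise image as the starting point

before = channel_mean(img, filt)
for _ in range(20):
    g = channel_mean_grad(img, filt)
    # Normalized ascent step: only the image changes, the filter stays fixed
    img = img + 1.0 * g / (g.std() + 1e-8)
after = channel_mean(img, filt)

print('score before: %.3f, after: %.3f' % (before, after))
```

In the real model the gradient comes from backpropagation through Inception instead of a closed-form expression, but the update loop has the same shape.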

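2. Using It in TensorFlow

Importing the Inception model
The original Deep Dream model only needs to optimize the activation of one channel of a convolutional layer in an ImageNet model. So the first step is to import an ImageNet image-recognition model; Inception is used here as the example. The code below uses the TensorFlow 1.x API (tf.Graph, tf.InteractiveSession, tf.placeholder). Create the file load_inception.py and enter the following code: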

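# Import the basic modules (TensorFlow 1.x API)
import numpy as np
import tensorflow as tf

# Create the graph and the session
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)

# Import the Inception model
# tensorflow_inception_graph.pb stores the Inception network structure and its trained weights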
model_fn = 'tensorflow_inception_graph.pb'
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())


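# Importing the graph requires an input tensor, so define the placeholder t_input
# Note that image data usually has the layout (height, width, channel),
# where height and width are measured in pixels and channel is the number of channels;
# RGB images are the usual case, so channel is 3
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0

# Preprocess the input image
# An image has the layout (height, width, channel), but Inception expects (batch, height, width, channel)
# because (height, width, channel) can describe only one image, while training usually feeds many at once
# So one dimension is added in front to match the layout Inception needs
# Only one image is fed at a time here, but it still has to be in that layout, with batch equal to 1
# A pixel mean is also subtracted from the image
# The Inception model was trained on mean-subtracted input, so the same preprocessing keeps the input consistent
# t_input - imagenet_mean subtracts the mean; this Inception model uses the fixed mean 117, so 117 is subtracted here
# expand_dims adds the extra dimension, turning [height, width, channel] into [1, height, width, channel]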
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)

# Import the model, feeding t_preprocessed into its 'input' node
tf.import_graph_def(graph_def, {'input': t_preprocessed})

# Find all convolutional layers
layers = [op.name for op in graph.get_operations() if op.type == 'Conv2D' and 'import/' in op.name]

# Print the number of convolutional layers
print('Number of layers', len(layers))

# Print the shape of mixed4d_3x3_bottleneck_pre_relu
name = 'mixed4d_3x3_bottleneck_pre_relu'
print('shape of %s: %s' % (name, str(graph.get_tensor_by_name('import/' + name + ':0').get_shape())))

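Running this code prints that there are 59 convolutional layers in total.

Note 1:
The shape printed for the layer "mixed4d_3x3_bottleneck_pre_relu" is (?,?,?,144). The number and size of the input images are not known yet, so the first three dimensions are undetermined.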
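
The preprocessing applied to t_input can be checked on a plain NumPy array. This is a small sketch, not part of the original scripts; the names `IMAGENET_MEAN` and `preprocess` are introduced here only for illustration and mirror `imagenet_mean` and `t_preprocessed` above.

```python
import numpy as np

IMAGENET_MEAN = 117.0  # same fixed mean as imagenet_mean in load_inception.py

def preprocess(img):
    """Subtract the fixed mean and add a leading batch axis of size 1."""
    return np.expand_dims(img.astype(np.float32) - IMAGENET_MEAN, 0)

img = np.full((224, 224, 3), 120.0)   # a dummy (height, width, channel) image
batch = preprocess(img)
print(batch.shape)  # (1, 224, 224, 3)
```

Once a concrete image like this is fed in, the unknown dimensions in (?,?,?,144) take concrete values as well.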

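Generating the raw image
Taking the convolutional layer mixed4d_3x3_bottleneck_pre_relu as the example, maximize the mean of one of its channels in order to generate an image.
Create the file gen_naive.py and import the Inception model in the same way as in the previous section. First define a function that saves an image:

import scipy.misc  # scipy.misc.toimage was removed in SciPy 1.2, so an older SciPy (plus Pillow) is needed

def savearray(img_array, img_name):
    # toimage rescales the array to the 0-255 range before saving
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved : %s' % img_name)

Next comes the main part of the program (in the file it has to be placed after the definition of render_naive, which is given further down):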

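# Choose the convolutional layer and the channel, then fetch the corresponding Tensor
name = 'mixed4d_3x3_bottleneck_pre_relu'
# Any channel can be chosen; 139 is used here
channel = 139
# Fetch the output of the mixed4d_3x3_bottleneck_pre_relu convolutional layer
layer_output = graph.get_tensor_by_name("import/%s:0" % name)

# Define the initial noise image
# It is an array of shape (224, 224, 3) and serves as the starting point of the optimization
img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0

# Call render_naive to render the image
render_naive(layer_output[:, :, :, channel], img_noise, iter_n=20)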

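Finally, define the rendering function:

def render_naive(t_obj, img0, iter_n=20, step=1.0):
    '''
    Rendering function
    :param t_obj: values of one channel of a convolutional layer
    :param img0: initial image
    :param iter_n: number of iterations
    :param step: step size (acts as the learning rate)
    :return: None; the result is saved to naive.jpg
    '''
    # t_score is the optimization objective: the mean of t_obj
    # The larger t_score is, the larger the mean activation of the chosen channel
    t_score = tf.reduce_mean(t_obj)
    # Compute the gradient of t_score with respect to t_input
    # The goal is to make t_score as large as possible by adjusting the input image t_input,
    # so gradient ascent is used
    t_grad = tf.gradients(t_score, t_input)[0]

    # Work on a copy of the initial image
    img = img0.copy()
    # Run iter_n iterations, applying the gradient to the image at each step
    for i in range(iter_n):
        # Evaluate the gradient and the current score in sess
        g, score = sess.run([t_grad, t_score], {t_input: img})
        # Normalize the gradient and apply it to img; step can be seen as the learning rate
        g /= g.std() + 1e-8
        img += g * step
        print('score(mean)=%f' % (score))

    savearray(img, 'naive.jpg')

Running the program produces the image obtained after 20 iterations.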
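
The line `g /= g.std() + 1e-8` makes the size of each update independent of the raw gradient magnitude. The following sketch illustrates this with random data standing in for a real gradient; `ascent_step` is a hypothetical helper written for this illustration.

```python
import numpy as np

def ascent_step(img, grad, step=1.0):
    """One update as in render_naive: scale grad to unit std, then move by step."""
    return img + step * grad / (grad.std() + 1e-8)

rng = np.random.default_rng(0)
img = np.zeros((4, 4, 3))
grad = rng.normal(size=img.shape)   # stand-in for a gradient returned by sess.run

small = ascent_step(img, 1e-3 * grad)   # very small raw gradient
large = ascent_step(img, 1e+3 * grad)   # very large raw gradient

# Both updates are practically identical, and each has a standard deviation close to step
print(np.allclose(small, large, rtol=1e-3))
print((large - img).std())
```

Because of this normalization, `step` alone controls how far the image moves per iteration, regardless of which layer or channel is being maximized.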

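Generating a large image
The image generated in the previous section is too small; this section generates a larger one. The image size is determined by img_noise, so passing a larger img_noise yields a larger image. This raises a problem, however: generating the image consumes memory (or GPU memory), and the larger img_noise is, the more is consumed, until rendering eventually fails because memory runs out. The fix is simple: instead of optimizing the whole image at once, split it into several parts and optimize only one part at a time, so that memory consumption stays at a fixed size.
Create the file gen_multiscale.py and write the following code. This function computes the gradient for an image of any size: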

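def calc_grad_tiled(img, t_grad, tile_size=512):
    '''
    Compute the gradient for an image of any size
    :param img: image for which the gradient is computed
    :param t_grad: gradient tensor of the objective with respect to t_input
    :param tile_size: size of the tile optimized at each step
    :return: gradient with the same shape as img
    '''
    # Compute the gradient on only tile_size * tile_size pixels at a time
    sz = tile_size
    h, w = img.shape[:2]

    # Computing the gradient directly leaves visible seams along the borders of the tiles
    # To avoid this, draw two random offsets sx and sy and shift the whole image:
    # img_shift is rolled by sx along axis 1 (horizontally), then by sy along axis 0 (vertically)
    sx, sy = np.random.randint(sz, size=2)
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)
    grad = np.zeros_like(img)
    # y, x are the top-left pixel of each tile
    for y in range(0, max(h - sz // 2, sz), sz):
        for x in range(0, max(w - sz // 2, sz), sz):
            # Compute the gradient on sub, a tile of at most tile_size * tile_size pixels
            sub = img_shift[y:y + sz, x:x + sz]
            g = sess.run(t_grad, {t_input: sub})
            grad[y:y + sz, x:x + sz] = g

    # After all tiles are processed, use np.roll to undo the shift
    return np.roll(np.roll(grad, -sx, 1), -sy, 0)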
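
The shift, tile, and unshift logic can be checked in isolation. In the sketch below the TensorFlow gradient is replaced by an arbitrary function `fn` applied to each tile; `tiled_apply` and its `rng` parameter are hypothetical names used only for this illustration. With a pointwise function such as doubling, the tiled result should equal the result of applying the function to the whole image, which confirms that the two `np.roll` calls at the end undo the random shift.

```python
import numpy as np

def tiled_apply(img, fn, tile_size=4, rng=None):
    """Apply fn tile by tile on a randomly shifted copy of img, then undo the shift."""
    rng = np.random.default_rng() if rng is None else rng
    sz = tile_size
    h, w = img.shape[:2]
    sx, sy = rng.integers(sz, size=2)               # random offsets in [0, sz)
    shifted = np.roll(np.roll(img, sx, 1), sy, 0)   # shift columns, then rows
    out = np.zeros_like(img)
    for y in range(0, max(h - sz // 2, sz), sz):
        for x in range(0, max(w - sz // 2, sz), sz):
            sub = shifted[y:y + sz, x:x + sz]
            out[y:y + sz, x:x + sz] = fn(sub)
    # Returned only after every tile has been processed
    return np.roll(np.roll(out, -sx, 1), -sy, 0)

img = np.arange(8 * 8 * 3, dtype=np.float64).reshape(8, 8, 3)
result = tiled_apply(img, lambda tile: 2.0 * tile, rng=np.random.default_rng(0))
print(np.array_equal(result, 2.0 * img))
```

A real gradient is not pointwise, so tile borders do influence the result; the random shift moves those borders to a different place at every iteration, which is what hides the seams.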

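To speed up convergence, the image can first be generated at a small size and then enlarged:

import scipy.misc  # scipy.misc.imresize was removed in SciPy 1.3, so an older SciPy is needed

# Enlarge the image by a factor of ratio
def resize_ratio(img, ratio):
    # First record the value range of the source pixels
    vmin = img.min()
    vmax = img.max()
    # Map the values to 0-255, the range imresize works in
    img = (img - vmin) / (vmax - vmin) * 255
    img = np.float32(scipy.misc.imresize(img, ratio))
    # After scipy.misc.imresize, scale the pixels back to the original range
    img = img / 255 * (vmax - vmin) + vmin
    return img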

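# Generate a large image
def render_multiscale(t_obj, img0, iter_n=10, step=1.0, octave_n=3, octave_scale=1.4):
    '''
    Generate a large image
    :param t_obj: values of one channel of a convolutional layer
    :param img0: initial image
    :param iter_n: number of iterations per octave
    :param step: step size
    :param octave_n: number of octaves (the image is enlarged octave_n - 1 times)
    :param octave_scale: enlargement factor per octave
    :return: None; the result is saved to multiscale.jpg
    '''
    # Define the objective and its gradient as before
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]

    img = img0.copy()
    # First generate a small image,
    # then call resize_ratio to enlarge it by a factor of octave_scale,
    # and use the enlarged image as the starting point for the next round
    for octave in range(octave_n):
        if octave > 0:
            # Enlarge the image by octave_scale each time,
            # octave_n - 1 times in total
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            # Compute the gradient for an image of any size
            g = calc_grad_tiled(img, t_grad)
            g /= g.std() + 1e-8
            img += g * step
            print('.', end=' ')
    savearray(img, 'multiscale.jpg')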
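
To get a feeling for how the image grows across octaves without depending on scipy.misc.imresize, the sketch below swaps in a simple nearest-neighbour upscaling written in NumPy. `resize_ratio_nn` is a hypothetical stand-in, not the function used above, and its interpolation is cruder than what imresize does; the truncation of the new size to an integer is an assumption of this sketch.

```python
import numpy as np

def resize_ratio_nn(img, ratio):
    """Nearest-neighbour upscaling of an (h, w, c) float image by ratio."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * ratio), int(w * ratio)   # assumed: sizes are truncated
    # For every target row/column, pick the source index it maps back to
    rows = np.minimum((np.arange(new_h) / ratio).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    return img[rows][:, cols]

start = np.random.default_rng(0).uniform(size=(224, 224, 3)) + 100.0
img = start
shapes = [img.shape]
for _ in range(2):                  # octave_n=3 means two enlargements
    img = resize_ratio_nn(img, 1.4)
    shapes.append(img.shape)
print(shapes)
```

With octave_n=3 and octave_scale=1.4, a 224-pixel image ends up roughly twice as large per side, while most iterations are spent on the smaller, cheaper sizes.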

Now write the main part:

if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_multiscale(layer_output[:, :, :, channel], img_noise, iter_n=20)

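Running the code produces a large image.

The image shows that channel 139 of the mixed4d_3x3_bottleneck_pre_relu convolutional layer has in fact learned the features of some kind of flower.

Generating a high-quality image
The images generated in the previous two sections are not of high quality; this section generates a better one. Image-processing algorithms distinguish between high-frequency components and low-frequency components. High-frequency components are the places where gray level, color, or brightness change sharply, such as edges and fine details. Low-frequency components are the places where the image changes little, such as large blocks of color and the overall style.
The image from the previous section contains too many high-frequency components and does not look smooth. How can this be fixed? One way is to add a loss on the high-frequency components, so that the image changes under the influence of the new loss while it is generated, but this increases both the computation and the number of steps needed to converge. Another way is to amplify the low-frequency gradient: decompose the gradient into a high-frequency gradient and a low-frequency gradient, then deliberately amplify the low-frequency part, which gives a smoother image.
Usually a Laplacian pyramid is used to decompose an image; this algorithm splits the image into several levels. The gradient can be decomposed in the same way. After decomposition, both the high-frequency and the low-frequency gradients are normalized so that they end up with similar magnitudes. In the image this shows up as more low-frequency content, which improves the quality of the result. This method is called Laplacian pyramid normalization, and it is implemented as follows: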

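# 5x5 binomial smoothing kernel, applied to each of the 3 color channels separately
k = np.float32([1, 4, 6, 4, 1])
k = np.outer(k, k)
k5x5 = k[:, :, None, None] / k.sum() * np.eye(3, dtype=np.float32)

# Split an image into its low-frequency and high-frequency components
def lap_split(img):
    with tf.name_scope('split'):
        # One convolution acts as one smoothing pass, so lo is the low-frequency component
        lo = tf.nn.conv2d(img, k5x5, [1, 2, 2, 1], 'SAME')
        # Scale the low-frequency component back to the original size to get lo2,
        # then subtract lo2 from the original img to get the high-frequency component hi
        lo2 = tf.nn.conv2d_transpose(lo, k5x5 * 4, tf.shape(img), [1, 2, 2, 1])
        hi = img - lo2
    return lo, hi


# Split the image img into an n-level Laplacian pyramid
def lap_split_n(img, n):
    levels = []
    for i in range(n):
        # Call lap_split to split the image into low- and high-frequency parts
        # The high-frequency part is stored in levels
        # The low-frequency part is split further
        img, hi = lap_split(img)
        levels.append(hi)
    levels.append(img)
    return levels[::-1]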


# Merge a Laplacian pyramid back into the original image
def lap_merge(levels):
    img = levels[0]
    for hi in levels[1:]:
        with tf.name_scope('merge'):
            img = tf.nn.conv2d_transpose(img, k5x5 * 4, tf.shape(hi), [1, 2, 2, 1]) + hi
    return img


# Normalize img by its root-mean-square value
def normalize_std(img, eps=1e-10):
    with tf.name_scope('normalize'):
        std = tf.sqrt(tf.reduce_mean(tf.square(img)))
        return img / tf.maximum(std, eps)


# Laplacian pyramid normalization
def lap_normalize(img, scale_n=4):
    img = tf.expand_dims(img, 0)
    tlevels = lap_split_n(img, scale_n)
    # Apply normalize_std to every level
    tlevels = list(map(normalize_std, tlevels))
    out = lap_merge(tlevels)
    return out[0, :, :, :]
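
The same split, normalize, and merge steps can be tried in plain NumPy. The sketch below is a simplification under stated assumptions: it swaps the 5x5 binomial kernel for 2x2 block averaging and pixel repetition (helpers `down2` and `up2`, introduced only here), and it needs the height and width to be divisible by 2 to the power of the number of levels. It shows two properties of the scheme: merging the unmodified levels restores the input exactly, and after normalization every level has a root-mean-square value of about 1.

```python
import numpy as np

def down2(img):
    """Halve height and width by averaging each 2x2 block (a crude low-pass filter)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up2(img):
    """Double height and width by repeating each pixel."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def lap_split_n(img, n):
    """Return [lowest-frequency level, ..., highest-frequency level]."""
    levels = []
    for _ in range(n):
        lo = down2(img)
        levels.append(img - up2(lo))   # detail lost by the smoothing step
        img = lo
    levels.append(img)
    return levels[::-1]

def lap_merge(levels):
    """Rebuild the full-size array from the pyramid levels."""
    img = levels[0]
    for hi in levels[1:]:
        img = up2(img) + hi
    return img

def normalize_std(x, eps=1e-10):
    """Divide x by its root-mean-square value."""
    return x / max(np.sqrt(np.mean(np.square(x))), eps)

def lap_normalize(grad, scale_n=3):
    """Normalize every pyramid level separately, then merge."""
    return lap_merge([normalize_std(l) for l in lap_split_n(grad, scale_n)])

rng = np.random.default_rng(0)
grad = rng.normal(size=(16, 16, 3))    # stand-in for a real gradient
levels = lap_split_n(grad, 3)
rms = [float(np.sqrt(np.mean(normalize_std(l) ** 2))) for l in levels]
normalized = lap_normalize(grad)
print([l.shape for l in levels], rms)
```

Giving every level the same weight is what boosts the low-frequency part relative to the raw gradient, where fine detail usually dominates.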

With the Laplacian normalization functions in place, the code that generates the image can be written:

from functools import partial  # needed for partial(lap_normalize, ...) below

# Turn a function that builds a Tensor into a function that works on numpy.ndarray
def tffunc(*argtypes):
    placeholders = list(map(tf.placeholder, argtypes))

    def wrap(f):
        out = f(*placeholders)

        def wrapper(*args, **kw):
            return out.eval(dict(zip(placeholders, args)), session=kw.get('session'))

        return wrapper

    return wrap


def render_lapnorm(t_obj, img0,
                   iter_n=10, step=1.0, octave_n=3, octave_scale=1.4, lap_n=4):
    # Define the objective and its gradient as before
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]
    # Turn lap_normalize into an ordinary function on arrays
    lap_norm_func = tffunc(np.float32)(partial(lap_normalize, scale_n=lap_n))

    img = img0.copy()
    for octave in range(octave_n):
        if octave > 0:
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            g = calc_grad_tiled(img, t_grad)
            # The only difference: g is normalized with lap_norm_func
            g = lap_norm_func(g)
            img += g * step
            print('.', end=' ')
    savearray(img, 'lapnorm.jpg')

if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name('import/%s:0' % name)
    render_lapnorm(layer_output[:, :, :, channel], img_noise, iter_n=20)

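Running the code above produces a high-quality image.

Generating the final image
The previous sections explained how to generate an image by maximizing the mean of one channel of a convolutional layer, and how to make the image larger and of higher quality. The final Deep Dream model additionally needs a background image to start from. The code is as follows: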

def resize(img, hw):
    # Resize img to the target (height, width) while keeping its value range
    vmin = img.min()
    vmax = img.max()
    img = (img - vmin) / (vmax - vmin) * 255
    img = np.float32(scipy.misc.imresize(img, hw))
    img = img / 255 * (vmax - vmin) + vmin
    return img


def render_deepdream(t_obj, img0, iter_n=10, step=1.5, octave_n=4, octave_scale=1.4):
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]

    img = img0
    # Decompose the image into a pyramid here as well
    # High and low frequencies are extracted in a simple way: by resizing directly
    octaves = []
    for i in range(octave_n - 1):
        hw = img.shape[:2]
        lo = resize(img, np.int32(np.float32(hw) / octave_scale))
        hi = img - resize(lo, hw)
        img = lo
        octaves.append(hi)

    # Start from the low-frequency image, then enlarge step by step and add the high frequencies back
    for octave in range(octave_n):
        if octave > 0:
            hi = octaves[-octave]
            img = resize(img, hi.shape[:2]) + hi
        for i in range(iter_n):
            g = calc_grad_tiled(img, t_grad)
            # Normalize by the mean absolute gradient instead of the standard deviation
            img += g * (step / (np.abs(g).mean() + 1e-7))
            print('.', end=' ')

    img = img.clip(0, 255)
    savearray(img, 'deepdream.jpg')


import PIL.Image  # needed to open the background image; normally placed with the other imports

if __name__ == '__main__':
    img0 = PIL.Image.open('test.jpg')
    img0 = np.float32(img0)

    # name = 'mixed4d_3x3_bottleneck_pre_relu'
    name = 'mixed4c'
    # channel = 139
    layer_output = graph.get_tensor_by_name('import/%s:0' % name)
    # render_deepdream(layer_output[:, :, :, channel], img0, iter_n=150)
    render_deepdream(tf.square(layer_output), img0)
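
The octave decomposition inside render_deepdream can be examined on its own. The sketch below is an illustration under stated assumptions: it replaces scipy.misc.imresize with a nearest-neighbour resize written in NumPy (`resize_nn`), and the helpers `split_octaves` and `merge_octaves` are hypothetical names that mirror the two loops of render_deepdream with the gradient steps left out. Without any optimization in between, merging should give back the original image, which shows that the stored high-frequency residuals carry exactly the detail lost by downscaling.

```python
import numpy as np

def resize_nn(img, hw):
    """Nearest-neighbour resize of an (h, w, c) array to the target (height, width)."""
    h, w = img.shape[:2]
    rows = np.arange(hw[0]) * h // hw[0]
    cols = np.arange(hw[1]) * w // hw[1]
    return img[rows][:, cols]

def split_octaves(img, octave_n=4, octave_scale=1.4):
    """Return the smallest image plus the detail lost at each downscaling step."""
    details = []
    for _ in range(octave_n - 1):
        hw = img.shape[:2]
        small_hw = (int(hw[0] / octave_scale), int(hw[1] / octave_scale))
        lo = resize_nn(img, small_hw)
        details.append(img - resize_nn(lo, hw))   # high-frequency residual
        img = lo
    return img, details

def merge_octaves(base, details):
    """Enlarge step by step, adding back the stored detail (no optimization in between)."""
    img = base
    for hi in reversed(details):
        img = resize_nn(img, hi.shape[:2]) + hi
    return img

rng = np.random.default_rng(0)
photo = rng.uniform(0, 255, size=(50, 40, 3))   # stand-in for the background photo
base, details = split_octaves(photo)
restored = merge_octaves(base, details)
print(base.shape, [d.shape for d in details], np.allclose(restored, photo))
```

In render_deepdream the gradient steps run between the merges, so the dream patterns are drawn at several scales while the detail of the original photo is added back on top.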

3. Code Download

Download link
