Paper reading notes: the DeepLab image segmentation method and an analysis of the Hole algorithm

I haven't written a post in a while. With the New Year holiday here, I'm scratching that itch while my experiments run ^_^.

Please respect original work. If you repost, credit the source: http://blog.csdn.net/tangwei2014

DeepLab was published at ICLR 2015. Paper: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs.

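Overview of the DeepLab method
DeepLab works in two steps. The first step still uses an FCN to produce a coarse score map, which is interpolated to the original image size. The second step applies a fully connected CRF to refine the details of the segmentation obtained from the FCN. (For an introduction to FCN, see my earlier post: http://blog.csdn.net/tangwei2014/article/details/46882257)
The figure below shows the overall structure clearly: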

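The next figure compares the results before and after the CRF. The details clearly improve a lot once the CRF is applied:
DeepLab's more elegant treatment of FCN
In the first step, DeepLab still uses an FCN to obtain the score map, again by fine-tuning the VGG network. But the way it obtains the score map is much more elegant than in the original FCN.
Remember how the CVPR 2015 FCN obtained a denser score map? For a 500x500 input image, it applied a large padding of 100 directly at the first convolution layer, conv1_1, and ended up with a barely adequate 16x16 score map at fc7. The approach is a little crude, but it was the first work to make CNN-based image segmentation end-to-end, and its performance was state of the art at the time, so that is understandable.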
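The 16x16 figure can be checked with a little size arithmetic. The sketch below assumes Caffe's rounding conventions (floor for convolution, ceil for pooling) and the usual FCN-on-VGG layout: 3x3 convolutions with padding 1 that keep the size (except conv1_1, which uses padding 100), five 2x2 stride-2 poolings, fc6 as a 7x7 convolution and fc7 as a 1x1 convolution. The helper names are made up for illustration.

```cpp
#include <cassert>

// Output size of a convolution along one axis (floor rounding, as in Caffe).
int conv_out(int in, int kernel, int pad, int stride) {
  assert(stride >= 1);
  return (in + 2 * pad - kernel) / stride + 1;
}

// Output size of a pooling layer along one axis (ceil rounding, as in Caffe).
int pool_out(int in, int kernel, int pad, int stride) {
  assert(stride >= 1);
  return (in + 2 * pad - kernel + stride - 1) / stride + 1;
}

// Spatial size of the fc7 score map for a square input.
// The remaining 3x3 / pad-1 convolutions keep the size, so they are skipped.
int fcn_fc7_size(int input) {
  int s = conv_out(input, 3, 100, 1);  // conv1_1 with padding 100
  for (int block = 0; block < 5; ++block) {
    s = pool_out(s, 2, 0, 2);          // pool1 .. pool5
  }
  s = conv_out(s, 7, 0, 1);            // fc6 as a 7x7 convolution
  return conv_out(s, 1, 0, 1);         // fc7 as a 1x1 convolution
}
```

Under these assumptions, a 500x500 input grows to 698 after conv1_1, shrinks to 22 after pool5, and the 7x7 fc6 leaves 16, matching the size quoted above.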
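DeepLab drops this approach and instead makes a small change to the VGG architecture: the stride of the pool4 and pool5 layers is changed from 2 to 1. This single change reduces the total stride of the VGG network from 32 to 8, so that with a 514x514 input and normal padding, fc7 produces a 67x67 score map, which is far denser than FCN's.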
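The change from 32 to 8 is just the product of the subsampling factors. A minimal sketch, assuming that only the five pooling layers of VGG subsample (the function name is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Total stride of a network = product of the strides of its subsampling layers.
int total_stride(const std::vector<int>& pool_strides) {
  int total = 1;
  for (int s : pool_strides) {
    assert(s >= 1);
    total *= s;
  }
  return total;
}
```

With strides {2, 2, 2, 2, 2} (pool1 to pool5) this gives 32, and with {2, 2, 2, 1, 1} it gives 8.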
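But changing the network structure this way also introduces a problem: once the stride changes, continuing to fine-tune from the VGG model means the region each later filter acts on changes. In other words, the receptive field changes. The curly braces in panels (a) and (b) of the figure below illustrate this:
The Hole algorithm
So the authors came up with a trick to reconcile two seemingly conflicting goals:
they want to fine-tune from an already trained model, yet also change the network structure to obtain a denser score map.
The solution is the Hole algorithm. As panels (a) and (b) of the figure below show, in an ordinary convolution or pooling layer, adjacent weights of a filter act on physically contiguous positions of the feature map. As panel (c) shows, to keep the receptive field unchanged after some layer's stride goes from 2 to 1, the layers after it must use the hole algorithm: the contiguous connections become skip connections spaced according to the hole size (for ease of display, (c) draws this directly on the same layer). Don't be alarmed by the padding of 2 in (c); the two padded positions are never connected to the same filter at the same time.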
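A 1-D toy example may make the idea concrete. The sketch below is not the DeepLab code: it stands in for the stride-2 layer with plain subsampling and uses a hypothetical `conv1d` helper. It shows that filtering the dense signal with hole size 2 reproduces, at every second position, what the same weights compute on the subsampled signal, so the pretrained weights still see the same inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Valid" 1-D correlation with a hole size.
// hole == 1 is the ordinary case; hole == 2 skips every other input sample.
std::vector<float> conv1d(const std::vector<float>& x,
                          const std::vector<float>& w, int hole) {
  assert(hole >= 1 && !w.empty());
  const int k = static_cast<int>(w.size());
  const int k_eff = k + (k - 1) * (hole - 1);  // span of the kernel with holes
  const int n_out = static_cast<int>(x.size()) - k_eff + 1;
  std::vector<float> y(n_out > 0 ? n_out : 0, 0.f);
  for (int i = 0; i < n_out; ++i) {
    for (int j = 0; j < k; ++j) {
      y[i] += w[j] * x[i + j * hole];  // taps are `hole` samples apart
    }
  }
  return y;
}

// Keep every `stride`-th sample; a stand-in for a stride-2 layer.
std::vector<float> subsample(const std::vector<float>& x, int stride) {
  assert(stride >= 1);
  std::vector<float> y;
  for (std::size_t i = 0; i < x.size(); i += static_cast<std::size_t>(stride)) {
    y.push_back(x[i]);
  }
  return y;
}
```

For any input `x` and weights `w`, `conv1d(subsample(x, 2), w, 1)[i]` equals `conv1d(x, w, 2)[2 * i]`: the dense output contains the original coarse output plus the in-between positions.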
With pool4's stride changed from 2 to 1, the following conv5_1, conv5_2 and conv5_3 use a hole size of 2. With pool5 then also changed from 2 to 1, the subsequent fc6 uses a hole size of 4.
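The hole sizes 2 and 4 follow one simple rule: each removed factor of subsampling multiplies the hole size needed by every later layer. A small sketch of that bookkeeping, with a hypothetical `StrideChange` struct and function name:

```cpp
#include <cassert>
#include <vector>

// One layer whose stride was reduced, e.g. pool4: 2 -> 1.
struct StrideChange {
  int old_stride;
  int new_stride;
};

// Hole size needed by layers placed after all the listed stride changes.
int hole_size_after(const std::vector<StrideChange>& changes) {
  int hole = 1;
  for (const StrideChange& c : changes) {
    assert(c.new_stride >= 1 && c.old_stride % c.new_stride == 0);
    hole *= c.old_stride / c.new_stride;  // removed subsampling factor
  }
  return hole;
}
```

Passing only the pool4 change gives 2 (the conv5 layers), and passing both the pool4 and pool5 changes gives 4 (fc6).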
Code

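The changes are mainly in im2col (forward pass) and col2im (backward pass), which gain the hole_w and hole_h parameters. Only the CPU versions are shown here, to aid understanding: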

// forward: unroll image patches into columns; kernel taps are hole_h / hole_w pixels apart
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, 
    const int num, const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w, const int pad_h, const int pad_w,
    const int stride_h, const int stride_w, const int hole_h, const int hole_w,
    Dtype* data_col) {
  // effective kernel if we expand the holes (trous)
  const int kernel_h_eff = kernel_h + (kernel_h - 1) * (hole_h - 1);
  const int kernel_w_eff = kernel_w + (kernel_w - 1) * (hole_w - 1);
  int height_col = (height + 2 * pad_h - kernel_h_eff) / stride_h + 1;
  int width_col = (width + 2 * pad_w - kernel_w_eff) / stride_w + 1;
  int channels_col = channels * kernel_h * kernel_w;
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels_col; ++c) {
      int w_offset = (c % kernel_w)  * hole_w;
      int h_offset = ((c / kernel_w) % kernel_h) * hole_h;
      int c_im = c / kernel_w / kernel_h;
      for (int h = 0; h < height_col; ++h) {
        const int h_im = h * stride_h + h_offset - pad_h;
        for (int w = 0; w < width_col; ++w) {
          const int w_im = w * stride_w + w_offset - pad_w;
          data_col[((n * channels_col + c) * height_col + h) * width_col + w] =
            (h_im >= 0 && h_im < height && w_im >= 0 && w_im < width) ?
            data_im[((n * channels + c_im) * height + h_im) * width + w_im] : 
            0.; // zero-pad
        } //width_col
      } //height_col
    } //channels_col
  } //num
}

// backward: accumulate column entries back into the image positions they were read from
template <typename Dtype>
void col2im_cpu(const Dtype* data_col,
    const int num, const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w, const int pad_h, const int pad_w,
    const int stride_h, const int stride_w, const int hole_h, const int hole_w,
    Dtype* data_im) {
  caffe_set(num * channels * height * width, Dtype(0), data_im);
  const int kernel_h_eff = kernel_h + (kernel_h - 1) * (hole_h - 1);
  const int kernel_w_eff = kernel_w + (kernel_w - 1) * (hole_w - 1);
  int height_col = (height + 2 * pad_h - kernel_h_eff) / stride_h + 1;
  int width_col = (width + 2 * pad_w - kernel_w_eff) / stride_w + 1;
  int channels_col = channels * kernel_h * kernel_w;
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels_col; ++c) {
      int w_offset = (c % kernel_w)  * hole_w;
      int h_offset = ((c / kernel_w) % kernel_h) * hole_h;
      int c_im = c / kernel_w / kernel_h;
      for (int h = 0; h < height_col; ++h) {
        const int h_im = h * stride_h + h_offset - pad_h;
        for (int w = 0; w < width_col; ++w) {
          const int w_im = w * stride_w + w_offset - pad_w;
          if (h_im >= 0 && h_im < height && w_im >= 0 && w_im < width) {
            data_im[((n * channels + c_im) * height + h_im) * width + w_im] += 
              data_col[((n * channels_col + c) * height_col + h) * width_col + w];
          }
        }
      }
    }
  }
}
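The `kernel_h_eff` and `height_col` expressions in the code above can be pulled out into two small helpers to see what sizes they produce. This is only an illustrative sketch; the function names are not part of Caffe.

```cpp
#include <cassert>

// Extent covered by a kernel once the holes are expanded;
// same formula as kernel_h_eff / kernel_w_eff above.
int effective_kernel(int kernel, int hole) {
  assert(kernel >= 1 && hole >= 1);
  return kernel + (kernel - 1) * (hole - 1);
}

// Output length along one axis, as computed for height_col / width_col.
int output_size(int in, int kernel, int pad, int stride, int hole) {
  assert(stride >= 1);
  return (in + 2 * pad - effective_kernel(kernel, hole)) / stride + 1;
}
```

A 3x3 kernel with hole size 2 spans 5 positions, so a stride-1 layer needs a padding of 2 to keep its output the same size as its input. A 7x7 kernel with hole size 4, assuming fc6 keeps its 7x7 kernel, would span 25 positions.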
