To create a Caffe model, you define the model architecture in a prototxt file.
Caffe's built-in layers and their parameters are defined in caffe.proto.

Vision Layers

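Header: ./include/caffe/vision_layers.hpp

Vision layers usually take images as input and produce other images as output. An image may have one channel (c = 1), as in a grayscale image, or three channels (c = 3), as in an RGB image. What matters most for vision layers, however, is the spatial structure of the input: its 2D geometry guides how the input is processed. Most vision layers apply a particular operation to some region of the input image and produce a corresponding region of the output. In contrast, most other layers ignore the spatial structure and treat the input as one big vector of dimension c*h*w.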

Convolution

Type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
Parameters (convolution_param):
- Required:
  - num_output (c_o): the number of filters
  - kernel_size (or kernel_h and kernel_w): specifies the height and width of each filter
- Strongly recommended:
  - weight_filler [default type: 'constant' value: 0]: how the filter weights are initialized; by default every weight is set to 0
- Optional:
  - bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
  - pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  - stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
  - group (g) [default 1]: if g > 1, the connectivity of each filter is restricted to a subset of the input. Specifically, the input and output channels are separated into g groups, and the channels of the i-th output group are connected only to the channels of the i-th input group.
Input
- n * c_i * h_i * w_i
Output
- n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and likewise for w_o
Example (see ./examples/imagenet/imagenet_train_val.prototxt)

layer {
  name: "conv1"                  # layer name: conv1
  type: "Convolution"            # layer type: convolution
  bottom: "data"                 # input blob: data
  top: "conv1"                   # output blob: conv1
  # learning-rate and weight-decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning-rate and weight-decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96               # learn 96 filters
    kernel_size: 11              # each filter is 11x11
    stride: 4                    # step 4 pixels between filter applications
    weight_filler {
      type: "gaussian"           # initialize the filters from a Gaussian
      std: 0.01                  # standard deviation 0.01 (mean defaults to 0)
    }
    bias_filler {
      type: "constant"           # initialize the biases to zero
      value: 0
    }
  }
}

The Convolution layer convolves the input image with a set of learnable filters. Each filter responds to one feature and produces one feature map in the output image.
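The output-size formula given for the Convolution layer can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Caffe code: the helper name conv_output_size and the 227-pixel input size are assumptions made for illustration, and the division is taken as integer (floor) division.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    """Output length along one spatial axis: (in + 2*pad - kernel) // stride + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1


# The conv1 settings (kernel_size 11, stride 4, no padding) applied to a
# hypothetical 227-pixel-wide input.
print(conv_output_size(227, kernel=11, stride=4))  # 55
```

With num_output: 96, an n * 3 * 227 * 227 input would therefore give an n * 96 * 55 * 55 output under these assumptions.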

Pooling

Type: Pooling
CPU implementation: ./src/caffe/layers/pooling_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/pooling_layer.cu
Parameters (pooling_param):
- Required:
  - kernel_size (or kernel_h and kernel_w): specifies the height and width of each pooling window
- Optional:
  - pool [default MAX]: the pooling method. Currently MAX, AVE, or STOCHASTIC
  - pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  - stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the pooling window to the input
Input
- n * c_i * h_i * w_i
Output
- n * c_i * h_o * w_o (pooling leaves the channel count unchanged), where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and likewise for w_o
Example (see ./examples/imagenet/imagenet_train_val.prototxt)

layer {
  name: "pool1"                 # layer name: pool1
  type: "Pooling"               # layer type: pooling
  bottom: "conv1"               # input blob: conv1
  top: "pool1"                  # output blob: pool1
  pooling_param {
    pool: MAX                   # pooling method: MAX
    kernel_size: 3              # pool over a 3x3 region
    stride: 2                   # step 2 pixels between pooling regions
  }
}
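
To make the MAX method concrete, here is a minimal pure-Python sketch of max pooling over a single channel, with the same kernel_size and stride as pool1. The function name max_pool2d is hypothetical, padding is left out, and the window count uses the formula above with floor division; Caffe's own handling of partial windows at the border may differ.

```python
def max_pool2d(image, kernel, stride):
    """Max-pool one channel, given as a list of rows, without padding."""
    h, w = len(image), len(image[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(kernel)
                for j in range(kernel)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
print(max_pool2d(image, kernel=3, stride=2))  # [[13, 15], [23, 25]]
```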

Local Response Normalization (LRN)

Type: LRN
CPU implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (lrn_param):
- Optional:
  - local_size [default 5]: the number of channels to sum over (for cross-channel LRN) or the side length of the square region to sum over (for within-channel LRN)
  - alpha [default 1]: the scaling parameter
  - beta [default 0.75]: the exponent
  - norm_region [default ACROSS_CHANNELS]: whether to sum over adjacent channels (ACROSS_CHANNELS) or nearby spatial locations (WITHIN_CHANNEL)

The LRN layer normalizes over local input regions and has two modes. In ACROSS_CHANNELS mode, the local regions extend across nearby channels but have no spatial extent, so their shape is local_size x 1 x 1. In WITHIN_CHANNEL mode, the local regions extend spatially but stay within a single channel, so their shape is 1 x local_size x local_size. Each input value is divided by $\left(1 + (\alpha/n)\sum_i x_i^2\right)^{\beta}$, where n is the size of the local region and the sum is taken over the region centered at that value (zero-padded where necessary).
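
The ACROSS_CHANNELS case can be sketched for a single spatial position, where the input is just the list of channel values at that pixel. The function name lrn_across_channels is an assumption, and the code is a direct reading of the formula in the text (divide by (1 + alpha/n * sum of squares) ^ beta, with n = local_size), not a copy of Caffe's implementation.

```python
def lrn_across_channels(values, local_size=5, alpha=1.0, beta=0.75):
    """Normalize the channel values found at one spatial position."""
    half = local_size // 2
    out = []
    for c, x in enumerate(values):
        # Channels outside the blob count as zeros, so clipping the window
        # is equivalent to zero padding.
        lo, hi = max(0, c - half), min(len(values), c + half + 1)
        sq_sum = sum(v * v for v in values[lo:hi])
        out.append(x / (1.0 + alpha / local_size * sq_sum) ** beta)
    return out


print(lrn_across_channels([0.0, 2.0, 0.0], local_size=3, alpha=3.0, beta=1.0))
```

In the call above the middle value sees a squared sum of 4, so it is divided by 1 + (3/3) * 4 = 5.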

im2col

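Im2col is a helper that performs the image-to-column transformation, laying out image patches as the columns of a matrix. You most likely do not need to know its details. Caffe uses it to carry out convolution as a matrix multiplication.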
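
For readers who do want the idea, the following is a simplified single-channel sketch of the image-to-column trick. The function names im2col and convolve_with_im2col are illustrative only, padding is omitted, and the real Caffe routine also handles multiple channels.

```python
def im2col(image, kernel, stride=1):
    """Lay out every kernel x kernel patch of a 2D image as one matrix column."""
    h, w = len(image), len(image[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    patches = [
        [image[r * stride + i][c * stride + j]
         for i in range(kernel) for j in range(kernel)]
        for r in range(out_h) for c in range(out_w)
    ]
    # Transpose so that each flattened patch becomes a column.
    return [list(row) for row in zip(*patches)]


def convolve_with_im2col(image, flat_filter, kernel, stride=1):
    """Apply one flattened filter as a row vector times the column matrix."""
    cols = im2col(image, kernel, stride)
    return [sum(f * v for f, v in zip(flat_filter, col)) for col in zip(*cols)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# This 2x2 filter adds the top-left and bottom-right value of each patch.
print(convolve_with_im2col(image, [1, 0, 0, 1], kernel=2))  # [6, 8, 12, 14]
```

With several filters, each flattened filter becomes one row of a weight matrix, and the whole layer reduces to a single matrix product.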

Loss Layers

Caffe drives learning by minimizing the cost (loss) between the output and the target. The loss is computed by the forward pass, and the gradient of the loss is computed by the backward pass.

Softmax

Type: SoftmaxWithLoss

The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs. It is conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but it provides a more numerically stable gradient.
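
The benefit of fusing the two steps can be illustrated with a small sketch for a single sample. The function name softmax_loss is hypothetical and the code only shows the general log-sum-exp rearrangement; it is not taken from Caffe's source.

```python
import math


def softmax_loss(scores, label):
    """Compute -log(softmax(scores)[label]) without forming the probability."""
    m = max(scores)
    # log(sum(exp(s))) evaluated with the maximum factored out, so exp()
    # never overflows.
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


print(softmax_loss([0.0, 0.0], 0))     # log(2), about 0.693
print(softmax_loss([1000.0, 0.0], 1))  # 1000.0
```

In the second call the probability of class 1 underflows to zero in floating point, so computing the softmax first and taking its logarithm afterwards would fail, while the fused form still returns a finite loss.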

Sum-of-Squares / Euclidean

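Type: EuclideanLoss

The Euclidean loss layer computes the sum of squared differences between its two inputs, $\frac{1}{2N}\sum_{i=1}^{N}\lVert x^1_i - x^2_i\rVert_2^2$.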
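
As a quick numeric illustration, the sum of squared differences can be written as follows. The sketch assumes the usual 1/(2N) scaling, with N the number of samples in the batch; the function name euclidean_loss is made up for the example.

```python
def euclidean_loss(pred, target):
    """Sum of squared differences over the batch, scaled by 1 / (2N)."""
    n = len(pred)
    total = sum(
        (p - t) ** 2
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    return total / (2.0 * n)


# Two samples with two values each: (0 + 4 + 9 + 0) / (2 * 2)
print(euclidean_loss([[1, 2], [3, 4]], [[1, 0], [0, 4]]))  # 3.25
```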

Hinge / Margin

Type: HingeLoss
CPU implementation: ./src/caffe/layers/hinge_loss_layer.cpp
CUDA GPU implementation: none yet
Parameters (hinge_loss_param):
- Optional:
  - norm [default L1]: the norm used. Currently L1, L2
Input
- n * c * h * w Predictions
- n * 1 * 1 * 1 Labels
Output
- 1 * 1 * 1 * 1 Computed Loss
Examples

# L1 norm
layer {
  name: "loss"                  # layer name: loss
  type: "HingeLoss"             # layer type: HingeLoss
  bottom: "pred"                # input: predictions
  bottom: "label"               # input: labels
}

# L2 norm
layer {
  name: "loss"                  # layer name: loss
  type: "HingeLoss"             # layer type: HingeLoss
  bottom: "pred"                # input: predictions
  bottom: "label"               # input: labels
  top: "loss"                   # output: the loss value
  hinge_loss_param {
    norm: L2                    # use the L2 norm
  }
}

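About the norms: for a vector v, the L1 norm is $\lVert v\rVert_1 = \sum_i |v_i|$ and the L2 norm is $\lVert v\rVert_2 = \sqrt{\sum_i v_i^2}$.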
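
The difference between the two settings can be sketched with a one-vs-all hinge loss. This is an illustrative formulation, assumed rather than quoted from the Caffe source: the correct class gets target +1, every other class gets target -1, the L1 variant sums the margins max(0, 1 - t * s), the L2 variant sums their squares, and the result is averaged over the batch. The function name hinge_loss is hypothetical.

```python
def hinge_loss(pred, labels, norm="L1"):
    """pred: one list of class scores per sample; labels: correct class indices."""
    total = 0.0
    for scores, label in zip(pred, labels):
        for k, s in enumerate(scores):
            t = 1.0 if k == label else -1.0  # one-vs-all target
            margin = max(0.0, 1.0 - t * s)
            total += margin if norm == "L1" else margin ** 2
    return total / len(pred)


pred, labels = [[0.5, 0.0]], [0]
print(hinge_loss(pred, labels, norm="L1"))  # 1.5
print(hinge_loss(pred, labels, norm="L2"))  # 1.25
```

The L2 variant penalizes large margin violations more heavily and small ones more lightly than the L1 variant.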

Sigmoid Cross-Entropy

Type: SigmoidCrossEntropyLoss
(No details are given.)

Infogain

Type: InfogainLoss
(No details are given.)

Accuracy and Top-k

Type: Accuracy
Scores the output as its accuracy with respect to the target. It is not actually a loss layer and has no backward pass.
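
A minimal sketch of what "accuracy" and "top-k" mean here: a prediction counts as correct when the true label is among the k highest-scoring classes. The function name top_k_accuracy and the sample scores are invented for the example.

```python
def top_k_accuracy(pred, labels, k=1):
    """Fraction of samples whose label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(pred, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


pred = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [2, 0]
print(top_k_accuracy(pred, labels, k=1))  # 0.5
print(top_k_accuracy(pred, labels, k=2))  # 1.0
```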

Activation / Neuron Layers

Activation (neuron) layers are element-wise operators: each takes one input blob and produces one output blob of the same size.

Input
- n * c * h * w
Output
- n * c * h * w

ReLU / Rectified-Linear and Leaky-ReLU

Type: ReLU
CPU implementation: ./src/caffe/layers/relu_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
Parameters (relu_param):
- Optional:
  - negative_slope [default 0]: specifies whether to leak the negative part by multiplying it by the slope value rather than setting it to 0, so that an input x < 0 produces negative_slope * x
Example (see ./examples/imagenet/imagenet_train_val.prototxt)

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

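Given an input value x, the ReLU layer computes x > 0 ? x : negative_slope * x. When negative_slope is not set, this is the standard ReLU function max(x, 0). The layer also supports in-place computation, in which the output overwrites the input to save memory. This is why bottom and top are both "conv1" in the example.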
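
The same rule in Python, as a small sketch (the function name relu is ours; the expression mirrors the one in the text):

```python
def relu(x, negative_slope=0.0):
    """Standard ReLU when negative_slope is 0, leaky ReLU otherwise."""
    return x if x > 0 else negative_slope * x


print([relu(v) for v in (-2.0, 3.0)])
print([relu(v, negative_slope=0.1) for v in (-2.0, 3.0)])
```

With the default slope the negative input is clamped to zero; with negative_slope=0.1 it is scaled to -0.2 instead.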

Sigmoid

Type: Sigmoid
CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
Example (see ./examples/mnist/mnist_autoencoder.prototxt)

layer {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: "Sigmoid"
}

For each input value x, the Sigmoid layer computes sigmoid(x).

TanH / Hyperbolic Tangent

Type: TanH
CPU implementation: ./src/caffe/layers/tanh_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
Example

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}

For each input value x, the TanH layer computes tanh(x).

Absolute Value

Type: AbsVal
CPU implementation: ./src/caffe/layers/absval_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
Example

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "AbsVal"
}

For each input value x, the AbsVal layer computes abs(x).

Power

Type: Power
CPU implementation: ./src/caffe/layers/power_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
Parameters (power_param):
- Optional:
  - power [default 1]: the exponent
  - scale [default 1]: the multiplier applied to the input
  - shift [default 0]: the offset added after scaling
Example

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "Power"
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

For each input value x, the Power layer computes (shift + scale * x) ^ power.

BNLL

Type: BNLL (binomial normal log likelihood)
CPU implementation: ./src/caffe/layers/bnll_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
Example

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "BNLL"
}

For each input value x, the BNLL layer computes log(1 + exp(x)).
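
Evaluated literally, log(1 + exp(x)) overflows for large positive x. One common way around this, shown here as a sketch and not necessarily how Caffe does it, is to rewrite the expression as x + log(1 + exp(-x)) when x > 0. The function name bnll is ours.

```python
import math


def bnll(x):
    """log(1 + exp(x)), arranged so exp() never sees a large positive argument."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


print(bnll(0.0))     # log(2), about 0.693
print(bnll(1000.0))  # 1000.0
```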

Data Layers

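Data enters Caffe through data layers, which sit at the bottom of a net.
Data can come from: (1) efficient databases (LevelDB or LMDB); (2) memory; (3) HDF5 or common image files, when efficiency is not critical.
Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying TransformationParameter.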
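
The four preprocessing steps can be pictured with a small sketch on one image channel. Everything here is an assumption made for illustration: the function name transform, its argument names, the order of the steps, and the fact that the crop offset is passed in explicitly so the example is deterministic (a random crop would pick the offset at random).

```python
def transform(image, mean=0.0, scale=1.0, mirror=False, crop_size=0, crop_off=(0, 0)):
    """image: one channel as a list of rows; crop_off: (row, col) of the crop corner."""
    if crop_size:
        r0, c0 = crop_off
        image = [row[c0:c0 + crop_size] for row in image[r0:r0 + crop_size]]
    if mirror:
        image = [row[::-1] for row in image]  # horizontal flip
    # Subtract the mean first, then scale.
    return [[(v - mean) * scale for v in row] for row in image]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(transform(image, mean=1, scale=2, mirror=True, crop_size=2, crop_off=(1, 1)))
# [[10, 8], [16, 14]]
```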

Database

Type: Data
Parameters:
- Required:
  - source: the name of the directory containing the database
  - batch_size: the number of inputs to process at one time
- Optional:
  - rand_skip: skip up to this number of inputs at the beginning; useful for asynchronous SGD
  - backend [default LEVELDB]: choose whether to use a LEVELDB or LMDB

In-Memory

Type: MemoryData
Parameters:
- Required:
  - batch_size, channels, height, width: specify the size of input chunks to read from memory

The memory data layer reads data directly from memory, without copying it. Before using it, you must call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to specify a source of contiguous data (a 4D row-major array), which is then read one batch_size-sized chunk at a time.

HDF5 Input

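Type: HDF5Data
Parameters:
- Required:
  - source: the name of the file to read from
  - batch_size: the number of inputs to process at one time

HDF5 Output

Type: HDF5Output
Parameters:
- Required:
  - file_name: name of file to write to

The HDF5 output layer does the opposite of the other layers in this section: it writes its input blobs to disk instead of reading data.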

Images

Type: ImageData
Parameters:
- Required:
  - source: name of a text file, with each line giving an image filename and label
  - batch_size: number of images to batch together
- Optional:
  - rand_skip: skip up to this number of inputs at the beginning
  - shuffle [default false]: whether to shuffle the order of the images
  - new_height, new_width: if provided, resize all images to this size

Windows

Type: WindowData
(No details are given.)

Dummy

Type: DummyData

DummyData is used for development and debugging. See DummyDataParameter (the source gives no link).

Common Layers

Inner Product

Type: InnerProduct (fully connected layer)
CPU implementation: ./src/caffe/layers/inner_product_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/inner_product_layer.cu
Parameters (inner_product_param):
- Required:
  - num_output (c_o): the number of filters
- Strongly recommended:
  - weight_filler [default type: 'constant' value: 0]: how the weights are initialized
- Optional:
  - bias_filler [default type: 'constant' value: 0]: how the biases are initialized
  - bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
Input
- n * c_i * h_i * w_i
Output
- n * c_o * 1 * 1
Example

layer {
  name: "fc8"                              # layer name: fc8
  type: "InnerProduct"                     # layer type: fully connected
  # learning-rate and weight-decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning-rate and weight-decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000                       # 1000 outputs (filters)
    weight_filler {
      type: "gaussian"                     # initialize the weights from a Gaussian
      std: 0.01                            # standard deviation 0.01 (mean defaults to 0)
    }
    bias_filler {
      type: "constant"                     # initialize the biases to zero
      value: 0
    }
  }
  bottom: "fc7"                            # input blob: fc7
  top: "fc8"                               # output blob: fc8
}

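The InnerProduct layer (usually called the fully connected layer) treats the input as a simple vector and produces an output that is also a single vector, with the blob's height and width set to 1.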
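
For one sample, the computation amounts to a matrix-vector product plus an optional bias. The sketch below uses plain lists and a made-up function name, inner_product; the weight matrix has num_output rows, each as long as the flattened input (c_i * h_i * w_i).

```python
def inner_product(x, weights, bias=None):
    """x: flattened input vector; weights: one row of coefficients per output."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    if bias is not None:  # corresponds to bias_term: true
        out = [o + b for o, b in zip(out, bias)]
    return out


# Three inputs, two outputs.
print(inner_product([1, 2, 3], [[1, 0, 0], [0, 1, 1]], bias=[10, 20]))  # [11, 25]
```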

Splitting

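Type: Split

The Split layer is a utility layer that splits one input blob into multiple output blobs. It is used when a blob has to be fed into several downstream layers.

Flattening

Type: Flatten

The Flatten layer is a utility layer that flattens an input of shape n * c * h * w into a simple vector output of shape n * (c*h*w).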

Reshape

Type: Reshape
CPU implementation: ./src/caffe/layers/reshape_layer.cpp
CUDA GPU implementation: none yet
Parameters (reshape_param):
- Optional:
  - shape: the new dimensions (explained below)
Input
- a single blob with arbitrary dimensions
Output
- the same blob, with modified dimensions, as specified by reshape_param
Example

layer {
  name: "reshape"                       # layer name: reshape
  type: "Reshape"                       # layer type: Reshape
  bottom: "input"                       # input blob: input
  top: "output"                         # output blob: output
  reshape_param {
    shape {
      dim: 0  # copy this dimension from the input
      dim: 2
      dim: 3
      dim: -1 # infer this dimension from the others
    }
  }
}

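The Reshape layer changes only the dimensions of its input, not its contents. As with the Flatten layer, no data is copied in the process.

The output dimensions are specified by reshape_param. Positive numbers are used directly as the size of the corresponding output dimension. In addition, two special values are accepted:

0 => copy the respective dimension of the bottom layer. If the bottom has 2 as its first dimension, the top will have 2 as its first dimension as well, given dim: 0 as the first target dimension.
-1 => infer this from the other dimensions. At most one -1 can be used in a reshape operation.

As another example, specifying reshape_param { shape { dim: 0 dim: -1 } } makes the layer behave exactly like the Flatten layer.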
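
The rules for 0 and -1 can be traced with a short shape calculator. This is a sketch of the rules as described in the text; the function name infer_shape and the 64 * 3 * 4 * 6 input shape are assumptions for the example.

```python
def infer_shape(input_shape, dims):
    """Resolve the special values 0 (copy) and -1 (infer) in a reshape spec."""
    count = 1
    for d in input_shape:
        count *= d
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if out.count(-1) > 1:
        raise ValueError("at most one dim may be -1")
    known = 1
    for d in out:
        if d != -1:
            known *= d
    if -1 in out:
        out[out.index(-1)] = count // known
        known *= out[out.index(count // known)] if False else count // known
    if known != count:
        raise ValueError("reshape must keep the total element count")
    return out


print(infer_shape([64, 3, 4, 6], [0, 2, 3, -1]))  # [64, 2, 3, 12]
print(infer_shape([64, 3, 4, 6], [0, -1]))        # [64, 72]
```

The second call reproduces the Flatten behavior: the first dimension is copied and everything else collapses into one.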

Concatenation

Type: Concat
CPU implementation: ./src/caffe/layers/concat_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/concat_layer.cu
Parameters (concat_param):
- Optional:
  - axis [default 1]: 0 for concatenation along num and 1 for channels
Input
- n_i * c_i * h * w for each input blob i from 1 to K
Output
- if axis = 0: (n_1 + n_2 + … + n_K) * c_1 * h * w, and all input c_i should be the same
- if axis = 1: n_1 * (c_1 + c_2 + … + c_K) * h * w, and all input n_i should be the same
Example

layer {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: "Concat"
  concat_param {
    axis: 1
  }
}

The Concat layer is a utility layer that concatenates its multiple input blobs into one single output blob.
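
The output shapes listed for the Concat layer follow a simple rule: sizes add up along the chosen axis and must match everywhere else. A sketch, with the hypothetical helper name concat_shape:

```python
def concat_shape(shapes, axis=1):
    """Output shape of concatenating blobs with the given shapes along axis."""
    first = shapes[0]
    for s in shapes[1:]:
        if any(a != b for i, (a, b) in enumerate(zip(first, s)) if i != axis):
            raise ValueError("inputs may differ only along the concat axis")
    out = list(first)
    out[axis] = sum(s[axis] for s in shapes)
    return out


print(concat_shape([[8, 3, 5, 5], [8, 4, 5, 5]], axis=1))  # [8, 7, 5, 5]
print(concat_shape([[2, 3, 5, 5], [6, 3, 5, 5]], axis=0))  # [8, 3, 5, 5]
```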

Slicing

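The Slice layer is a utility layer that slices an input layer into multiple output layers along a given dimension (currently num or channel only) at the given slice indices.

Type: Slice
Example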

layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label"
  # assume label has shape N x 3 x 1 x 1
  top: "label1"
  top: "label2"
  top: "label3"
  slice_param {
    axis: 1                        # slice along the channel axis
    slice_point: 1                 # label1 receives channel 0
    slice_point: 2                 # label2 receives channel 1
                                   # label3 receives the rest (channel 2)
  }
}

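axis indicates the target dimension, and each slice_point is an index along that dimension at which to cut. The number of slice_point values must be the number of top blobs minus 1.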
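
How the slice points map to output ranges can be sketched as follows, using zero-based, half-open index ranges along the sliced axis. The helper name slice_ranges is an assumption.

```python
def slice_ranges(axis_size, slice_points):
    """Return the (start, end) index range that each top blob receives."""
    bounds = [0] + list(slice_points) + [axis_size]
    return list(zip(bounds[:-1], bounds[1:]))


# Three channels cut at 1 and 2 give three single-channel outputs.
print(slice_ranges(3, [1, 2]))  # [(0, 1), (1, 2), (2, 3)]
```

Two slice points yield three ranges, which matches the rule that there is one fewer slice_point than top blobs.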

Elementwise Operations

Type: Eltwise
(No details are given.)

Argmax

Type: ArgMax
(No details are given.)

Softmax

Type: Softmax
(No details are given.)

Mean-Variance Normalization

Type: MVN
(No details are given.)
