MATLAB's own neural network toolbox did not gain deep learning support until release R2017b, whereas VLFeat released the deep neural network toolbox MatConvNet back in September 2014. The toolbox is comprehensive and ships with many demo programs. It provides implementation functions for the individual layers of a deep network (including convolution, pooling, activation, and softmax layers) as well as for losses, dropout, and normalization; a custom deep network can be built by composing these functions. It also provides wrappers for simple convolutional networks (linear topology) and for DAG networks; see the help documentation for details. In addition, the official website offers pretrained models and demonstrates their application to object detection, face recognition, semantic segmentation, and ImageNet ILSVRC classification. The models used for ImageNet ILSVRC classification include ResNet, GoogLeNet, VGG-VD, VGG-S/M/F, the Caffe reference model, and AlexNet, which are of great reference value to students and researchers.
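As a rough illustration of how these building-block functions compose into a network, the following sketch uses MatConvNet's SimpleNN wrapper; the layer sizes and the dummy input are illustrative assumptions, not taken from the text.

```matlab
% Minimal sketch: compose conv / relu / pool building blocks into a
% SimpleNN network and run a forward pass. Layer sizes are arbitrary.
net.layers = {} ;
net.layers{end+1} = struct('type', 'conv', ...
    'weights', {{randn(5,5,3,10,'single'), zeros(1,10,'single')}}, ...
    'stride', 1, 'pad', 0) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'pool', 'method', 'max', ...
    'pool', [2 2], 'stride', 2, 'pad', 0) ;
net = vl_simplenn_tidy(net) ;      % fill in any missing default fields
x = randn(32,32,3,'single') ;      % dummy single-precision input
res = vl_simplenn(net, x) ;        % forward pass; res(end).x is the output
```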
Besides the demos bundled with the toolbox, the Oxford Visual Geometry Group (VGG) provides related tutorial programs: go to the practicals section of the Oxford VGG page and download the archive (roughly 800 MB), which contains precompiled script files and data, so the demos can be run directly. Beyond these practicals, Oxford VGG also offers many projects on image retrieval and recognition; see their software and projects pages for details. This material represents years of work by the group and is very helpful for research in related areas.
Below is a brief walkthrough of exercise1.m from practical-CNN.
%exercise1.m
setup ;
x = imread('peppers.png') ;
% convert the image data to single precision
x = im2single(x) ;
% display the input image
figure(1) ; clf ; imagesc(x) ;
% initialize the convolution layer parameters
w = randn(5,5,3,10,'single') ;
% convolve the input image
y = vl_nnconv(x, w, []) ;
figure(2) ; clf ; vl_imarraysc(y) ; colormap gray ;
% convolve the input image with stride 16, which downsamples the output
y_ds = vl_nnconv(x, w, [], 'stride', 16) ;
figure(3) ; clf ; vl_imarraysc(y_ds) ; colormap gray ;
% convolve the input image with padding
y_pad = vl_nnconv(x, w, [], 'pad', 4) ;
figure(4) ; clf ; vl_imarraysc(y_pad) ; colormap gray ;
% set the filter coefficients manually (a Laplacian kernel)
w = [0 1 0 ;
1 -4 1 ;
0 1 0 ] ;
w = single(repmat(w, [1, 1, 3])) ;
y_lap = vl_nnconv(x, w, []) ;
figure(5) ; clf ; colormap gray ;
subplot(1,2,1) ; imagesc(y_lap) ; title('filter output') ;
subplot(1,2,2) ; imagesc(-abs(y_lap)) ; title('- abs(filter output)') ;
w = single(repmat([1 0 -1], [1, 1, 3])) ;
w = cat(4, w, -w) ;
y = vl_nnconv(x, w, []) ;
% apply the ReLU activation to the convolution output
z = vl_nnrelu(y) ;
figure(6) ; clf ; colormap gray ;
subplot(1,2,1) ; vl_imarraysc(y) ;
subplot(1,2,2) ; vl_imarraysc(z) ;
% max-pool the input image with a 15x15 window
y = vl_nnpool(x, 15) ;
figure(7) ; clf ; imagesc(y) ;
rho = 5 ;
kappa = 0 ;
alpha = 1 ;
beta = 0.5 ;
% apply local response normalization (LRN) to the input image
y_nrm = vl_nnnormalize(x, [rho kappa alpha beta]) ;
figure(8) ; clf ; imagesc(y_nrm) ;
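Each of the building blocks used above also implements the backward pass needed for training: calling the same function with an extra derivative argument returns the derivatives of the loss with respect to the inputs and parameters. The following sketch assumes MatConvNet's standard calling convention for vl_nnconv; the random "upstream" derivative is a stand-in for what later layers would supply.

```matlab
% Sketch of the backward mode of a building block: pass the derivative
% of the loss w.r.t. the block output (dzdy) to get the derivatives
% w.r.t. the input (dzdx) and the filters (dzdw).
x = im2single(imread('peppers.png')) ;
w = randn(5,5,3,10,'single') ;
y = vl_nnconv(x, w, []) ;                   % forward pass
dzdy = randn(size(y), 'single') ;           % stand-in derivative from above
[dzdx, dzdw] = vl_nnconv(x, w, [], dzdy) ;  % backward pass
% dzdx has the same size as x, and dzdw the same size as w
```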
Next is exercise2.m from practical-category-recognition-cnn-2017a.
%exercise2.m
setup ;
% choose the pretrained encoder (the model file must be downloaded first)
encoding = 'vggm128-conv4' ;
% whether to use data augmentation, i.e., enlarge the image set
augmentation = false ;
% compute features for the positive images
encoder = loadEncoder(encoding) ;
pos.names = getImageSet('data/myImages', augmentation) ;
if numel(pos.names) == 0, error('Please add some images to data/myImages before running this exercise') ; end
pos.descriptors = encodeImage(encoder, pos.names, ['data/cache_' encoding]) ;
% add negative (background) images
neg = load(sprintf('data/background_train_%s.mat',encoding)) ;
names = {pos.names{:}, neg.names{:}};
descriptors = [pos.descriptors, neg.descriptors] ;
labels = [ones(1,numel(pos.names)), - ones(1,numel(neg.names))] ;
clear pos neg ;
% load the test images
pos = load(sprintf('data/horse_val_%s.mat',encoding)) ;
neg = load(sprintf('data/background_val_%s.mat',encoding)) ;
testNames = {pos.names{:}, neg.names{:}};
testDescriptors = [pos.descriptors, neg.descriptors] ;
testLabels = [ones(1,numel(pos.names)), - ones(1,numel(neg.names))] ;
clear pos neg ;
fprintf('Number of training images: %d positive, %d negative\n', ...
sum(labels > 0), sum(labels < 0)) ;
fprintf('Number of testing images: %d positive, %d negative\n', ...
sum(testLabels > 0), sum(testLabels < 0)) ;
% L2-normalize the features
descriptors = bsxfun(@times, descriptors, 1./sqrt(sum(descriptors.^2,1))) ;
testDescriptors = bsxfun(@times, testDescriptors, 1./sqrt(sum(testDescriptors.^2,1))) ;
% train a linear SVM model
C = 10 ;
[w, bias] = trainLinearSVM(descriptors, labels, C) ;
% compute scores on the training data
scores = w' * descriptors + bias ;
% compute scores on the test data
testScores = w' * testDescriptors + bias ;
figure(3) ; clf ; set(3,'name','Ranked test images (subset)') ;
displayRankedImageList(testNames, testScores) ;
% plot the precision-recall curve
figure(4) ; clf ; set(4,'name','Precision-recall on test data') ;
vl_pr(testLabels, testScores) ;
% print the average precision
[~,~,info] = vl_pr(testLabels, testScores) ;
fprintf('Test AP: %.2f\n', info.auc) ;
[~,perm] = sort(testScores,'descend') ;
fprintf('Correctly retrieved in the top 36: %d\n', sum(testLabels(perm(1:36)) > 0)) ;
The main functions in the toolbox are:
Building blocks
- vl_nnbnorm Batch normalization.
- vl_nnbilinearsampler Bilinear Sampler.
- vl_nnconv Linear convolution by a filter.
- vl_nnconcat Concatenation.
- vl_nnconvt Convolution transpose.
- vl_nncrop Cropping.
- vl_nndropout Dropout.
- vl_nnloss Classification log-loss.
- vl_nnnoffset Norm-dependent offset.
- vl_nnnormalize Local Response Normalization (LRN).
- vl_nnpdist Pairwise distances.
- vl_nnpool Max and sum pooling.
- vl_nnrelu Rectified Linear Unit.
- vl_nnroipool Region of interest pooling.
- vl_nnsigmoid Sigmoid.
- vl_nnsoftmax Channel soft-max.
- vl_nnsoftmaxloss Combined softmax and log-loss (deprecated).
- vl_nnspnorm Spatial normalization.
SimpleCNN wrapper
- vl_simplenn A lightweight wrapper for CNNs with a linear topology.
- vl_simplenn_tidy Upgrade or otherwise fix a CNN.
- vl_simplenn_display Print information about the CNN architecture.
- vl_simplenn_move Move the CNN between CPU and GPU.
DagNN wrapper
- DagNN An object-oriented wrapper for CNNs with complex topologies.
Other functions
- vl_argparse A helper function to parse optional arguments.
- vl_compilenn Compile the MEX files in the toolbox.
- vl_contrib Download, compile, and set up third-party modules.
- vl_rootnn Return the path to the MatConvNet toolbox installation.
- vl_setupnn Set up MatConvNet for use in MATLAB.
- vl_imreadjpeg Quickly load a batch of JPEG images.
- vl_taccum Accumulate tensors operating in-place when possible.
- vl_tmove Exchange tensors between MATLAB processes and GPUs.
- vl_tshow Show a tensor on screen.
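Among the helper functions, vl_argparse is used throughout the toolbox to handle name-value options such as the 'stride' and 'pad' arguments seen earlier. The following sketch shows the typical usage pattern; the option names here are illustrative.

```matlab
% Sketch of vl_argparse: merge user-supplied name-value options into a
% struct of defaults. Unrecognized option names raise an error.
opts.stride = 1 ;
opts.pad = 0 ;
opts = vl_argparse(opts, {'stride', 16}) ;
% opts.stride is now 16, while opts.pad keeps its default of 0
```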