Bounding box regression: how boxes are computed in the R-CNN family of networks

Original

2020-06-16 06:38

0. bounding-box regression

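Bounding-box regression is described in detail in Appendix C of the R-CNN paper, while the later papers (Fast R-CNN, Faster R-CNN, Mask R-CNN, the SSD family, the YOLO family) do not go through it carefully.
This post uses the R-CNN paper to explain the principle of bounding-box regression, and uses the Faster R-CNN code to analyze how the theoretical formulas are implemented in practice.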

R-CNN paper:        https://arxiv.org/pdf/1311.2524v3.pdf
Faster R-CNN code:  https://github.com/ShaoqingRen/faster_rcnn

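1. Bounding-box parameters explained

Here "bounding-box" refers to the bbox layer that sits in parallel with the classification (cls) layer in the RPN.

The bbox layer outputs four transformation coefficients that map a proposal (called P in the paper) to the ground-truth coordinates; they are the translation and scaling parameters. (In the RPN, "proposal" here means a predefined anchor; in the Fast R-CNN network it means the initial proposal.)
The weights of the bbox layer describe the relationship between the input image and these translation and scaling coefficients.
Which parameters are learned during training

What is learned are the weights of the bbox layer. The bbox layer has four channels (per anchor), one for each output value, so the convolution parameters of each channel can be called $w_x,w_y,w_w,w_h$. After the image features pass through this convolution we get four values $d_x(P), d_y(P), d_w(P), d_h(P)$. For convenience, these four numbers, i.e. the function outputs, are written as $t_x',t_y',t_w',t_h'$. They are the output of the bbox layer in the RPN and describe how a proposal is translated and scaled into a more accurate box.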

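2. Network training process

Theoretical formulas

On one hand, the bbox layer outputs $t_x',t_y',t_w',t_h'$ are taken as the predictions. On the other hand, the labels are $t_x=(G_x - P_x)/P_w$, $t_y=(G_y - P_y)/P_h$, $t_w=\log(G_w/P_w)$, $t_h=\log(G_h/P_h)$.
Training then looks for the network weights $w_x,w_y,w_w,w_h$ that minimize the difference between the labels and the predictions. This is how the weights of the bbox layer get updated.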
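
For reference, Appendix C of the R-CNN paper makes this objective explicit: each output is modeled as a linear function of the proposal's pool5 features $\phi_5(P)$ and fitted by regularized least squares. The restatement below follows that formulation. The Faster R-CNN bbox layers regress the same four targets but are trained by backpropagation with a smooth L1 loss, so this should be read as the R-CNN version only.

$$d_\star(P) = \mathbf{w}_\star^{T}\phi_5(P), \qquad \mathbf{w}_\star = \arg\min_{\hat{\mathbf{w}}_\star} \sum_{i=1}^{N}\left(t_\star^{i} - \hat{\mathbf{w}}_\star^{T}\phi_5(P^{i})\right)^{2} + \lambda\,\lVert\hat{\mathbf{w}}_\star\rVert^{2}, \qquad \star \in \{x, y, w, h\}$$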

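Here $G$ is the ground truth. So where does $P$ come from?

In R-CNN, the candidate boxes are produced by selective search and are called proposals. The x, y, w, h of such a proposal are what get compared with the ground truth.
In the RPN, P is the anchor that is kept out of the 9 anchors; applying the formulas above to the anchor gives the first refined bounding box, called a proposal.
In Fast R-CNN, the proposal output by the RPN serves as P, and the transformation from P to G is sought once more.
Actual code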

function [regression_label] = fast_rcnn_bbox_transform(ex_boxes, gt_boxes)
% [regression_label] = fast_rcnn_bbox_transform(ex_boxes, gt_boxes)
% --------------------------------------------------------
% Fast R-CNN
% Reimplementation based on Python Fast R-CNN (https://github.com/rbgirshick/fast-rcnn)
% Copyright (c) 2015, Shaoqing Ren
% Licensed under The MIT License [see LICENSE for details]
% --------------------------------------------------------

    ex_widths = ex_boxes(:, 3) - ex_boxes(:, 1) + 1;
    ex_heights = ex_boxes(:, 4) - ex_boxes(:, 2) + 1;
    ex_ctr_x = ex_boxes(:, 1) + 0.5 * (ex_widths - 1);
    ex_ctr_y = ex_boxes(:, 2) + 0.5 * (ex_heights - 1);
    
    gt_widths = gt_boxes(:, 3) - gt_boxes(:, 1) + 1;
    gt_heights = gt_boxes(:, 4) - gt_boxes(:, 2) + 1;
    gt_ctr_x = gt_boxes(:, 1) + 0.5 * (gt_widths - 1);
    gt_ctr_y = gt_boxes(:, 2) + 0.5 * (gt_heights - 1);
    
    targets_dx = (gt_ctr_x - ex_ctr_x) ./ (ex_widths+eps);
    targets_dy = (gt_ctr_y - ex_ctr_y) ./ (ex_heights+eps);
    targets_dw = log(gt_widths ./ ex_widths);
    targets_dh = log(gt_heights ./ ex_heights);
    
    regression_label = [targets_dx, targets_dy, targets_dw, targets_dh];
end
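
To check the numbers by hand, here is a minimal Python sketch of the same computation for a single pair of boxes. The function name `bbox_transform` and the example boxes are made up for illustration, and the `eps` guard of the MATLAB version is left out because the example widths are nonzero.

```python
import math


def bbox_transform(ex_box, gt_box):
    """Regression targets (dx, dy, dw, dh) from a proposal to a ground-truth box.

    Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates,
    hence the +1 in width and height, as in the MATLAB listing.
    """
    ex_w = ex_box[2] - ex_box[0] + 1
    ex_h = ex_box[3] - ex_box[1] + 1
    ex_cx = ex_box[0] + 0.5 * (ex_w - 1)
    ex_cy = ex_box[1] + 0.5 * (ex_h - 1)

    gt_w = gt_box[2] - gt_box[0] + 1
    gt_h = gt_box[3] - gt_box[1] + 1
    gt_cx = gt_box[0] + 0.5 * (gt_w - 1)
    gt_cy = gt_box[1] + 0.5 * (gt_h - 1)

    dx = (gt_cx - ex_cx) / ex_w      # shift in units of proposal width
    dy = (gt_cy - ex_cy) / ex_h      # shift in units of proposal height
    dw = math.log(gt_w / ex_w)       # log scale change of the width
    dh = math.log(gt_h / ex_h)       # log scale change of the height
    return dx, dy, dw, dh


# Made-up example: a 40x20 proposal and an 80x40 ground-truth box.
targets = bbox_transform((10, 10, 49, 29), (20, 15, 99, 54))
print(targets)
```

In this example the ground truth is twice as wide and twice as tall as the proposal, so both scale targets equal log 2, and the center shifts are 0.75 of the proposal's width and height. A proposal that already matches the ground truth gives all-zero targets.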

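Applying this to the RPN

In the code above, ex_boxes is one of the 9 anchors that is kept by the selection rule described in the Faster R-CNN paper; each anchor has four parameters.
During Fast R-CNN training, i.e. the second bounding-box regression in Faster R-CNN, the anchors pass through the RPN layer and yield the first refined bounding boxes, called proposals. NMS suppresses heavily overlapping proposals, so only a few remain for each object. The four parameters of each remaining proposal are compared with the ground truth again, which gives the $t_x,t_y,t_w,t_h$ of the Fast R-CNN layer. The proposal is then adjusted according to $t_x,t_y,t_w,t_h$ to produce the final output.

When training the RPN, the four numbers of an anchor are the P in the formulas.
When training the Fast R-CNN part, P is no longer an anchor but the four values of a proposal box produced by the RPN.

For how anchors are generated, see this blog post.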

3. Prediction process

Theoretical formulas

$\hat{G}_x = P_w d_x(P) + P_x$
$\hat{G}_y = P_h d_y(P) + P_y$
$\hat{G}_w = P_w \exp(d_w(P))$
$\hat{G}_h = P_h \exp(d_h(P))$
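
These prediction formulas are the inverse of the target definition used in training. The short Python sketch below checks this with a round trip. Boxes are given as (center x, center y, width, height), and the function names and numbers are made up for illustration.

```python
import math


def compute_targets(p, g):
    """Training targets t for proposal p and ground truth g, both (cx, cy, w, h)."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_deltas(p, d):
    """Predicted box G_hat from proposal p and deltas d = (dx, dy, dw, dh)."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (pw * dx + px,          # G_hat_x
            ph * dy + py,          # G_hat_y
            pw * math.exp(dw),     # G_hat_w
            ph * math.exp(dh))     # G_hat_h


p = (50.0, 40.0, 20.0, 10.0)   # made-up proposal
g = (56.0, 38.0, 30.0, 12.0)   # made-up ground truth
t = compute_targets(p, g)
g_hat = apply_deltas(p, t)
print(t)
print(g_hat)
```

If the network predicted the targets exactly, `apply_deltas` would recover the ground-truth box up to floating-point rounding. Normalizing the shifts by the proposal size and regressing the scale in log space means the same deltas describe the same relative correction for small and large boxes.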
Code skeleton

for j = 1:2 % we warm up 2 times
    im = uint8(ones(375, 500, 3)*128);
    if opts.use_gpu
        im = gpuArray(im);
    end
    % proposal_im_detect runs the RPN and returns its boxes and scores
    [boxes, scores] = proposal_im_detect(proposal_detection_model.conf_proposal, rpn_net, im);
    % aboxes are the boxes kept after NMS and the related filtering steps
    aboxes = boxes_filter([boxes, scores], opts.per_nms_topN, opts.nms_overlap_thres, opts.after_nms_topN, opts.use_gpu);
    if proposal_detection_model.is_share_feature
        % The RPN and Fast R-CNN share their convolutional layers here; this requires
        % training the networks with the 4-step procedure described in the paper
        [boxes, scores] = fast_rcnn_conv_feat_detect(proposal_detection_model.conf_detection, ...
            fast_rcnn_net, im, ...
            rpn_net.blobs(proposal_detection_model.last_shared_output_blob_name), ...
            aboxes(:, 1:4), opts.after_nms_topN);
    else
        [boxes, scores] = fast_rcnn_im_detect(proposal_detection_model.conf_detection, ...
            fast_rcnn_net, im, aboxes(:, 1:4), opts.after_nms_topN);
    end
end

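How the formulas are applied in the code

    % In the RPN, the anchors are used to predict the first set of boxes
    box_deltas = output_blobs{1};    % output of the RPN bbox layer
    % anchors at every feature-map position (NMS and other filtering are applied to the predicted boxes afterwards)
    anchors = proposal_locate_anchors(conf, size(im), conf.test_scales, featuremap_size);
    % apply box_deltas to the anchors to get the predicted boxes; this is the prediction formula from the paper
    pred_boxes = fast_rcnn_bbox_transform_inv(anchors, box_deltas);

    % Second bounding-box regression in Faster R-CNN, i.e. the regression step of Fast R-CNN
    box_deltas = output_blobs{1};
    box_deltas = squeeze(box_deltas)';
    % here the boxes produced by the previous step are used
    pred_boxes = fast_rcnn_bbox_transform_inv(boxes, box_deltas);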
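
The two calls form a cascade: the first refines an anchor into a proposal, the second refines that proposal into the final box. The Python sketch below illustrates this with a hypothetical stand-in for `fast_rcnn_bbox_transform_inv`. It is not the repository's implementation; it assumes (x1, y1, x2, y2) boxes with the same inclusive-pixel (+1) convention as the training code, and all deltas are made-up values rather than network outputs.

```python
import math


def bbox_transform_inv(box, deltas):
    """Apply (dx, dy, dw, dh) to an (x1, y1, x2, y2) box with inclusive pixel coordinates."""
    w = box[2] - box[0] + 1
    h = box[3] - box[1] + 1
    cx = box[0] + 0.5 * (w - 1)
    cy = box[1] + 0.5 * (h - 1)

    pred_cx = deltas[0] * w + cx      # G_hat_x = P_w * d_x + P_x
    pred_cy = deltas[1] * h + cy      # G_hat_y = P_h * d_y + P_y
    pred_w = math.exp(deltas[2]) * w  # G_hat_w = P_w * exp(d_w)
    pred_h = math.exp(deltas[3]) * h  # G_hat_h = P_h * exp(d_h)

    # Convert center and size back to corner coordinates.
    return (pred_cx - 0.5 * (pred_w - 1),
            pred_cy - 0.5 * (pred_h - 1),
            pred_cx + 0.5 * (pred_w - 1),
            pred_cy + 0.5 * (pred_h - 1))


# Stage 1 (RPN): P is an anchor.
anchor = (0.0, 0.0, 15.0, 15.0)
rpn_deltas = (0.5, 0.25, math.log(2.0), 0.0)     # made-up values
proposal = bbox_transform_inv(anchor, rpn_deltas)

# Stage 2 (Fast R-CNN): P is the proposal from stage 1.
det_deltas = (0.125, 0.0, 0.0, math.log(0.5))    # made-up values
final_box = bbox_transform_inv(proposal, det_deltas)
print(proposal)
print(final_box)
```

With these numbers the 16x16 anchor becomes a 32x16 proposal of roughly (0, 4, 31, 19), and the second stage shifts it right and halves its height to roughly (4, 8, 35, 15). The same function serves both stages; only the box passed in as P changes.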

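4. Bounding-box regression in the R-CNN paper

As an aside, the figures in the R-CNN appendix are really beautiful: strong detection results and good looks at the same time!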
