I've recently been studying the FCOS object detection algorithm, published at ICCV 2019. FCOS performs quite well and its code is well engineered, so I plan to follow up on it.
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
Paper: https://arxiv.org/pdf/1904.01355.pdf
Code: https://github.com/tianzhi0549/FCOS
------------------------------------ Let's start ------------------------------------
This post focuses on the multi-scale testing used in FCOS. With multi-scale testing, the FCOS model based on ResNeXt-64x4d-101 with deformable convolutions achieves 49.0% AP on COCO test-dev. I verified this on COCO as well: compared with single-scale testing, multi-scale testing raises AP by roughly 2 points, so it is quite effective. Its biggest drawback, of course, is the much larger time cost, which remains a difficulty to be addressed.
Before going through the FCOS multi-scale testing source code, here is a brief recap of multi-scale training and testing in object detection; anyone working on detection knows how much multi-scale matters for final performance.
The input image size has a pronounced effect on detector performance; in fact, multi-scale processing is one of the most effective tricks for improving accuracy. The backbone typically produces feature maps tens of times smaller than the original image, so the features of small objects are hard for the detection network to capture. Training on larger images and at multiple scales makes the detector more robust to object size, and introducing multiple scales only at test time still yields the gains from larger and more varied input sizes. [1]
------------------------------------ Let's continue ------------------------------------
FCOS multi-scale testing runs detection separately on the horizontally flipped image and on copies resized to different scales, merges all predicted bboxes, and then applies NMS and other post-processing to obtain the final detections. The idea is straightforward, but the effect is significant. The FCOS source code exposes several settings related to multi-scale testing:
TEST:
  BBOX_AUG:
    ENABLED: False
    H_FLIP: True
    SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
    MAX_SIZE: 2000
    SCALE_H_FLIP: True
Here, ENABLED is the master flag for multi-scale testing: False means single-scale testing, which is faster; setting it to True enables multi-scale testing. The remaining entries parameterize the multi-scale testing, as the sketch after this list illustrates:
H_FLIP: flag for horizontal flipping;
SCALES: the scales the test image is resized to;
MAX_SIZE: the maximum size allowed when resizing the test image;
SCALE_H_FLIP: flag for additionally horizontally flipping at each resized scale.
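As a quick sanity check on what these settings cost, the snippet below counts the forward passes the configuration implies and mimics the resize rule the Resize transform is assumed to follow (shorter side scaled to the target, longer side capped at MAX_SIZE). This is a minimal sketch; the helper resized_shape is my own illustration, not part of the FCOS repo.

def resized_shape(w, h, scale, max_size):
    # Hypothetical helper: scale the shorter side to `scale`,
    # but cap the longer side at `max_size` (assumed T.Resize behavior).
    ratio = scale / min(w, h)
    if max(w, h) * ratio > max_size:
        ratio = max_size / max(w, h)
    return round(w * ratio), round(h * ratio)

scales = (400, 500, 600, 700, 900, 1000, 1100, 1200)
# original + hflip + one pass per scale + one flipped pass per scale
num_passes = 1 + 1 + len(scales) + len(scales)
print(num_passes)  # 18 forward passes per image
print(resized_shape(1920, 1080, 1200, 2000))  # longer side capped: (2000, 1125)

With both flip flags on, every image goes through 18 forward passes, which is exactly where the large test-time cost mentioned above comes from.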
Next, a brief walkthrough of the multi-scale testing source code [2]. The main entry point is im_detect_bbox_aug, which completes multi-scale detection in a few steps:
1. Predict on the original image: boxlists_i = im_detect_bbox
2. Predict on the horizontally flipped image: boxlists_hf = im_detect_bbox_hflip
3. Predict at each test scale: boxlists_scl = im_detect_bbox_scale
4. Predict on the flipped image at each test scale: boxlists_scl_hf = im_detect_bbox_scale (with hflip=True)
The implementation details are in the source below; the comments are clear and the code is easy to follow.
import torch
import torchvision.transforms as TT
from fcos_core.config import cfg
from fcos_core.data import transforms as T
from fcos_core.structures.image_list import to_image_list
from fcos_core.structures.bounding_box import BoxList
from fcos_core.modeling.rpn.fcos.inference import make_fcos_postprocessor

def im_detect_bbox_aug(model, images, device):
    # Collect detections computed under different transformations
    boxlists_ts = []
    for _ in range(len(images)):
        boxlists_ts.append([])

    def add_preds_t(boxlists_t):
        for i, boxlist_t in enumerate(boxlists_t):
            if len(boxlists_ts[i]) == 0:
                # The first one is identity transform, no need to resize the boxlist
                boxlists_ts[i].append(boxlist_t)
            else:
                # Resize the boxlist to match the size of the first one
                boxlists_ts[i].append(boxlist_t.resize(boxlists_ts[i][0].size))

    # Compute detections for the original image (identity transform)
    boxlists_i = im_detect_bbox(
        model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
    )
    add_preds_t(boxlists_i)

    # Perform detection on the horizontally flipped image
    if cfg.TEST.BBOX_AUG.H_FLIP:
        boxlists_hf = im_detect_bbox_hflip(
            model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
        )
        add_preds_t(boxlists_hf)

    # Compute detections at different scales
    for scale in cfg.TEST.BBOX_AUG.SCALES:
        max_size = cfg.TEST.BBOX_AUG.MAX_SIZE
        boxlists_scl = im_detect_bbox_scale(
            model, images, scale, max_size, device
        )
        add_preds_t(boxlists_scl)

        if cfg.TEST.BBOX_AUG.SCALE_H_FLIP:
            boxlists_scl_hf = im_detect_bbox_scale(
                model, images, scale, max_size, device, hflip=True
            )
            add_preds_t(boxlists_scl_hf)

    assert cfg.MODEL.FCOS_ON, "The multi-scale testing only supports FCOS detector"

    # Merge boxlists detected by different bbox aug params
    boxlists = []
    for i, boxlist_ts in enumerate(boxlists_ts):
        bbox = torch.cat([boxlist_t.bbox for boxlist_t in boxlist_ts])
        scores = torch.cat([boxlist_t.get_field('scores') for boxlist_t in boxlist_ts])
        labels = torch.cat([boxlist_t.get_field('labels') for boxlist_t in boxlist_ts])
        boxlist = BoxList(bbox, boxlist_ts[0].size, boxlist_ts[0].mode)
        boxlist.add_field('scores', scores)
        boxlist.add_field('labels', labels)
        boxlists.append(boxlist)

    # Apply NMS and limit the final detections
    post_processor = make_fcos_postprocessor(cfg)
    results = post_processor.select_over_all_levels(boxlists)
    return results

def im_detect_bbox(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the original image.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    return model(images.to(device))

def im_detect_bbox_hflip(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the horizontally flipped image.
    Function signature is the same as for im_detect_bbox.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.RandomHorizontalFlip(1.0),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    boxlists = model(images.to(device))

    # Invert the detections computed on the flipped image
    # (transpose(0) is BoxList's FLIP_LEFT_RIGHT, mapping boxes back to the unflipped frame)
    boxlists_inv = [boxlist.transpose(0) for boxlist in boxlists]
    return boxlists_inv

def im_detect_bbox_scale(model, images, target_scale, target_max_size, device, hflip=False):
    """
    Computes bbox detections at the given scale.
    Returns predictions in the scaled image space.
    """
    if hflip:
        boxlists_scl = im_detect_bbox_hflip(model, images, target_scale, target_max_size, device)
    else:
        boxlists_scl = im_detect_bbox(model, images, target_scale, target_max_size, device)
    return boxlists_scl
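To close, here is a minimal usage sketch of im_detect_bbox_aug. This is hypothetical driver code, not from the repo: it assumes model is a trained FCOS model and pil_images is a list of PIL images, since the transforms above operate on PIL input. Note that add_preds_t resizes every prediction to the size of the identity pass, so the merged boxes all share one coordinate frame.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).eval()
with torch.no_grad():
    # One BoxList per input image, already merged and NMS-ed
    boxlists = im_detect_bbox_aug(model, pil_images, device)

for boxlist in boxlists:
    print(boxlist.bbox.shape,                  # [N, 4] boxes
          boxlist.get_field("scores").shape,   # [N] confidence scores
          boxlist.get_field("labels").shape)   # [N] class labels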
References
[1] https://www.cnblogs.com/Terrypython/p/10642091.html
[2] https://github.com/tianzhi0549/FCOS/blob/master/fcos_core/engine/bbox_aug.py