I've recently been studying the FCOS object detection algorithm, published at ICCV 2019. FCOS performs quite well and its code is well engineered, so I plan to follow up on it.
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
Paper: https://arxiv.org/pdf/1904.01355.pdf
Code: https://github.com/tianzhi0549/FCOS
------------------------------------ Let's start ------------------------------------
This post focuses on the multi-scale testing used in FCOS. With multi-scale testing, the FCOS model based on ResNeXt-64x4d-101 with deformable convolutions achieves 49.0% AP on COCO test-dev. I verified this on COCO as well: compared with single-scale testing, multi-scale testing raises AP by roughly 2 points, so it is quite effective. Its biggest drawback, of course, is the much larger time cost, which remains a difficulty to be addressed.
Before going through the FCOS multi-scale testing source code, here is a brief recap of multi-scale training and testing in object detection; anyone working on detection knows how much multi-scale matters for final performance.
The input image size has a pronounced effect on detector performance; in fact, multi-scale processing is one of the most effective tricks for improving accuracy. The backbone typically produces feature maps tens of times smaller than the original image, so the features of small objects are hard for the detection network to capture. Training on larger images and at multiple scales makes the detector more robust to object size, and introducing multiple scales only at test time still yields the gains from larger and more varied input sizes. [1]
------------------------------------ Let's continue ------------------------------------
FCOS multi-scale testing runs detection separately on the horizontally flipped image and on copies resized to different scales, merges all predicted bboxes, and then applies NMS and other post-processing to obtain the final detections. The idea is straightforward, but the effect is significant. The FCOS source code exposes several settings related to multi-scale testing:
TEST:
  BBOX_AUG:
    ENABLED: False
    H_FLIP: True
    SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
    MAX_SIZE: 2000
    SCALE_H_FLIP: True
Here, ENABLED is the master flag for multi-scale testing: False means single-scale testing, which is faster; setting it to True enables multi-scale testing. The remaining entries parameterize the multi-scale testing, as the sketch after this list illustrates:
H_FLIP: flag for horizontal flipping;
SCALES: the scales the test image is resized to;
MAX_SIZE: the maximum size allowed when resizing the test image;
SCALE_H_FLIP: flag for additionally horizontally flipping at each resized scale.
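As a quick sanity check on what these settings cost, the snippet below counts the forward passes the configuration implies and mimics the resize rule the Resize transform is assumed to follow (shorter side scaled to the target, longer side capped at MAX_SIZE). This is a minimal sketch; the helper resized_shape is my own illustration, not part of the FCOS repo.

def resized_shape(w, h, scale, max_size):
    # Hypothetical helper: scale the shorter side to `scale`,
    # but cap the longer side at `max_size` (assumed T.Resize behavior).
    ratio = scale / min(w, h)
    if max(w, h) * ratio > max_size:
        ratio = max_size / max(w, h)
    return round(w * ratio), round(h * ratio)

scales = (400, 500, 600, 700, 900, 1000, 1100, 1200)
# original + hflip + one pass per scale + one flipped pass per scale
num_passes = 1 + 1 + len(scales) + len(scales)
print(num_passes)  # 18 forward passes per image
print(resized_shape(1920, 1080, 1200, 2000))  # longer side capped: (2000, 1125)

With both flip flags on, every image goes through 18 forward passes, which is exactly where the large test-time cost mentioned above comes from.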
Next, a brief walkthrough of the multi-scale testing source code [2]. The main entry point is im_detect_bbox_aug, which completes multi-scale detection in a few steps:
1. Predict on the original image: boxlists_i = im_detect_bbox
2. Predict on the horizontally flipped image: boxlists_hf = im_detect_bbox_hflip
3. Predict at each test scale: boxlists_scl = im_detect_bbox_scale
4. Predict on the flipped image at each test scale: boxlists_scl_hf = im_detect_bbox_scale (with hflip=True)
The implementation details are in the source below; the comments are clear and the code is easy to follow.
import torch
import torchvision.transforms as TT
from fcos_core.config import cfg
from fcos_core.data import transforms as T
from fcos_core.structures.image_list import to_image_list
from fcos_core.structures.bounding_box import BoxList
from fcos_core.modeling.rpn.fcos.inference import make_fcos_postprocessor

def im_detect_bbox_aug(model, images, device):
    # Collect detections computed under different transformations
    boxlists_ts = []
    for _ in range(len(images)):
        boxlists_ts.append([])

    def add_preds_t(boxlists_t):
        for i, boxlist_t in enumerate(boxlists_t):
            if len(boxlists_ts[i]) == 0:
                # The first one is identity transform, no need to resize the boxlist
                boxlists_ts[i].append(boxlist_t)
            else:
                # Resize the boxlist to match the size of the first one
                boxlists_ts[i].append(boxlist_t.resize(boxlists_ts[i][0].size))

    # Compute detections for the original image (identity transform)
    boxlists_i = im_detect_bbox(
        model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
    )
    add_preds_t(boxlists_i)

    # Perform detection on the horizontally flipped image
    if cfg.TEST.BBOX_AUG.H_FLIP:
        boxlists_hf = im_detect_bbox_hflip(
            model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
        )
        add_preds_t(boxlists_hf)

    # Compute detections at different scales
    for scale in cfg.TEST.BBOX_AUG.SCALES:
        max_size = cfg.TEST.BBOX_AUG.MAX_SIZE
        boxlists_scl = im_detect_bbox_scale(
            model, images, scale, max_size, device
        )
        add_preds_t(boxlists_scl)

        if cfg.TEST.BBOX_AUG.SCALE_H_FLIP:
            boxlists_scl_hf = im_detect_bbox_scale(
                model, images, scale, max_size, device, hflip=True
            )
            add_preds_t(boxlists_scl_hf)

    assert cfg.MODEL.FCOS_ON, "The multi-scale testing only supports FCOS detector"

    # Merge boxlists detected by different bbox aug params
    boxlists = []
    for i, boxlist_ts in enumerate(boxlists_ts):
        bbox = torch.cat([boxlist_t.bbox for boxlist_t in boxlist_ts])
        scores = torch.cat([boxlist_t.get_field('scores') for boxlist_t in boxlist_ts])
        labels = torch.cat([boxlist_t.get_field('labels') for boxlist_t in boxlist_ts])
        boxlist = BoxList(bbox, boxlist_ts[0].size, boxlist_ts[0].mode)
        boxlist.add_field('scores', scores)
        boxlist.add_field('labels', labels)
        boxlists.append(boxlist)

    # Apply NMS and limit the final detections
    post_processor = make_fcos_postprocessor(cfg)
    results = post_processor.select_over_all_levels(boxlists)
    return results

def im_detect_bbox(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the original image.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    return model(images.to(device))

def im_detect_bbox_hflip(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the horizontally flipped image.
    Function signature is the same as for im_detect_bbox.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.RandomHorizontalFlip(1.0),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    boxlists = model(images.to(device))

    # Invert the detections computed on the flipped image
    # (transpose(0) is BoxList's FLIP_LEFT_RIGHT, mapping boxes back to the unflipped frame)
    boxlists_inv = [boxlist.transpose(0) for boxlist in boxlists]
    return boxlists_inv

def im_detect_bbox_scale(model, images, target_scale, target_max_size, device, hflip=False):
    """
    Computes bbox detections at the given scale.
    Returns predictions in the scaled image space.
    """
    if hflip:
        boxlists_scl = im_detect_bbox_hflip(model, images, target_scale, target_max_size, device)
    else:
        boxlists_scl = im_detect_bbox(model, images, target_scale, target_max_size, device)
    return boxlists_scl
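To close, here is a minimal usage sketch of im_detect_bbox_aug. This is hypothetical driver code, not from the repo: it assumes model is a trained FCOS model and pil_images is a list of PIL images, since the transforms above operate on PIL input. Note that add_preds_t resizes every prediction to the size of the identity pass, so the merged boxes all share one coordinate frame.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).eval()
with torch.no_grad():
    # One BoxList per input image, already merged and NMS-ed
    boxlists = im_detect_bbox_aug(model, pil_images, device)

for boxlist in boxlists:
    print(boxlist.bbox.shape,                  # [N, 4] boxes
          boxlist.get_field("scores").shape,   # [N] confidence scores
          boxlist.get_field("labels").shape)   # [N] class labels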
References
[1] https://www.cnblogs.com/Terrypython/p/10642091.html
[2] https://github.com/tianzhi0549/FCOS/blob/master/fcos_core/engine/bbox_aug.py