Study Notes | PyTorch Tutorial 34 (A Glance at Image Object Detection, Part 2)


These study notes are mainly excerpted from the "深度之眼" course and summarized here for easy reference.
The PyTorch version used is 1.2.

  • What is image object detection?
  • How does a model perform object detection?
  • A brief introduction to deep-learning object detection models
  • Training Faster RCNN in PyTorch

4. Training Faster RCNN in PyTorch

  • 1. **torchvision.models.detection.fasterrcnn_resnet50_fpn()** returns a FasterRCNN instance
  • 2.class FasterRCNN(GeneralizedRCNN)
  • 3.class GeneralizedRCNN(nn.Module)
    forward():
  1. features = self.backbone(images.tensors)
  2. proposals, proposal_losses = self.rpn(images, features, targets)
  3. detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
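
The class hierarchy in items 1-3 can be verified directly; a minimal sketch (random weights are used so nothing needs to be downloaded):

import torchvision
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.generalized_rcnn import GeneralizedRCNN

# fasterrcnn_resnet50_fpn() builds a FasterRCNN, which inherits from GeneralizedRCNN
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
print(isinstance(model, FasterRCNN))        # True
print(isinstance(model, GeneralizedRCNN))   # True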

Next, in the code from Study Notes | PyTorch Tutorial 33 (A Glance at Image Object Detection, Part 1), set a breakpoint at output_list = model(input_list) and step into it.
(screenshot)
Stepping in from this breakpoint brings us to the base class behind Faster RCNN.

class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN.

    Arguments:
        backbone (nn.Module):
        rpn (nn.Module):
        heads (nn.Module): takes the features + the proposals from the RPN and computes
            detections / masks from it.
        transform (nn.Module): performs the data transformation from the inputs to feed into
            the model
    """

    def __init__(self, backbone, rpn, roi_heads, transform):
        super(GeneralizedRCNN, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads

    def forward(self, images, targets=None):
        """
        Arguments:
            images (list[Tensor]): images to be processed
            targets (list[Dict[Tensor]]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        original_image_sizes = [img.shape[-2:] for img in images]
        images, targets = self.transform(images, targets)
        features = self.backbone(images.tensors)
        if isinstance(features, torch.Tensor):
            features = OrderedDict([(0, features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if self.training:
            return losses

        return detections
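
The two return modes described in the docstring (losses during training, detections during testing) can be seen with a small sketch; dummy image, dummy box, and random weights are enough to inspect the structure:

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)

# one dummy image and one ground-truth box/label
images = [torch.randn(3, 300, 400)]
targets = [{"boxes": torch.tensor([[10., 20., 200., 250.]]),
            "labels": torch.tensor([1])}]

model.train()
losses = model(images, targets)    # dict of losses during training
print(losses.keys())               # loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg

model.eval()
with torch.no_grad():
    detections = model(images)     # list with one dict per image during testing
print(detections[0].keys())        # boxes, labels, scores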

Set a breakpoint at features = self.backbone(images.tensors) to see how the backbone extracts features, and step into it. Similarly, set a breakpoint in __call__(self, *input, **kwargs) at result = self.forward(*input, **kwargs) and step in again.
(screenshot)
We find that the module has already been wrapped; it was defined at initialization.
(screenshot)
So go to where it is defined, model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True), then use Go to Definition.
(screenshot)
Continue with Go to Definition.
(screenshot)
Here backbone_name is 'resnet50', i.e. feature extraction is done with resnet50 (wrapped with an FPN).
Now inspect the features directly.
(screenshot)
There are 5 key-value pairs, corresponding to the five feature maps output by the ResNet-FPN backbone.
(screenshot)
Recall the feature-map data flow mentioned in Study Notes | PyTorch Tutorial 33 (Part 1): [256, h_f, w_f].
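
The same five-level output can be reproduced outside the debugger; a minimal sketch using the same backbone builder that fasterrcnn_resnet50_fpn uses internally (random weights, and the 800x800 input size is just an example):

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone('resnet50', pretrained=False)
x = torch.randn(1, 3, 800, 800)
features = backbone(x)           # OrderedDict with 5 levels, each with 256 channels
for name, feat in features.items():
    print(name, feat.shape)      # e.g. [1, 256, 200, 200], [1, 256, 100, 100], ..., plus the extra pooled level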
Next, look at the RPN. Step into proposals, proposal_losses = self.rpn(images, features, targets).
(screenshot)
The line objectness, pred_bbox_deltas = self.head(features) implements the core of the RPN: it outputs the classification scores objectness for foreground/background classification and stores the four offsets of each candidate box in pred_bbox_deltas.
The line anchors = self.anchor_generator(images, features) generates the anchor boxes.
The line boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level) selects boxes using NMS.
A. First, step into objectness, pred_bbox_deltas = self.head(features).

class RPNHead(nn.Module):
    """
    Adds a simple RPN Head with classification and regression heads

    Arguments:
        in_channels (int): number of channels of the input feature
        num_anchors (int): number of anchors to be predicted
    """

    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        for l in self.children():
            torch.nn.init.normal_(l.weight, std=0.01)
            torch.nn.init.constant_(l.bias, 0)

    def forward(self, x):
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg

Here x is the list of 5 feature maps generated in the previous step.
(screenshot)
It is enough to look at how a single feature map is processed.

  • 1. F.relu(self.conv(feature)): apply one more convolution followed by a ReLU activation, where self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1).
    (screenshot)

  • 2. Classification is done on the resulting feature map t: logits.append(self.cls_logits(t)). Inspecting self.cls_logits shows self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1). From its shape we can see num_anchors=3, i.e. every location of the feature map predicts 3 anchors.
    (screenshot)

  • 3. Bounding-box regression is done on the same feature map t: bbox_reg.append(self.bbox_pred(t)). Inspecting self.bbox_pred shows self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1), so the regressor predicts 4 values per anchor. This is why the output has 12 = 3 * 4 channels (a quick shape check is sketched right after this list).
    (screenshot)

  • 4. Once all 5 feature maps have been processed, the lists objectness and pred_bbox_deltas hold the results for the five feature maps.
    (screenshot)

  • 5. Recall the data flow mentioned in Study Notes | PyTorch Tutorial 33 (Part 1): Softmax: [num_anchors, h_f, w_f] and Regressors: [num_anchors * 4, h_f, w_f].
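
The shapes discussed in items 2-4 can be checked with a small sketch; the dummy feature map below is hypothetical, but the 256 channels match the FPN output and the 3 anchors per location match what the debugger showed:

import torch
from torchvision.models.detection.rpn import RPNHead

head = RPNHead(in_channels=256, num_anchors=3)
feat = torch.randn(1, 256, 25, 34)    # one hypothetical feature map [N, 256, h_f, w_f]
logits, bbox_reg = head([feat])       # RPNHead.forward expects a list of feature maps
print(logits[0].shape)                # torch.Size([1, 3, 25, 34])  -> 1 score per anchor per location
print(bbox_reg[0].shape)              # torch.Size([1, 12, 25, 34]) -> 4 offsets per anchor per location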

B. Next, a set of anchors is generated from these feature maps: anchors = self.anchor_generator(images, features). Inspection shows that more than 200,000 anchors are generated.
(screenshot)
Here there are 225,603 anchors, and each of them uses one of the offsets computed above, so the number of anchors matches the total number of predicted offsets (one set of offsets per anchor):
(screenshot)
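
The anchor count is simply 3 anchors per location summed over all five feature maps; a small counting sketch (the feature-map sizes below are illustrative, not the exact ones from this debugging session):

num_anchors_per_loc = 3
feat_sizes = [(200, 272), (100, 136), (50, 68), (25, 34), (13, 17)]  # hypothetical FPN level sizes

anchors_per_level = [num_anchors_per_loc * h * w for h, w in feat_sizes]
print(anchors_per_level, sum(anchors_per_level))
# After flattening and concatenating all levels, pred_bbox_deltas has shape [total_anchors, 4],
# so every anchor has exactly one 4-value offset.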
C. The next step filters the generated anchors: boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level). Step into it.
(screenshot)
First the top_n anchors are selected. As shown in the screenshot below, the leading 1 is because this batch contains only 1 image, and 4,693 anchors are kept out of the 200,000+ anchors.
(screenshot)
Next, NMS is applied for further filtering: keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh), and keep = keep[:self.post_nms_top_n] keeps only the top proposals after NMS, finally giving final_boxes and final_scores. Looking at the output shape, why 1000? That is a hyperparameter, set in faster_rcnn.py.
(screenshots)
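
In the torchvision version used here, these RPN settings are constructor arguments of FasterRCNN and can be overridden through the keyword arguments of fasterrcnn_resnet50_fpn; a sketch with illustrative values:

import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=True,
    rpn_pre_nms_top_n_test=1000,   # anchors kept before NMS at test time
    rpn_post_nms_top_n_test=500,   # proposals kept after NMS at test time (default is 1000)
    rpn_nms_thresh=0.7,            # IoU threshold used by the RPN NMS
)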
After this computation returns, the ROI heads come next. Step into detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets).
(screenshot)
Look at the relevant code:
(screenshot)

  • 1. box_features = self.box_roi_pool(features, proposals, image_shapes). The proposals are used to "crop" regions out of the feature maps, i.e. to extract sub-feature regions, which are then pooled to a single fixed size (so that a series of feature maps at different scales becomes feature maps of the same scale).
    Now step into box_features = self.box_roi_pool(features, proposals, image_shapes) to have a look.
    (screenshot)
    Focus on the forward function; the core code is:
    (screenshot)
    After it runs, all the pooled features share one shape:
    (screenshot)

  • 2. box_features = self.box_head(box_features). This is simply two FC layers. Step into it.
    (screenshot)
    The output shape shows that the 256 × 7 × 7 = 12544 pooled features are mapped to 1024 dimensions.
    (screenshot)

  • 3. class_logits, box_regression = self.box_predictor(box_features). This performs category prediction and bounding-box regression. Step into it.
    (screenshot)
    Both self.cls_score and self.bbox_pred are fully connected layers used for prediction; their shapes are shown below. Since the COCO dataset is used, there are 91 classes (90 categories + 1 background class). A standalone shape sketch covering steps 2 and 3 follows after this list.
    (screenshot)

  • 4. After the computation finishes, inspect the output of detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets).
    (screenshot)
    Finally, detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes) maps the predicted coordinates back onto the original image.
    Summary:
    (screenshots)
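
As promised above, steps 2 and 3 can be reproduced in isolation to see the shape flow; a minimal sketch starting from pooled ROI features of shape [1000, 256, 7, 7] as produced by box_roi_pool:

import torch
from torchvision.models.detection.faster_rcnn import TwoMLPHead, FastRCNNPredictor

pooled = torch.randn(1000, 256, 7, 7)                       # 1000 proposals after the RPN

box_head = TwoMLPHead(in_channels=256 * 7 * 7, representation_size=1024)
box_features = box_head(pooled)                             # flatten + fc6 + fc7

box_predictor = FastRCNNPredictor(in_channels=1024, num_classes=91)
class_logits, box_regression = box_predictor(box_features)

print(box_features.shape)      # torch.Size([1000, 1024])
print(class_logits.shape)      # torch.Size([1000, 91])
print(box_regression.shape)    # torch.Size([1000, 364]), i.e. 91 * 4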
Below, the model is trained and tested on the dataset mentioned above (PennFudanPed):
Test code:

import os
import time
import torch.nn as nn
import torch
import random
import numpy as np
import torchvision.transforms as transforms
import torchvision
from PIL import Image
import torch.nn.functional as F
from tools.my_dataset import PennFudanDataset
from tools.common_tools import set_seed
from torch.utils.data import DataLoader
from matplotlib import pyplot as plt
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms import functional as F

set_seed(1)  # set the random seed

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# classes_coco
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


def vis_bbox(img, output, classes, max_vis=40, prob_thres=0.4):
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(img, aspect='equal')
    
    out_boxes = output["boxes"].cpu()
    out_scores = output["scores"].cpu()
    out_labels = output["labels"].cpu()
    
    num_boxes = out_boxes.shape[0]
    for idx in range(0, min(num_boxes, max_vis)):

        score = out_scores[idx].numpy()
        bbox = out_boxes[idx].numpy()
        class_name = classes[out_labels[idx]]

        if score < prob_thres:
            continue

        ax.add_patch(plt.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1], fill=False,
                                   edgecolor='red', linewidth=3.5))
        ax.text(bbox[0], bbox[1] - 2, '{:s} {:.3f}'.format(class_name, score), bbox=dict(facecolor='blue', alpha=0.5),
                fontsize=14, color='white')
    plt.show()
    plt.close()


class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class RandomHorizontalFlip(object):
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
        return image, target


class ToTensor(object):
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target


if __name__ == "__main__":

    # config
    LR = 0.001
    num_classes = 2
    batch_size = 1
    start_epoch, max_epoch = 0, 30
    train_dir = os.path.join(BASE_DIR, "..", "..", "data", "PennFudanPed")
    train_transform = Compose([ToTensor(), RandomHorizontalFlip(0.5)])

    # step 1: data
    train_set = PennFudanDataset(data_dir=train_dir, transforms=train_transform)

    # collate function that groups the batch into tuples of images and targets
    def collate_fn(batch):
        return tuple(zip(*batch))

    train_loader = DataLoader(train_set, batch_size=batch_size, collate_fn=collate_fn)

    # step 2: model
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) # replace the pre-trained head with a new one

    model.to(device)

    # step 3: loss
    # in lib/python3.6/site-packages/torchvision/models/detection/roi_heads.py
    # def fastrcnn_loss(class_logits, box_regression, labels, regression_targets)

    # step 4: optimizer scheduler
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=LR, momentum=0.9, weight_decay=0.0005)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    # step 5: Iteration

    for epoch in range(start_epoch, max_epoch):

        model.train()
        for iter, (images, targets) in enumerate(train_loader):

            images = list(image.to(device) for image in images)
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            # if torch.cuda.is_available():
            #     images, targets = images.to(device), targets.to(device)

            loss_dict = model(images, targets)  # images is a list of tensors; targets is a list of dicts with keys "boxes" and "labels"

            losses = sum(loss for loss in loss_dict.values())

            print("Training:Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} ".format(
                epoch, max_epoch, iter + 1, len(train_loader), losses.item()))

            optimizer.zero_grad()
            losses.backward()
            optimizer.step()

        lr_scheduler.step()

    # test
    model.eval()

    # config
    vis_num = 5
    vis_dir = os.path.join(BASE_DIR, "..", "..", "data", "PennFudanPed", "PNGImages")
    img_names = list(filter(lambda x: x.endswith(".png"), os.listdir(vis_dir)))
    random.shuffle(img_names)
    preprocess = transforms.Compose([transforms.ToTensor(), ])

    for i in range(0, vis_num):

        path_img = os.path.join(vis_dir, img_names[i])
        # preprocess
        input_image = Image.open(path_img).convert("RGB")
        img_chw = preprocess(input_image)

        # to device
        if torch.cuda.is_available():
            img_chw = img_chw.to('cuda')
            model.to('cuda')

        # forward
        input_list = [img_chw]
        with torch.no_grad():
            tic = time.time()
            print("input img tensor shape:{}".format(input_list[0].shape))
            output_list = model(input_list)
            output_dict = output_list[0]
            print("pass: {:.3f}s".format(time.time() - tic))

        # visualization
        vis_bbox(input_image, output_dict, COCO_INSTANCE_CATEGORY_NAMES, max_vis=20, prob_thres=0.5)  # for 2 epoch for nms

Output:
(screenshots)
The model is fine-tuned so that it can be trained for pedestrian detection, which only needs to predict two classes: pedestrian (foreground) and background:

    # step 2: model
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) # replace the pre-trained head with a new one

First, look at how the model is composed:

ipdb> model
FasterRCNN(
  (transform): GeneralizedRCNNTransform()
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
		......
      )
      (layer2): Sequential(
		......
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=91, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
    )
  )
)

So we can see:
(screenshot)
After fine-tuning:
(screenshot)
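
The effect of replacing the box predictor can also be checked directly; a minimal sketch (in_features is 1024, as read from cls_score above):

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

new_predictor = FastRCNNPredictor(in_channels=1024, num_classes=2)
print(new_predictor)
# FastRCNNPredictor(
#   (cls_score): Linear(in_features=1024, out_features=2, bias=True)
#   (bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
# )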

Finally:
A recommended GitHub resource for object detection: https://github.com/amusi/awesome-object-detection
