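The open-source Mask R-CNN project is designed to be easy to extend, so training it on your own dataset only takes a few simple changes.

1. Annotating the data

I simply picked two classes of images from the ImageNet 2012 dataset, cats and dogs, and used fifty images of each class as the training set. I took another twenty images of each class as the validation set and another ten of each class as the test set.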
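
The split described above can be sketched as follows. This is only an illustration: the file names are hypothetical stand-ins for one class, and the helper `split_files` is not part of the Mask R-CNN repository.

```python
import random


def split_files(files, n_train, n_val, n_test, seed=0):
    """Shuffle a list of file names and cut it into train/val/test parts."""
    if n_train + n_val + n_test > len(files):
        raise ValueError("not enough files for the requested split")
    shuffled = sorted(files)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


# Hypothetical file names standing in for one class (for example, cats).
cat_files = ["cat_%03d.JPEG" % i for i in range(80)]
train, val, test = split_files(cat_files, n_train=50, n_val=20, n_test=10)
print(len(train), len(val), len(test))
```

Running the same helper once per class keeps the three subsets disjoint within each class.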

I annotated the images with the VGG Image Annotator (VIA).

For usage, see the tutorial "VGG Image Annotator (VIA): a deep learning image annotation tool".

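2. Modifying the source code

The Mask R-CNN repository already contains several examples to refer to. I created a new folder catvsdog under the samples directory, copied samples/balloon/balloon.py to samples/catvsdog/, and renamed it catvsdog.py.

2.1 Modifying the config

I originally intended to have 2 classes, but some images that are neither cat nor dog got mixed into my training set, so I defined an extra not_defined class when annotating. NUM_CLASSES is therefore 1 + 3, where the 1 stands for the background, which counts as a class. I changed IMAGES_PER_GPU to 1 and left the other parameters unchanged for now.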

class CatVSDogConfig(Config):
    """Configuration for training on the cat-vs-dog dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "catvsdog"

    # Train with one image per GPU.
    # Adjust to match the memory of your GPU.
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # Background + cat + dog + not_defined

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9

2.2 Modifying the Dataset class

2.2.1 Modifying the load_xxx function

First add the classes, then parse the annotation information.

    def load_cat_dog(self, dataset_dir, subset):
        """Load a subset of the CatVSDog dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have three classes to add.
        self.add_class("catvsdog", 1, "cat")
        self.add_class("catvsdog", 2, "dog")
        self.add_class("catvsdog", 3, "not_defined")

        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # Load annotations
        # VGG Image Annotator saves each image in the form
        # (rectangular regions, as read by the code below):
        # { 'filename': '28503151_5b5b7ec140_b.jpg',
        #   'regions': [
        #       {
        #           'region_attributes': {'name': 'cat'},
        #           'shape_attributes': {
        #               'name': 'rect',
        #               'x': ..., 'y': ...,
        #               'width': ..., 'height': ...}},
        #       ... more regions ...
        #   ],
        #   'size': 100202
        # }
        # We mostly care about the rectangle and the class name of each region
        with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
            annotations = json.load(f)
        annotations = list(annotations.values())  # don't need the dict keys

        # The VIA tool saves images in the JSON even if they don't have any
        # annotations. Skip unannotated images.
        annotations = [a for a in annotations if a['regions']]
        
        # Add images
        for a in annotations:
            # Get the x, y coordinates and size of the rects that make up
            # the outline of each object instance. They are stored in the
            # shape_attributes (see json format above)
            rects = [r['shape_attributes'] for r in a['regions']]
            names = [r['region_attributes']['name'] for r in a['regions']]
            name_dict = {"cat": 1, "dog": 2, "not_defined": 3}
            name_id = [name_dict[n] for n in names]

            # load_mask() needs the image size to convert rects to masks.
            # Unfortunately, VIA doesn't include it in JSON, so we must read
            # the image. This is only manageable since the dataset is tiny.
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "catvsdog",
                image_id=a['filename'],  # use file name as a unique image id
                path=image_path,
                class_id=name_id,
                width=width, height=height,
                polygons=rects)
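
The parsing step above can be tried in isolation, without the Mask R-CNN library. The sketch below is an assumption-laden illustration: the helper `parse_regions` and the sample entry are hypothetical, and it assumes the region attribute is called `name` (as in the code above) and that rectangles carry `x`, `y`, `width` and `height`. It additionally tolerates `regions` stored as a dict keyed by index and skips regions with an unknown label instead of raising a KeyError.

```python
NAME_TO_ID = {"cat": 1, "dog": 2, "not_defined": 3}


def parse_regions(annotation, name_to_id=NAME_TO_ID):
    """Return (rects, class_ids) for one VIA image entry.

    Accepts 'regions' either as a list or as a dict keyed by region index,
    and skips regions that are not rectangles or carry an unknown label.
    """
    regions = annotation["regions"]
    if isinstance(regions, dict):
        regions = list(regions.values())
    rects, class_ids = [], []
    for region in regions:
        shape = region["shape_attributes"]
        label = region["region_attributes"].get("name")
        if shape.get("name") != "rect" or label not in name_to_id:
            continue
        rects.append(shape)
        class_ids.append(name_to_id[label])
    return rects, class_ids


# Hypothetical annotation entry, made up for illustration only.
sample = {
    "filename": "example.JPEG",
    "regions": [
        {"shape_attributes": {"name": "rect", "x": 10, "y": 20,
                              "width": 30, "height": 40},
         "region_attributes": {"name": "cat"}},
        {"shape_attributes": {"name": "rect", "x": 5, "y": 5,
                              "width": 8, "height": 8},
         "region_attributes": {"name": "bird"}},  # unknown label, skipped
    ],
}
rects, class_ids = parse_regions(sample)
print(class_ids)
```

Skipping unknown labels silently is a design choice of this sketch; raising an error may be preferable when you want to catch annotation typos early.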

2.2.2 Modifying the load_mask function

For simplicity I used only rectangular boxes when annotating, so this function uses skimage.draw.rectangle, unlike the balloon sample, which uses skimage.draw.polygon.

    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a catvsdog dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "catvsdog":
            return super().load_mask(image_id)
        
        name_id = image_info["class_id"]
        print(name_id)  # debug output: class IDs of the instances in this image
        # Convert rects to a bitmap mask of shape
        # [height, width, instance_count]
        info = self.image_info[image_id]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        class_ids = np.array(name_id, dtype=np.int32)

        for i, p in enumerate(info["polygons"]):
            # Get indexes of pixels inside the rect and set them to 1.
            # shape= clips the rect to the image bounds.
            rr, cc = skimage.draw.rectangle((p['y'], p['x']),
                                            extent=(p['height'], p['width']),
                                            shape=mask.shape[:2])
            mask[rr, cc, i] = 1

        # Return mask, and array of class IDs of each instance.
        return (mask.astype(bool), class_ids)
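
To see what the mask looks like, the rectangle-to-mask conversion can also be sketched with plain NumPy slicing. Note that this swaps in array slicing for skimage.draw.rectangle, so the exact pixel extent may differ slightly from the function above depending on the scikit-image version; `rects_to_mask` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def rects_to_mask(rects, height, width):
    """Build a [height, width, instance_count] bool mask from VIA-style rects."""
    mask = np.zeros((height, width, len(rects)), dtype=bool)
    for i, r in enumerate(rects):
        # Clip each rectangle to the image bounds before filling it.
        y0 = max(r["y"], 0)
        x0 = max(r["x"], 0)
        y1 = min(r["y"] + r["height"], height)
        x1 = min(r["x"] + r["width"], width)
        mask[y0:y1, x0:x1, i] = True
    return mask


# Two made-up rectangles on a 10x10 image; the second one overflows the border.
demo_rects = [
    {"x": 1, "y": 2, "width": 3, "height": 2},
    {"x": 8, "y": 8, "width": 5, "height": 5},
]
demo_mask = rects_to_mask(demo_rects, height=10, width=10)
print(demo_mask.shape, demo_mask[..., 0].sum(), demo_mask[..., 1].sum())
```

Each instance gets its own channel, which is the layout that load_mask is expected to return.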

2.2.3 Modifying the image_reference function

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "catvsdog":
            return info["path"]
        else:
            return super().image_reference(image_id)

2.2.4 Modifying the train function

def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = CatVSDogDataset()
    dataset_train.load_cat_dog(args.dataset, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = CatVSDogDataset()
    dataset_val.load_cat_dog(args.dataset, "val")
    dataset_val.prepare()

    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')

3. Training

Download the COCO pretrained weights mask_rcnn_coco.h5 in advance.

I ran the following from the root directory of the Mask R-CNN repository:

python3 catvsdog.py train --dataset=/path/to/myCatVSDog --weights=coco

Note that ROOT_DIR must be adjusted to match the directory from which you run the command.
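
One way to avoid editing ROOT_DIR for every working directory is to derive it from the script's own location instead of the current directory. The sketch below is a suggestion rather than what the repository does; `repo_root` is a hypothetical helper and the example path is made up.

```python
import os


def repo_root(script_path, levels_up=2):
    """Return the directory `levels_up` levels above the script's folder."""
    path = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


# Made-up location of the script inside a checkout of the repository.
example = os.path.join(os.sep, "work", "Mask_RCNN",
                       "samples", "catvsdog", "catvsdog.py")
print(repo_root(example))
```

Inside catvsdog.py this would be called as `ROOT_DIR = repo_root(__file__)`, assuming the script stays in samples/catvsdog/.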

When training finishes, a series of model checkpoint files has been generated.
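
To pick the most recent checkpoint for testing, a small search over the log directory is enough. This sketch assumes the checkpoints follow the pattern mask_rcnn_catvsdog_XXXX.h5, as the file name used in the test script suggests; `latest_checkpoint` is a hypothetical helper and the demo only creates empty placeholder files.

```python
import glob
import os
import re
import tempfile


def latest_checkpoint(log_dir, name="catvsdog"):
    """Return the checkpoint with the highest epoch number, or None."""
    pattern = os.path.join(log_dir, "**", "mask_rcnn_%s_*.h5" % name)
    best_epoch, best_path = -1, None
    for path in glob.glob(pattern, recursive=True):
        match = re.search(r"_(\d+)\.h5$", os.path.basename(path))
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as log_dir:
    run_dir = os.path.join(log_dir, "catvsdog_run")
    os.makedirs(run_dir)
    for epoch in (1, 15, 29):
        file_name = "mask_rcnn_catvsdog_%04d.h5" % epoch
        open(os.path.join(run_dir, file_name), "w").close()
    newest = os.path.basename(latest_checkpoint(log_dir))
print(newest)
```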

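4. Testing

I am not used to working with .ipynb files, so I converted the demo to a .py file. Open samples/demo.ipynb in Jupyter Notebook.

From the menu, choose File --> Download as --> Python (.py) and save it as a Python file.

Modify the code: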

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import config
sys.path.append(os.path.join(ROOT_DIR, "samples/catvsdog/"))  # To find local version
import catvsdog

#get_ipython().run_line_magic('matplotlib', 'inline')

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "catvsdog_logs")

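# Local path to trained weights file. The variable keeps the name used in the
# original demo, but it points to the weights trained on the catvsdog dataset.
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_catvsdog_0029.h5")
# Do not download the COCO weights here: they would be saved under the
# catvsdog file name. Fail early if the trained weights are missing.
if not os.path.exists(COCO_MODEL_PATH):
    raise FileNotFoundError(COCO_MODEL_PATH)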


class InferenceConfig(catvsdog.CatVSDogConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()


# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load the weights trained on the catvsdog dataset
model.load_weights(COCO_MODEL_PATH, by_name=True)


class_names = ['BG', 'cat', 'dog', 'not_defined']

image = skimage.io.imread('ILSVRC2012_val_00037858.JPEG')

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], 
                            class_names, r['scores'])

Run:

python3 demo.py
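
Besides the visualization, the detection result can be summarized as text. The sketch below assumes the result dictionary exposes `class_ids` and `scores` arrays, as used in the script above; the values in `fake_result` are made up and `summarize` is a hypothetical helper.

```python
from collections import Counter

import numpy as np

class_names = ['BG', 'cat', 'dog', 'not_defined']


def summarize(class_ids, scores, names, min_score=0.9):
    """Count detections per class name, keeping only confident ones."""
    kept = [names[int(c)] for c, s in zip(class_ids, scores) if s >= min_score]
    return Counter(kept)


# Made-up stand-in for one entry of the list returned by model.detect().
fake_result = {
    "class_ids": np.array([1, 2, 2], dtype=np.int32),
    "scores": np.array([0.99, 0.95, 0.5], dtype=np.float32),
}
counts = summarize(fake_result["class_ids"], fake_result["scores"], class_names)
print(dict(counts))
```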

