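The open-source Mask R-CNN project is designed to be easy to extend: a few simple changes are enough to train it on your own dataset.
1. Annotating the data
I simply picked two classes of images from the ImageNet 2012 dataset, cats and dogs, and used 50 images per class as the training set, a further 20 per class as the validation set, and a further 10 per class as the test set.
The images were annotated with VGG Image Annotator (VIA).
For usage instructions, see the tutorial "VGG Image Annotator (VIA), a deep learning image annotation tool".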
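2. Modifying the source code
The Mask R-CNN repository already contains several samples to use as a reference. I created a new folder, catvsdog, under the samples directory, copied samples/balloon/balloon.py to samples/catvsdog/, and renamed it catvsdog.py.
2.1 Modifying the config
I originally planned for two classes, but some images in my training set were neither cats nor dogs, so I defined an extra not_defined class while annotating. NUM_CLASSES is therefore 1 + 3, where the 1 is the background, which counts as a class of its own. I also changed IMAGES_PER_GPU to 1 and left the other parameters unchanged for now.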
class CatVSDogConfig(Config):
    """Configuration for training on the cat vs. dog dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "catvsdog"
    # One image per GPU (balloon.py uses 2 on a 12GB GPU).
    # Raise this if your GPU has enough memory.
    IMAGES_PER_GPU = 1
    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # Background + cat + dog + not_defined
    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100
    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9
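The class count has to agree with the classes registered in the Dataset class, so one option is to derive both from a single list. The following is a minimal sketch, not part of the original catvsdog.py; CLASS_NAMES, NAME_TO_ID and DISPLAY_NAMES are hypothetical module-level names introduced here for illustration.

```python
# Hypothetical helper constants (not in the original script): keep the
# foreground class names in one place and derive everything else from it.
CLASS_NAMES = ["cat", "dog", "not_defined"]

# Background always takes ID 0, so foreground IDs start at 1.
NUM_CLASSES = 1 + len(CLASS_NAMES)
NAME_TO_ID = {name: i for i, name in enumerate(CLASS_NAMES, start=1)}

# Class names indexed by ID, with the background entry first.
DISPLAY_NAMES = ["BG"] + CLASS_NAMES

print(NUM_CLASSES)
print(NAME_TO_ID)
```

With this in place, the config could use NUM_CLASSES = 1 + len(CLASS_NAMES) instead of the hard-coded 1 + 3.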
2.2 Modifying the Dataset class
2.2.1 Modifying the load_xxx function
First register the classes, then parse the annotation data.
class CatVSDogDataset(utils.Dataset):
    def load_cat_dog(self, dataset_dir, subset):
        """Load a subset of the CatVSDog dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have three classes to add.
        self.add_class("catvsdog", 1, "cat")
        self.add_class("catvsdog", 2, "dog")
        self.add_class("catvsdog", 3, "not_defined")
        # Maps the "name" region attribute set in VIA to a class ID.
        name_dict = {"cat": 1, "dog": 2, "not_defined": 3}
        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)
        # Load annotations
        # VGG Image Annotator saves each image in the form:
        # { 'filename': '28503151_5b5b7ec140_b.jpg',
        #   'regions': {
        #       '0': {
        #           'region_attributes': {'name': 'cat'},
        #           'shape_attributes': {
        #               'name': 'rect',
        #               'x': ..., 'y': ...,
        #               'width': ..., 'height': ...}},
        #       ... more regions ...
        #   },
        #   'size': 100202
        # }
        # We mostly care about the rectangle and class name of each region
        with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
            annotations = json.load(f)
        annotations = list(annotations.values())  # don't need the dict keys
        # The VIA tool saves images in the JSON even if they don't have any
        # annotations. Skip unannotated images.
        annotations = [a for a in annotations if a['regions']]
        # Add images
        for a in annotations:
            # VIA 1.x stores regions as a dict, VIA 2.x as a list.
            regions = a['regions']
            if isinstance(regions, dict):
                regions = list(regions.values())
            # Get the rectangle that outlines each object instance and its
            # class name. These are stored in shape_attributes and
            # region_attributes (see JSON format above).
            rects = [r['shape_attributes'] for r in regions]
            names = [r['region_attributes']['name'] for r in regions]
            name_id = [name_dict[n] for n in names]
            # load_mask() needs the image size to convert rects to masks.
            # Unfortunately, VIA doesn't include it in JSON, so we must read
            # the image. This is only manageable since the dataset is tiny.
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]
            self.add_image(
                "catvsdog",
                image_id=a['filename'],  # use file name as a unique image id
                path=image_path,
                class_id=name_id,
                width=width, height=height,
                polygons=rects)
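To make the parsing step concrete, here is a small self-contained sketch that runs the same extraction on a hand-written annotation entry. The sample entry, its file name and coordinates, and the helper name parse_regions are made up for illustration; the attribute key "name" is the one read in the loading function above.

```python
import json

# Hypothetical VIA export with one image and two rectangular regions.
SAMPLE_JSON = """
{
  "example.jpg12345": {
    "filename": "example.jpg",
    "size": 12345,
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 30, "height": 40},
       "region_attributes": {"name": "cat"}},
      {"shape_attributes": {"name": "rect", "x": 50, "y": 5, "width": 25, "height": 15},
       "region_attributes": {"name": "dog"}}
    ]
  }
}
"""

NAME_TO_ID = {"cat": 1, "dog": 2, "not_defined": 3}


def parse_regions(entry):
    """Return (rects, class_ids) for one VIA image entry."""
    regions = entry["regions"]
    # Accept both layouts: a dict keyed by region index, or a plain list.
    if isinstance(regions, dict):
        regions = list(regions.values())
    rects = [r["shape_attributes"] for r in regions]
    class_ids = [NAME_TO_ID[r["region_attributes"]["name"]] for r in regions]
    return rects, class_ids


annotations = list(json.loads(SAMPLE_JSON).values())
rects, class_ids = parse_regions(annotations[0])
print(class_ids)
print(rects[0]["x"], rects[0]["y"], rects[0]["width"], rects[0]["height"])
```

A region whose name is missing from NAME_TO_ID raises a KeyError, which is a quick way to catch labelling typos before training starts.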
2.2.2 Modifying the load_mask function
To keep annotation simple I only drew rectangular boxes, so this function uses skimage.draw.rectangle rather than the skimage.draw.polygon used in the balloon sample.
    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a catvsdog dataset image, delegate to parent class.
        info = self.image_info[image_id]
        if info["source"] != "catvsdog":
            return super().load_mask(image_id)
        # Convert rectangles to a bitmap mask of shape
        # [height, width, instance_count]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        class_ids = np.array(info["class_id"], dtype=np.int32)
        for i, p in enumerate(info["polygons"]):
            # Get indexes of pixels inside the rectangle and set them to 1.
            # shape= clips the rectangle to the image bounds.
            rr, cc = skimage.draw.rectangle((p['y'], p['x']),
                                            extent=(p['height'], p['width']),
                                            shape=mask.shape[:2])
            mask[rr, cc, i] = 1
        # Return the masks and the class ID of each instance, as recorded
        # in load_cat_dog(). np.bool was removed from NumPy, so use bool.
        return mask.astype(bool), class_ids
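The rectangle-to-mask conversion can be checked in isolation. The sketch below uses plain NumPy slicing instead of skimage.draw.rectangle, so it illustrates the same idea rather than reproducing the code in catvsdog.py; rects_to_mask and the sample sizes are hypothetical.

```python
import numpy as np


def rects_to_mask(rects, height, width):
    """Build a [height, width, N] boolean mask from VIA-style rects."""
    mask = np.zeros((height, width, len(rects)), dtype=bool)
    for i, r in enumerate(rects):
        # Clip to the image so a box touching the border stays in range.
        y0, x0 = max(r["y"], 0), max(r["x"], 0)
        y1 = min(r["y"] + r["height"], height)
        x1 = min(r["x"] + r["width"], width)
        mask[y0:y1, x0:x1, i] = True
    return mask


rects = [
    {"x": 1, "y": 2, "width": 3, "height": 2},
    {"x": 6, "y": 0, "width": 5, "height": 4},  # extends past the right edge
]
mask = rects_to_mask(rects, height=6, width=8)
print(mask.shape)          # (6, 8, 2)
print(mask[..., 0].sum())  # 6 pixels
print(mask[..., 1].sum())  # 8 pixels after clipping
```

Each instance gets its own channel in the last axis, which is the layout load_mask is expected to return.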
2.2.3 Modifying the image_reference function
    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "catvsdog":
            return info["path"]
        else:
            # The original dropped this return, so the method returned None.
            return super().image_reference(image_id)
2.2.4 Modifying the train function
def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = CatVSDogDataset()
    dataset_train.load_cat_dog(args.dataset, "train")
    dataset_train.prepare()
    # Validation dataset
    dataset_val = CatVSDogDataset()
    dataset_val.load_cat_dog(args.dataset, "val")
    dataset_val.prepare()
    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    # Note: args and config are globals set in the script's main block.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')
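3. Training
Download the COCO pre-trained weights, mask_rcnn_coco.h5, beforehand.
I ran the following from the root of the Mask R-CNN repository:
python3 catvsdog.py train --dataset=/path/to/myCatVSDog --weights=coco
Note that ROOT_DIR in the script must be adjusted to match the directory you run the command from.
When training finishes, a series of model weight files has been generated.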
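Since training leaves several weight files behind, it can help to pick the newest one programmatically before testing. This is a sketch under the assumption that the files are named like mask_rcnn_catvsdog_0029.h5, the name used in the test script of this post; latest_checkpoint is a hypothetical helper, not part of the Mask R-CNN library.

```python
import re

# Assumed naming scheme: mask_rcnn_catvsdog_<4-digit epoch>.h5
CHECKPOINT_RE = re.compile(r"^mask_rcnn_catvsdog_(\d{4})\.h5$")


def latest_checkpoint(filenames):
    """Return the file name with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


files = ["mask_rcnn_catvsdog_0001.h5", "events.out.tfevents.123",
         "mask_rcnn_catvsdog_0029.h5", "mask_rcnn_catvsdog_0010.h5"]
print(latest_checkpoint(files))  # mask_rcnn_catvsdog_0029.h5
```

In practice the list would come from os.listdir() on the log directory created during training.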
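4. Testing
I am not very comfortable with .ipynb files, so I converted the demo notebook to a .py file. Open samples/demo.ipynb in Jupyter Notebook.
Choose File --> Download as --> Python (.py) from the menu to save it as a Python file.
Then modify the code: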
import os
import sys
import skimage.io
# Root directory of the project
ROOT_DIR = os.path.abspath("../")
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
import mrcnn.model as modellib
from mrcnn import visualize
# Import config
sys.path.append(os.path.join(ROOT_DIR, "samples/catvsdog/"))  # To find local version
import catvsdog
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "catvsdog_logs")
# Local path to the weights file produced by training
WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_catvsdog_0029.h5")
# These are our own trained weights, so they cannot be downloaded;
# fail early instead of fetching the COCO weights under this name.
if not os.path.exists(WEIGHTS_PATH):
    raise FileNotFoundError(WEIGHTS_PATH)
class InferenceConfig(catvsdog.CatVSDogConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
config = InferenceConfig()
config.display()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
# Load the weights trained on the cat vs. dog dataset
model.load_weights(WEIGHTS_PATH, by_name=True)
# Class names, indexed by class ID (0 is the background)
class_names = ['BG', 'cat', 'dog', 'not_defined']
image = skimage.io.imread('ILSVRC2012_val_00037858.JPEG')
# Run detection
results = model.detect([image], verbose=1)
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
Run:
python3 demo.py
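Besides the visualization, the detection result can be inspected as text. The sketch below assumes the result dictionary has the keys used in the script (rois, class_ids, scores) with boxes in (y1, x1, y2, x2) order; it uses plain lists and made-up values in place of real model output, and summarize is a hypothetical helper.

```python
CLASS_NAMES = ["BG", "cat", "dog", "not_defined"]


def summarize(result, class_names):
    """Return one readable line per detected instance."""
    lines = []
    for box, class_id, score in zip(result["rois"], result["class_ids"],
                                    result["scores"]):
        y1, x1, y2, x2 = box
        lines.append("{} {:.2f} box=({}, {}, {}, {})".format(
            class_names[class_id], score, x1, y1, x2, y2))
    return lines


# Made-up stand-in for results[0]; the real values are NumPy arrays.
fake_result = {
    "rois": [[30, 40, 200, 220], [10, 250, 120, 400]],
    "class_ids": [1, 2],
    "scores": [0.987, 0.912],
}
for line in summarize(fake_result, CLASS_NAMES):
    print(line)
```

Printing the class name next to each score makes it easy to spot detections that fall into the not_defined class.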