A First Taste of Object Detection (Part 2): Building My Own Face Detector

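When I was learning OpenCV, I came across face detection, and back then a single function call was all it took. For how to do face detection in OpenCV, see my earlier article "OpenCV Tricks: Face Detection and Cat Face Detection". At the time CV felt almost magical, and face detection was a key step on my way into the field.
I never imagined I could implement face detection myself. Then I recently came across object detection and darknet, and the idea of a "homemade face detector" took hold. How cool would it be to build face detection myself with darknet!
After nearly two days of exploration, I finally worked out how to do face detection with darknet, and this post shares that experience. In the spirit of keeping things simple, I will try to make the process as easy to follow as possible.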

Dataset

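A face detector needs a suitable dataset. Where does it come from? One option is to collect and annotate the data yourself. That is the most primitive approach, slow and laborious, but it does teach you how much grunt work sits behind AI (which is not nearly as glamorous as it looks on the surface). Fortunately, plenty of generous people have already done the data collection for us (we really should thank them, and it is a good reminder to share our own work too).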
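Face detection data is available at http://shuoyang1213.me/WIDERFACE/index.html. The site hosts a large collection of images with face annotations, and it also lets you evaluate your own detection model, somewhat in the spirit of ImageNet.

From that site I downloaded WIDER Face Training Images and the annotation package, Face annotations. The training set contains 12,880 images, grouped into 62 categories by scene.

Within the annotation package, the face boxes we need are in the file wider_face_train_bbx_gt.txt, whose first few lines are: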

0--Parade/0_Parade_marchingband_1_849.jpg
1
449 330 122 149 0 0 0 0 0 0 
0--Parade/0_Parade_Parade_0_904.jpg
1
361 98 263 339 0 0 0 0 0 0 
0--Parade/0_Parade_marchingband_1_799.jpg
21
78 221 7 8 2 0 0 0 0 0 
78 238 14 17 2 0 0 0 0 0
113 212 11 15 2 0 0 0 0 0 
134 260 15 15 2 0 0 0 0 0 
163 250 14 17 2 0 0 0 0 0 
201 218 10 12 2 0 0 0 0 0 
182 266 15 17 2 0 0 0 0 0 
245 279 18 15 2 0 0 0 0 0 
304 265 16 17 2 0 0 0 2 1 
328 295 16 20 2 0 0 0 0 0 
389 281 17 19 2 0 0 0 2 0

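In each record, the first line is the image path, the second line is the number of faces, and each following line describes one face box. The first four values on a box line are left, top, width and height; the remaining six are attribute flags (blur, expression, illumination, invalid, occlusion and pose, according to the dataset's readme) that are not used here. An image with several faces simply has several box lines.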
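To make the record layout concrete, here is a minimal Python sketch that reads records of this shape into a dictionary. The function name parse_wider_annotations is hypothetical, and the handling of zero-face records (assumed to still carry one placeholder box line) is an assumption about the file, not something visible in the excerpt above.

```python
def parse_wider_annotations(lines):
    """Map each image path to a list of (left, top, width, height) tuples."""
    records = {}
    it = iter(line.strip() for line in lines)
    for name in it:
        if not name:
            continue  # skip blank lines between records
        count = int(next(it))
        boxes = []
        # Assumption: a record with 0 faces is still followed by one
        # placeholder box line, which is read and discarded here.
        for _ in range(max(count, 1)):
            fields = next(it).split()
            if count > 0:
                boxes.append(tuple(int(v) for v in fields[:4]))
        records[name] = boxes
    return records


sample = [
    "0--Parade/0_Parade_marchingband_1_849.jpg",
    "1",
    "449 330 122 149 0 0 0 0 0 0 ",
    "0--Parade/0_Parade_Parade_0_904.jpg",
    "1",
    "361 98 263 339 0 0 0 0 0 0 ",
]
for path, boxes in parse_wider_annotations(sample).items():
    print(path, boxes)
```

Keying the result by the full relative path, not just the file name, avoids collisions if two folders ever contain files with the same name.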
With the dataset in hand, the images and annotations still need to be converted into the format darknet expects. I wrote a Python script for this (get_human_face_label_data.py):

# -*- coding: utf-8 -*-
# author: Jclian91
# place: Pudong Shanghai
# time: 2020/5/11 10:51 PM

import os
from random import shuffle
import cv2
from collections import defaultdict

img_dir = "../WIDER_train"

img_count = 0
file_list = []
for root, dirs, files in os.walk(img_dir):
    for file in files:
        img_count += 1
        file_list.append(os.path.join(root, file))

print("Total image number: %d" % img_count)

# make directory
if not os.path.exists("../human_face_train_images"):
    os.system("mkdir ../human_face_train_images")
if not os.path.exists("../human_face_train_labels"):
    os.system("mkdir ../human_face_train_labels")
if not os.path.exists("../human_face_val_images"):
    os.system("mkdir ../human_face_val_images")
if not os.path.exists("../human_face_val_labels"):
    os.system("mkdir ../human_face_val_labels")

# shuffle the files
shuffle(file_list)

# get label data
with open("wider_face_train_bbx_gt.txt", "r", encoding="utf-8") as h:
    content = [_.strip() for _ in h.readlines()]

# group the annotation lines into one segment per image
line_index = []
for i, line in enumerate(content):
    if "." in line:
        line_index.append(i)

line_index.append(len(content)+1)

segments = []
for j in range(len(line_index)-1):
    segments.append(content[line_index[j]: line_index[j+1]])

img_box_dict = defaultdict(list)
for segment in segments:
    for i in range(2, len(segment)):
        img_box_dict[segment[0].split('/')[-1]].append(segment[i].split()[:4])

# copy images to the right place and write the converted labels to txt files
# train data
train_part = 0.8
for i in range(int(train_part * img_count)):
    print(i, file_list[i])
    file = file_list[i].split('/')[-1]
    os.system("cp %s ../human_face_train_images/%s" % (file_list[i], file))
    with open("../human_face_train.txt", "a", encoding="utf-8") as f:
        f.write("human_face_train_images/%s" % file + "\n")

    img = cv2.imread(file_list[i], 0)
    height, width = img.shape
    with open("../human_face_train_labels/%s" % file.replace(".jpg", ".txt"), "w", encoding="utf-8") as f:
        for label in img_box_dict[file]:
            left, top, w, h = [int(_) for _ in label]
            # replace any zero value with a small positive number
            if left == 0:
                left = 0.1
            if top == 0:
                top = 0.1
            if w == 0:
                w = 0.1
            if h == 0:
                h = 0.1
            x_center = (left + w/2)/width
            y_center = (top + h/2)/height
            f.write("0 %s %s %s %s\n" % (x_center, y_center, w/width, h/height))

# val data
for i in range(int(train_part * img_count), img_count):
    print(i, file_list[i])
    file = file_list[i].split('/')[-1]
    os.system("cp %s ../human_face_val_images/%s" % (file_list[i], file))
    with open("../human_face_val.txt", "a", encoding="utf-8") as f:
        f.write("human_face_val_images/%s" % file + "\n")

    img = cv2.imread(file_list[i], 0)
    height, width = img.shape
    with open("../human_face_val_labels/%s" % file.replace(".jpg", ".txt"), "w", encoding="utf-8") as f:
        for label in img_box_dict[file]:
            left, top, w, h = [int(_) for _ in label]
            # replace any zero value with a small positive number
            if left == 0:
                left = 0.1
            if top == 0:
                top = 0.1
            if w == 0:
                w = 0.1
            if h == 0:
                h = 0.1
            x_center = (left + w/2)/width
            y_center = (top + h/2)/height
            f.write("0 %s %s %s %s\n" % (x_center, y_center, w/width, h/height))

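Running the script produces the image folders, label folders and image list files used in the rest of this post.

It is worth understanding these files, together with human_face.names and human_face.data. First, human_face.names contains a single line, human face, meaning there is only one detection class, named human face. human_face.data reads: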

classes= 1
train = human_face_train.txt
valid = human_face_val.txt
names = human_face.names
backup = backup

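This file says that the number of classes (classes) is 1, the training image list is in human_face_train.txt, the validation image list is in human_face_val.txt, the class names (names) are in human_face.names, and the trained model files are saved to the backup folder.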
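The conversion script shown earlier does not write human_face.names or human_face.data, so they have to be created separately. As a minimal sketch, assuming the same file names as above, the two files could be generated like this; the function name write_darknet_data_files and its prefix parameter are hypothetical.

```python
import tempfile
from pathlib import Path


def write_darknet_data_files(out_dir, class_names, prefix="human_face"):
    """Write <prefix>.names and <prefix>.data into out_dir; return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One class name per line.
    names_path = out / (prefix + ".names")
    names_path.write_text("\n".join(class_names) + "\n", encoding="utf-8")

    # Same keys and layout as the human_face.data file shown above.
    data_lines = [
        "classes= %d" % len(class_names),
        "train = %s_train.txt" % prefix,
        "valid = %s_val.txt" % prefix,
        "names = %s.names" % prefix,
        "backup = backup",
    ]
    data_path = out / (prefix + ".data")
    data_path.write_text("\n".join(data_lines) + "\n", encoding="utf-8")
    return names_path, data_path


# Demo in a throwaway directory so nothing in the project is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    names_file, data_file = write_darknet_data_files(tmp, ["human face"])
    print(data_file.read_text(encoding="utf-8"))
```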
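human_face_train.txt stores the paths of the training images, and the matching annotations are in the human_face_train_labels folder. The label format is simple: one box per line, where the first number is the class id, followed by the x and y coordinates of the box center and then the box width and height, all divided by the image width or height. I won't show the label files here; interested readers can refer to the Python code above, which is not hard to follow.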
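The conversion from a WIDER FACE box (left, top, width, height in pixels) to such a label line can be isolated into two small functions. This is a sketch of the same arithmetic the script performs; the function names and the fixed six-decimal formatting are choices made for this example.

```python
def to_yolo_box(left, top, w, h, img_width, img_height):
    """Convert a pixel box to (x_center, y_center, width, height) in [0, 1]."""
    x_center = (left + w / 2) / img_width
    y_center = (top + h / 2) / img_height
    return x_center, y_center, w / img_width, h / img_height


def format_label_line(class_id, box):
    """Render one label line: class id followed by the four normalized values."""
    return "%d %.6f %.6f %.6f %.6f" % (class_id, *box)


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image
# is centered in the image and covers half of each dimension.
box = to_yolo_box(100, 50, 200, 100, img_width=400, img_height=200)
print(format_label_line(0, box))  # prints "0 0.500000 0.500000 0.500000 0.500000"
```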
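In practice, because I ran the code above on the GPU machine, copying the images and generating the labels was slow, so I stopped the program after roughly 2,000 images had been copied. As a result, only about 2,000 images actually took part in training.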

How to Use darknet?

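Plenty of helpful authors online have already shared how to train darknet on your own object detection data, so I am simply learning from them.
The first step is to compile darknet, which mainly involves the first few lines of the Makefile: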

GPU=0
CUDNN=0
OPENCV=0
OPENMP=0
DEBUG=0

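Usually only the GPU setting needs to change: set GPU=0 to build without GPU support, or GPU=1 to build with it. Then build with make. If the build fails in a GPU environment, change NVCC=nvcc on line 24 of the Makefile to the full path of nvcc in your CUDA installation.
Next, edit yolov3-tiny.cfg in the cfg folder as follows: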

Line 3: set batch=24 → using 24 images for every training step
Line 4: set subdivisions=8 → the batch will be divided by 8
Line 127: set filters=(classes + 5)*3   → in our case filters=18
Line 135: set classes=1  →  the number of categories we want to detect
Line 171: set filters=(classes + 5)*3  → in our case filters=18
Line 177: set classes=1   → the number of categories we want to detect

Since there is only one class, these settings are what we need.
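The filters value follows directly from the formula in the list above. As a small worked example, assuming the usual YOLOv3 layout in which each detection layer predicts 3 anchor boxes and each box carries 4 coordinates, 1 objectness score and one score per class (the function name is hypothetical):

```python
def yolo_filters(num_classes, anchors_per_layer=3):
    """Filters for the conv layer feeding a YOLO layer: (classes + 5) * anchors."""
    return (num_classes + 5) * anchors_per_layer


print(yolo_filters(1))   # one class, as in this post -> 18
print(yolo_filters(80))  # 80 classes -> 255
```

If more classes are added later, both filters lines and both classes lines in the cfg file need to change together.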
Then download the pretrained weights file to weights/darknet53.conv.74 from https://pjreddie.com/media/files/darknet53.conv.74 .
To make training convenient, put the command in a shell script, train.sh:

./darknet detector train human_face.data cfg/yolov3-tiny.cfg ./weights/darknet53.conv.74 -gpus 3,4,5,6,7,8

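Note that this script trains on GPUs; without them, training is very slow. After one afternoon of training, the model's avg loss was a little above 4 (other training metrics, such as IOU, total loss and per-class accuracy, are not covered here) when the run stopped because the GPU ran out of memory. Normally one would keep training until the avg loss falls to about 0.06.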
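Checking the avg loss against a target such as 0.06 can be automated if the training output has been saved to a file. The sketch below assumes each iteration summary line has the shape "<iteration>: <loss>, <avg loss> avg, ..."; that shape, the sample lines (invented for illustration, not real results) and the function names are all assumptions to verify against your own darknet output.

```python
import re

# Assumed summary line shape: "<iteration>: <loss>, <avg loss> avg, ..."
LOG_LINE = re.compile(r"^\s*(\d+):\s*([0-9.]+),\s*([0-9.]+)\s+avg\b")


def latest_avg_loss(log_lines):
    """Return (iteration, avg_loss) from the last matching line, or None."""
    latest = None
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            latest = (int(match.group(1)), float(match.group(3)))
    return latest


def reached_target(log_lines, target=0.06):
    """True if the most recent avg loss is at or below the target."""
    latest = latest_avg_loss(log_lines)
    return latest is not None and latest[1] <= target


# Invented sample lines, for illustration only.
sample_log = [
    "999: 4.812345, 4.530000 avg, 0.001000 rate, 1.200000 seconds, 23976 images",
    "Loaded: 0.000030 seconds",
    "1000: 4.100000, 4.487000 avg, 0.001000 rate, 1.250000 seconds, 24000 images",
]
print(latest_avg_loss(sample_log))
print(reached_target(sample_log))
```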
Even so, the model saved after an afternoon of training on 2,000 images works quite well.

Model Prediction

Now for the most exciting part: using the model we just trained to detect faces!
The prediction shell script (test.sh) is:

./darknet detector test human_face.data cfg/yolov3-tiny.cfg backup/yolov3-tiny.backup data/walk_persons.jpeg -thresh 0.5 -gpus 3,4,5,6,7,8

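Below I show a series of prediction results; they need little explanation, so I will let the pictures speak for themselves. Because the pictures contain people, I want to avoid any misunderstanding about portrait rights: all of them are publicly available images, and this is purely a computer experiment that expresses no sentiment of any kind.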

Summary

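darknet makes object detection easy to implement, and the face detection above is just one example.
While using darknet, I often hit a Segmentation fault (core dumped) error on the GPU, which I have not yet been able to resolve.
I will release the source code of this face detection project when the time is right. Feel free to follow my GitHub: https://github.com/percent4 .
I will keep digging into this topic. Thanks for reading and following~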

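References

WIDER FACE face detection dataset: http://shuoyang1213.me/WIDERFACE/index.html
darknet on GitHub: https://github.com/pjreddie/darknet
[深度學習] 使用Darknet YOLO 模型破解中文驗證碼點擊識別 ([Deep Learning] Cracking Chinese click CAPTCHAs with a Darknet YOLO model): https://www.cnblogs.com/codefish/p/10104320.html
YOLO3 darknet訓練自己的數據 (Training YOLOv3 in darknet on your own data): https://zhuanlan.zhihu.com/p/45852709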
