[Code Analysis] A Walkthrough of the PyTorch Implementation of YOLO V4

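YOLO V4 has been out for a few days now. I have skimmed the paper, and in the meantime plenty of talented developers have already published their own implementations of YOLOv4.

YOLO v4 paper: https://arxiv.org/abs/2004.10934

AlexeyAB's Darknet source implementation: https://github.com/AlexeyAB/darknet

This post analyzes a PyTorch implementation of YOLOv4. Thanks to Tianxiaomo for sharing the project: Pytorch-YoloV4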


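The weights file shared by the author, download link:

The weights file yolov4.weights was trained on the COCO dataset, which has 80 object classes. The project currently supports inference only, not training.

My test environment was set up with Anaconda: pytorch 1.0.1, torchvision 0.2.1.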


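The project directory looks like this:

Command to run in the terminal:

# The command takes the cfg file path, the weights file path, and the image path
>>python demo.py cfg/yolov4.cfg yolov4.weights data/dog.jpg

Running it produces an image with the detections drawn on it: predictions.jpg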

Now for the source code analysis.

demo.py mainly calls the function detect(), whose code is as follows:

import time
from PIL import Image
# Darknet, do_detect, load_class_names and plot_boxes come from the project's own modules.

def detect(cfgfile, weightfile, imgfile):
    m = Darknet(cfgfile)  # create the Darknet model object m

    m.print_network()    # print the network structure
    m.load_weights(weightfile)  # load the weights
    print('Loading weights from %s... Done!' % (weightfile))

    num_classes = 80
    if num_classes == 20:
        namesfile = 'data/voc.names'
    elif num_classes == 80:
        namesfile = 'data/coco.names'
    else:
        namesfile = 'data/names'

    use_cuda = 0  # whether to use CUDA; here the project runs on the CPU
    if use_cuda:
        m.cuda()   # if CUDA is used, copy the model to GPU memory (default GPU ID is 0)

    img = Image.open(imgfile).convert('RGB')  # open the image with PIL
    sized = img.resize((m.width, m.height))   # resize to the network input size

    for i in range(2):  # run twice; only the second run's time is printed
        start = time.time()
        boxes = do_detect(m, sized, 0.5, 0.4, use_cuda)  # run detection; the returned boxes have already been through NMS
        finish = time.time()
        if i == 1:
            print('%s: Predicted in %f seconds.' % (imgfile, (finish - start)))

    class_names = load_class_names(namesfile)   # load the class names
    plot_boxes(img, boxes, 'predictions.jpg', class_names)  # draw the boxes and write the result image

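When the Darknet() object is created, it is initialized from the cfg file that was passed in. This mainly means parsing the cfg file to extract each block, and then building the network structure (see the figure below).

First, how the network structure is parsed from the cfg file. This mainly calls parse_cfg() in tool/cfg.py, which returns blocks. The network structure looks like this (opened with the Netron network viewer; try it yourself to see the full version):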
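To make the idea of "blocks" concrete, here is a minimal sketch of how a Darknet-style cfg could be turned into a list of dictionaries. This is not the project's parse_cfg(); the function name parse_cfg_text and the sample text are hypothetical, and the real parser may treat some keys specially.

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of block dicts (illustrative only)."""
    blocks = []
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # A "[type]" header closes the previous block and starts a new one.
            if block is not None:
                blocks.append(block)
            block = {'type': line.strip('[]').strip()}
        elif block is not None and '=' in line:
            key, value = line.split('=', 1)
            block[key.strip()] = value.strip()  # values are kept as strings
    if block is not None:
        blocks.append(block)
    return blocks


# Hypothetical sample in the style of a YOLO cfg file.
sample = """
[net]
width=608
height=608

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish
"""
blocks = parse_cfg_text(sample)
print(blocks[1]['type'], blocks[1]['filters'])
```

Each block keeps its section name under the 'type' key, which is what the later network-building step would dispatch on.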


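The network model is created by create_network() in darknet2pytorch.py, which builds the network from the blocks obtained by parsing the cfg. It first creates a ModuleList, then creates a Sequential() for each block and puts that block's convolution, BN, and activation operations into it. You can think of each block as corresponding to one Sequential() (blocks with no layers of their own, such as route and shortcut, show up as EmptyModule in the listing).

The resulting ModuleList has roughly the following structure: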

Darknet(
  (models): ModuleList(
    (0): Sequential(
      (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish1): Mish()
    )
    (1): Sequential(
      (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish2): Mish()
    )
    (2): Sequential(
      (conv3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish3): Mish()
    )
    (3): EmptyModule()
    (4): Sequential(
      (conv4): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish4): Mish()
    )
    (5): Sequential(
      (conv5): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish5): Mish()
    )
    (6): Sequential(
      (conv6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish6): Mish()
    )
    (7): EmptyModule()
    (8): Sequential(
      (conv7): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish7): Mish()
    )
    (9): EmptyModule()
    (10): Sequential(
      (conv8): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish8): Mish()
    )
    (11): Sequential(
      (conv9): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish9): Mish()
    )
    (12): Sequential(
      (conv10): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish10): Mish()
    )
    (13): EmptyModule()
    (14): Sequential(
      (conv11): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn11): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish11): Mish()
    )
    (15): Sequential(
      (conv12): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish12): Mish()
    )
    (16): Sequential(
      (conv13): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn13): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish13): Mish()
    )
    (17): EmptyModule()
    (18): Sequential(
      (conv14): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish14): Mish()
    )
    (19): Sequential(
      (conv15): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn15): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish15): Mish()
    )
    (20): EmptyModule()
    (21): Sequential(
      (conv16): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn16): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish16): Mish()
    )
    (22): EmptyModule()
    (23): Sequential(
      (conv17): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn17): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish17): Mish()
    )
    (24): Sequential(
      (conv18): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish18): Mish()
    )
    (25): Sequential(
      (conv19): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn19): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish19): Mish()
    )
    (26): EmptyModule()
    (27): Sequential(
      (conv20): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn20): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish20): Mish()
    )
    (28): Sequential(
      (conv21): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn21): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish21): Mish()
    )
    (29): Sequential(
      (conv22): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn22): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish22): Mish()
    )
    (30): EmptyModule()
    (31): Sequential(
      (conv23): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn23): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish23): Mish()
    )
    (32): Sequential(
      (conv24): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn24): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish24): Mish()
    )
    (33): EmptyModule()
    (34): Sequential(
      (conv25): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn25): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish25): Mish()
    )
    (35): Sequential(
      (conv26): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn26): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish26): Mish()
    )
    (36): EmptyModule()
    (37): Sequential(
      (conv27): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn27): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish27): Mish()
    )
    (38): Sequential(
      (conv28): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn28): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish28): Mish()
    )
    (39): EmptyModule()
    (40): Sequential(
      (conv29): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn29): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish29): Mish()
    )
    (41): Sequential(
      (conv30): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn30): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish30): Mish()
    )
    (42): EmptyModule()
    (43): Sequential(
      (conv31): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn31): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish31): Mish()
    )
    (44): Sequential(
      (conv32): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn32): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish32): Mish()
    )
    (45): EmptyModule()
    (46): Sequential(
      (conv33): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn33): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish33): Mish()
    )
    (47): Sequential(
      (conv34): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn34): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish34): Mish()
    )
    (48): EmptyModule()
    (49): Sequential(
      (conv35): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn35): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish35): Mish()
    )
    (50): Sequential(
      (conv36): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn36): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish36): Mish()
    )
    (51): EmptyModule()
    (52): Sequential(
      (conv37): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn37): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish37): Mish()
    )
    (53): EmptyModule()
    (54): Sequential(
      (conv38): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn38): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish38): Mish()
    )
    (55): Sequential(
      (conv39): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn39): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish39): Mish()
    )
    (56): Sequential(
      (conv40): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn40): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish40): Mish()
    )
    (57): EmptyModule()
    (58): Sequential(
      (conv41): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn41): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish41): Mish()
    )
    (59): Sequential(
      (conv42): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn42): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish42): Mish()
    )
    (60): Sequential(
      (conv43): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn43): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish43): Mish()
    )
    (61): EmptyModule()
    (62): Sequential(
      (conv44): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn44): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish44): Mish()
    )
    (63): Sequential(
      (conv45): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn45): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish45): Mish()
    )
    (64): EmptyModule()
    (65): Sequential(
      (conv46): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn46): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish46): Mish()
    )
    (66): Sequential(
      (conv47): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn47): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish47): Mish()
    )
    (67): EmptyModule()
    (68): Sequential(
      (conv48): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn48): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish48): Mish()
    )
    (69): Sequential(
      (conv49): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn49): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish49): Mish()
    )
    (70): EmptyModule()
    (71): Sequential(
      (conv50): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn50): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish50): Mish()
    )
    (72): Sequential(
      (conv51): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn51): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish51): Mish()
    )
    (73): EmptyModule()
    (74): Sequential(
      (conv52): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn52): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish52): Mish()
    )
    (75): Sequential(
      (conv53): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn53): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish53): Mish()
    )
    (76): EmptyModule()
    (77): Sequential(
      (conv54): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn54): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish54): Mish()
    )
    (78): Sequential(
      (conv55): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn55): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish55): Mish()
    )
    (79): EmptyModule()
    (80): Sequential(
      (conv56): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn56): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish56): Mish()
    )
    (81): Sequential(
      (conv57): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn57): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish57): Mish()
    )
    (82): EmptyModule()
    (83): Sequential(
      (conv58): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn58): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish58): Mish()
    )
    (84): EmptyModule()
    (85): Sequential(
      (conv59): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn59): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish59): Mish()
    )
    (86): Sequential(
      (conv60): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn60): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish60): Mish()
    )
    (87): Sequential(
      (conv61): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn61): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish61): Mish()
    )
    (88): EmptyModule()
    (89): Sequential(
      (conv62): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn62): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish62): Mish()
    )
    (90): Sequential(
      (conv63): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn63): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish63): Mish()
    )
    (91): Sequential(
      (conv64): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn64): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish64): Mish()
    )
    (92): EmptyModule()
    (93): Sequential(
      (conv65): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn65): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish65): Mish()
    )
    (94): Sequential(
      (conv66): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn66): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish66): Mish()
    )
    (95): EmptyModule()
    (96): Sequential(
      (conv67): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn67): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish67): Mish()
    )
    (97): Sequential(
      (conv68): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn68): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish68): Mish()
    )
    (98): EmptyModule()
    (99): Sequential(
      (conv69): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn69): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish69): Mish()
    )
    (100): Sequential(
      (conv70): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn70): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish70): Mish()
    )
    (101): EmptyModule()
    (102): Sequential(
      (conv71): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn71): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish71): Mish()
    )
    (103): EmptyModule()
    (104): Sequential(
      (conv72): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn72): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (mish72): Mish()
    )
    (105): Sequential(
      (conv73): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn73): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky73): LeakyReLU(negative_slope=0.1, inplace)
    )
    (106): Sequential(
      (conv74): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn74): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky74): LeakyReLU(negative_slope=0.1, inplace)
    )
    (107): Sequential(
      (conv75): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn75): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky75): LeakyReLU(negative_slope=0.1, inplace)
    )
    (108): MaxPoolStride1()
    (109): EmptyModule()
    (110): MaxPoolStride1()
    (111): EmptyModule()
    (112): MaxPoolStride1()
    (113): EmptyModule()
    (114): Sequential(
      (conv76): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn76): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky76): LeakyReLU(negative_slope=0.1, inplace)
    )
    (115): Sequential(
      (conv77): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn77): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky77): LeakyReLU(negative_slope=0.1, inplace)
    )
    (116): Sequential(
      (conv78): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn78): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky78): LeakyReLU(negative_slope=0.1, inplace)
    )
    (117): Sequential(
      (conv79): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn79): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky79): LeakyReLU(negative_slope=0.1, inplace)
    )
    (118): Upsample()
    (119): EmptyModule()
    (120): Sequential(
      (conv80): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn80): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky80): LeakyReLU(negative_slope=0.1, inplace)
    )
    (121): EmptyModule()
    (122): Sequential(
      (conv81): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn81): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky81): LeakyReLU(negative_slope=0.1, inplace)
    )
    (123): Sequential(
      (conv82): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn82): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky82): LeakyReLU(negative_slope=0.1, inplace)
    )
    (124): Sequential(
      (conv83): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn83): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky83): LeakyReLU(negative_slope=0.1, inplace)
    )
    (125): Sequential(
      (conv84): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn84): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky84): LeakyReLU(negative_slope=0.1, inplace)
    )
    (126): Sequential(
      (conv85): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn85): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky85): LeakyReLU(negative_slope=0.1, inplace)
    )
    (127): Sequential(
      (conv86): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn86): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky86): LeakyReLU(negative_slope=0.1, inplace)
    )
    (128): Upsample()
    (129): EmptyModule()
    (130): Sequential(
      (conv87): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn87): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky87): LeakyReLU(negative_slope=0.1, inplace)
    )
    (131): EmptyModule()
    (132): Sequential(
      (conv88): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn88): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky88): LeakyReLU(negative_slope=0.1, inplace)
    )
    (133): Sequential(
      (conv89): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn89): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky89): LeakyReLU(negative_slope=0.1, inplace)
    )
    (134): Sequential(
      (conv90): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn90): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky90): LeakyReLU(negative_slope=0.1, inplace)
    )
    (135): Sequential(
      (conv91): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn91): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky91): LeakyReLU(negative_slope=0.1, inplace)
    )
    (136): Sequential(
      (conv92): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn92): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky92): LeakyReLU(negative_slope=0.1, inplace)
    )
    (137): Sequential(
      (conv93): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn93): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky93): LeakyReLU(negative_slope=0.1, inplace)
    )
    (138): Sequential(
      (conv94): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    )
    (139): YoloLayer()
    (140): EmptyModule()
    (141): Sequential(
      (conv95): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn95): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky95): LeakyReLU(negative_slope=0.1, inplace)
    )
    (142): EmptyModule()
    (143): Sequential(
      (conv96): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn96): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky96): LeakyReLU(negative_slope=0.1, inplace)
    )
    (144): Sequential(
      (conv97): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn97): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky97): LeakyReLU(negative_slope=0.1, inplace)
    )
    (145): Sequential(
      (conv98): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn98): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky98): LeakyReLU(negative_slope=0.1, inplace)
    )
    (146): Sequential(
      (conv99): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn99): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky99): LeakyReLU(negative_slope=0.1, inplace)
    )
    (147): Sequential(
      (conv100): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn100): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky100): LeakyReLU(negative_slope=0.1, inplace)
    )
    (148): Sequential(
      (conv101): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn101): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky101): LeakyReLU(negative_slope=0.1, inplace)
    )
    (149): Sequential(
      (conv102): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
    )
    (150): YoloLayer()
    (151): EmptyModule()
    (152): Sequential(
      (conv103): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn103): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky103): LeakyReLU(negative_slope=0.1, inplace)
    )
    (153): EmptyModule()
    (154): Sequential(
      (conv104): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn104): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky104): LeakyReLU(negative_slope=0.1, inplace)
    )
    (155): Sequential(
      (conv105): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn105): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky105): LeakyReLU(negative_slope=0.1, inplace)
    )
    (156): Sequential(
      (conv106): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn106): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky106): LeakyReLU(negative_slope=0.1, inplace)
    )
    (157): Sequential(
      (conv107): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn107): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky107): LeakyReLU(negative_slope=0.1, inplace)
    )
    (158): Sequential(
      (conv108): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn108): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky108): LeakyReLU(negative_slope=0.1, inplace)
    )
    (159): Sequential(
      (conv109): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn109): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky109): LeakyReLU(negative_slope=0.1, inplace)
    )
    (160): Sequential(
      (conv110): Conv2d(1024, 255, kernel_size=(1, 1), stride=(1, 1))
    )
    (161): YoloLayer()
  )
)
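The input channel counts in the listing above (for example conv8 taking 128 channels right after an EmptyModule) follow from how route and shortcut blocks combine earlier outputs. The sketch below only does the channel bookkeeping, in plain Python. The function track_channels, the helper conv, and the route/shortcut parameters in the block list are assumptions chosen to reproduce the first eleven entries of the listing, so check them against the actual cfg file.

```python
def track_channels(blocks, in_channels=3):
    """Return (in_channels, out_channels) for each block, in order."""
    out_filters = []      # output channels of every block seen so far
    prev = in_channels    # channels entering the current block
    shapes = []
    for i, block in enumerate(blocks):
        kind = block['type']
        if kind == 'convolutional':
            out = int(block['filters'])
        elif kind == 'route':
            # Negative indices are relative to the current block.
            layers = [int(x) for x in block['layers'].split(',')]
            layers = [l if l >= 0 else i + l for l in layers]
            out = sum(out_filters[l] for l in layers)  # channel-wise concatenation
        elif kind == 'shortcut':
            out = prev    # element-wise addition keeps the channel count
        else:
            raise ValueError('unhandled block type: ' + kind)
        shapes.append((prev, out))
        out_filters.append(out)
        prev = out
    return shapes


def conv(filters):
    return {'type': 'convolutional', 'filters': str(filters)}


# Assumed block parameters mirroring entries (0) to (10) of the listing.
blocks = [
    conv(32), conv(64), conv(64),          # indices 0-2
    {'type': 'route', 'layers': '-2'},     # index 3
    conv(64), conv(32), conv(64),          # indices 4-6
    {'type': 'shortcut', 'from': '-3'},    # index 7
    conv(64),                              # index 8
    {'type': 'route', 'layers': '-1,-7'},  # index 9
    conv(64),                              # index 10
]
shapes = track_channels(blocks)
print(shapes[10])
```

Under these assumptions the route at index 9 concatenates two 64-channel outputs, which is why the convolution at index 10 sees 128 input channels.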

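Back in detect() in demo.py: once the Darknet object has been built, the network structure is printed, and then load_weights() in darknet2pytorch.py is called to load the weights file. The following describes what the values in this weights file are and how they are ordered.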

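For convolutional layers without a bias term (the ones followed by batch normalization), the values loaded from yolov4.weights are ordered as follows: first the BN bias (β), then the BN weight (the scale γ), then the BN running mean, then the BN running variance, and finally the convolution weights. As shown in the figure below, buf holds the contents loaded from yolov4.weights. The first convolution in the network has 32 kernels, so its BN layer also has 32 bias values, while the convolution itself has 3x3x3x32 = 864 weights (3 input channels because the image has three channels, a 3x3 kernel, and 32 output kernels).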
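The ordering described above can be turned into simple offset arithmetic. This is a sketch only: conv_bn_offsets is a hypothetical helper, and start=0 assumes buf begins right after any file header has been stripped.

```python
def conv_bn_offsets(start, in_channels, out_channels, kernel_size):
    """Return named (begin, end) ranges into the flat buffer, plus the next start."""
    n_bn = out_channels
    n_conv = out_channels * in_channels * kernel_size * kernel_size
    spans = {}
    pos = start
    # Order follows the description: BN bias, BN weight, mean, variance, conv weights.
    for name, count in [('bn_bias', n_bn), ('bn_weight', n_bn),
                        ('bn_running_mean', n_bn), ('bn_running_var', n_bn),
                        ('conv_weight', n_conv)]:
        spans[name] = (pos, pos + count)
        pos += count
    return spans, pos


# First layer of the network: 3 input channels, 32 kernels of size 3x3.
spans, next_start = conv_bn_offsets(0, in_channels=3, out_channels=32, kernel_size=3)
print(spans['conv_weight'], next_start)
```

The returned next_start is where the following layer's values would begin, so the same helper can be chained layer by layer.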

 

The following block types, on the other hand, produce no weights during training, so nothing needs to be read from yolov4.weights for them:

            elif block['type'] == 'maxpool':
                pass
            elif block['type'] == 'reorg':
                pass
            elif block['type'] == 'upsample':
                pass
            elif block['type'] == 'route':
                pass
            elif block['type'] == 'shortcut':
                pass
            elif block['type'] == 'region':
                pass
            elif block['type'] == 'yolo':
                pass
            elif block['type'] == 'avgpool':
                pass
            elif block['type'] == 'softmax':
                pass
            elif block['type'] == 'cost':
                pass

With the cfg file parsed, the model built, and the weights loaded, the next step is to run detection. This mainly calls do_detect() in utils/utils.py; in demo.py it is this line: boxes = do_detect(m, sized, 0.5, 0.4, use_cuda)

def do_detect(model, img, conf_thresh, nms_thresh, use_cuda=1):
    model.eval()  #模型做推理
    t0 = time.time()

    if isinstance(img, Image.Image):
        width = img.width
        height = img.height
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes()))
        img = img.view(height, width, 3).transpose(0, 1).transpose(0, 2).contiguous() # CxHxW
        img = img.view(1, 3, height, width)  # 對圖像維度做變換,BxCxHxW
        img = img.float().div(255.0)         # [0-255] --> [0-1]
    elif type(img) == np.ndarray and len(img.shape) == 3:  # cv2 image
        img = torch.from_numpy(img.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
    elif type(img) == np.ndarray and len(img.shape) == 4:
        img = torch.from_numpy(img.transpose(0, 3, 1, 2)).float().div(255.0)
    else:
        print("unknow image type")
        exit(-1)

    if use_cuda:
        img = img.cuda()
    img = torch.autograd.Variable(img)


    list_boxes = model(img)  # 主要是調用了模型的forward操作,返回三個yolo層的輸出

    anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]
    num_anchors = 9  # the 3 yolo layers share 9 anchors in total
    anchor_masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    strides = [8, 16, 32]   # each yolo layer is downsampled from the input image by 8, 16 and 32 respectively
    anchor_step = len(anchors) // num_anchors
    boxes = []
    for i in range(3):
        masked_anchors = []
        for m in anchor_masks[i]:
            masked_anchors += anchors[m * anchor_step:(m + 1) * anchor_step]
        masked_anchors = [anchor / strides[i] for anchor in masked_anchors]
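        # Note: the confidence threshold is hard-coded to 0.6 here; the conf_thresh argument is not used.
        # .cpu() is required before .numpy() when the model runs on the GPU.
        boxes.append(get_region_boxes1(list_boxes[i].data.cpu().numpy(), 0.6, 80, masked_anchors, len(anchor_masks[i])))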
        # boxes.append(get_region_boxes(list_boxes[i], 0.6, 80, masked_anchors, len(anchor_masks[i])))
    if img.shape[0] > 1:
        bboxs_for_imgs = [
            boxes[0][index] + boxes[1][index] + boxes[2][index]
            for index in range(img.shape[0])]
        # run NMS separately for each image
        boxes = [nms(bboxs, nms_thresh) for bboxs in bboxs_for_imgs]
    else:
        boxes = boxes[0][0] + boxes[1][0] + boxes[2][0]
        boxes = nms(boxes, nms_thresh)

    return boxes   # boxes remaining after NMS
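
The PIL branch of do_detect() turns a flat byte buffer in HWC order into a CxHxW tensor scaled to [0, 1]. The index arithmetic behind that conversion can be sketched without torch. The function hwc_to_chw below is a hypothetical stand-in written for illustration, not part of the project:

```python
def hwc_to_chw(pixels, height, width):
    """Convert a flat HWC (row-major RGB) byte sequence to nested [C][H][W] floats in [0, 1]."""
    return [
        [[pixels[(y * width + x) * 3 + c] / 255.0 for x in range(width)]
         for y in range(height)]
        for c in range(3)
    ]


# A 1x2 image: one pure-red pixel followed by one pure-green pixel.
chw = hwc_to_chw([255, 0, 0, 0, 255, 0], height=1, width=2)
print(chw)
```

Each output channel collects one colour component from every pixel, which is what the view/transpose chain in do_detect() achieves on the tensor.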

The forward pass stores its result in list_boxes. Because there are three YOLO output layers, list_boxes contains three entries, one per layer.

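list_boxes[0] holds the output of the first YOLO layer. Its feature map is downsampled by a factor of 8 relative to the input image, i.e. strides[0]; for a 608x608 image the feature map of this layer is 608/8=76 cells on each side. A YOLO layer's output has shape [batch, (classnum+4+1)*num_anchors, feature_h, feature_w]. For the 80-class COCO model, with one test image and 3 anchors per feature-map cell, this layer outputs [1,255,76,76]. The corresponding anchor sizes are [1.5,2.0,2.375,4.5,5.0,3.5] (the six numbers are the widths and heights of the 3 anchors in the order w1,h1,w2,h2,w3,h3, i.e. the pixel anchors divided by the stride).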

Likewise, the second YOLO layer outputs [1,255,38,38], with anchor sizes [2.25,4.6875,4.75,3.4375,4.5,9.125].

The third YOLO layer outputs [1,255,19,19], with anchor sizes [4.4375,3.4375,6.0,7.59375,14.34375,12.53125].
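
The shapes and anchor sizes quoted for the three layers can be reproduced with a few lines that mirror the anchor-selection loop in do_detect(). The function layer_summary is a hypothetical helper, and input_size = 608 is the input resolution assumed in the text above:

```python
anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]
anchor_masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
strides = [8, 16, 32]
input_size = 608   # assumed network input size
num_classes = 80


def layer_summary(i):
    """Return (output shape, anchors in feature-map cells) for yolo layer i."""
    masked = []
    for m in anchor_masks[i]:
        masked += anchors[2 * m:2 * m + 2]          # (w, h) pair of anchor m
    scaled = [a / strides[i] for a in masked]       # pixels -> feature-map cells
    grid = input_size // strides[i]
    channels = (num_classes + 4 + 1) * len(anchor_masks[i])
    return [1, channels, grid, grid], scaled


for i in range(3):
    print(layer_summary(i))
```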


Inside do_detect(), the main work is done by get_region_boxes1(output, conf_thresh, num_classes, anchors, num_anchors, only_objectness=1, validation=False), which decodes the output of the forward pass; NMS is then applied to the decoded boxes.

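Now consider the data output by each YOLO layer. For the first YOLO layer the output shape is [1,85*3,76,76]; it is reshaped to [85, 1*3*76*76], meaning 1*3*76*76 anchors make predictions, and each anchor predicts 80 class probabilities, 4 box coordinates, and 1 confidence that an object is present. The figure below shows the data of the first YOLO output layer (the number of grid cells drawn is not accurate; it is for illustration only).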
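
The reshape can be understood as pure index arithmetic. The sketch below assumes the Darknet convention that the 255 channels are grouped anchor by anchor (85 values for anchor 0, then 85 for anchor 1, and so on); both function names are hypothetical and exist only to illustrate the mapping:

```python
def source_channel(a, k, num_classes=80):
    """Channel of the raw [B, 255, H, W] output that holds attribute k of anchor a.

    k = 0..3 are the box values, k = 4 is the objectness, k = 5.. are class scores.
    """
    return a * (num_classes + 5) + k


def flat_column(b, a, y, x, num_anchors, h, w):
    """Column in the reshaped [85, B*A*H*W] view for anchor a at cell (y, x) of image b."""
    return ((b * num_anchors + a) * h + y) * w + x


# Objectness of anchor 1 lives in channel 89 of the raw output.
print(source_channel(1, 4))
# The last anchor at the last cell of a 76x76 map is the last column of the view.
print(flat_column(0, 2, 75, 75, num_anchors=3, h=76, w=76))
```

With this layout, row k of the reshaped array contains attribute k for every anchor at every cell, so each attribute can be processed as one long vector.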

The code corresponding to each output is:
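
As a minimal sketch of this decoding step for a single anchor, the code below assumes the standard YOLO box parameterization (sigmoid offsets added to the cell index, exponentials scaled by the anchor size) and a softmax over the class scores. decode_prediction and its arguments are hypothetical names, not functions from the project, and the softmax is an assumption (a per-class sigmoid is the other common choice):

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    m = max(values)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(raw, cx, cy, anchor_w, anchor_h):
    """Decode raw = [tx, ty, tw, th, to, class scores...] for one anchor at cell (cx, cy)."""
    bcx = sigmoid(raw[0]) + cx            # box centre x, in feature-map cells
    bcy = sigmoid(raw[1]) + cy            # box centre y, in feature-map cells
    bw = math.exp(raw[2]) * anchor_w      # box width, in feature-map cells
    bh = math.exp(raw[3]) * anchor_h      # box height, in feature-map cells
    det_conf = sigmoid(raw[4])            # probability that the box contains an object
    cls_confs = softmax(raw[5:])
    cls_max_conf = max(cls_confs)
    cls_max_id = cls_confs.index(cls_max_conf)
    return bcx, bcy, bw, bh, det_conf, cls_max_conf, cls_max_id


# Toy example with 3 classes, cell (10, 20) and the first anchor of the first yolo layer.
print(decode_prediction([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 2.0], 10, 20, 1.5, 2.0))
```

With all-zero box values the centre falls in the middle of the cell and the box takes exactly the anchor's size.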

Continuing with the figure above, the data output by a given YOLO layer is laid out as shown in the following diagram:

 

If the confidence exceeds the threshold, the predicted box is saved to a list (ind is the index over all outputs, ranging from 0 to batch*anchor_num*h*w):

 

if conf > conf_thresh:
   bcx = xs[ind]
   bcy = ys[ind]
   bw = ws[ind]
   bh = hs[ind]
   cls_max_conf = cls_max_confs[ind]
   cls_max_id = cls_max_ids[ind]
   box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
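
Because the stored box is divided by the feature-map width and height, its first four values are fractions of the image and can be mapped onto an image of any resolution. A small sketch, assuming the box holds the centre and size in that order; to_pixel_corners is a hypothetical helper, not a function from the project:

```python
def to_pixel_corners(box, img_width, img_height):
    """Map a normalized [cx, cy, w, h, ...] box to integer pixel corners (x1, y1, x2, y2)."""
    cx, cy, bw, bh = box[:4]
    x1 = int(round((cx - bw / 2.0) * img_width))
    y1 = int(round((cy - bh / 2.0) * img_height))
    x2 = int(round((cx + bw / 2.0) * img_width))
    y2 = int(round((cy + bh / 2.0) * img_height))
    return x1, y1, x2, y2


# A centred box covering a quarter of the width and half of the height of a 768x576 image.
print(to_pixel_corners([0.5, 0.5, 0.25, 0.5, 0.9, 0.8, 16], 768, 576))
```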

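For the three YOLO layers, each layer's output is first filtered on objectness (the probability of containing an object must exceed the threshold). The boxes that survive in the three layers are then merged into one list and passed through NMS. The code involved is: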

def nms(boxes, nms_thresh):
    if len(boxes) == 0:
        return boxes

    det_confs = torch.zeros(len(boxes))
    for i in range(len(boxes)):
        det_confs[i] = 1 - boxes[i][4]

    _, sortIds = torch.sort(det_confs)  # torch.sort is ascending, so sorting 1 - objectness orders sortIds from highest to lowest objectness
    out_boxes = []
    for i in range(len(boxes)):
        box_i = boxes[sortIds[i]]
        if box_i[4] > 0:
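            out_boxes.append(box_i)   # keep the remaining box with the highest objectness
            # Compute the IoU of every later box_j with box_i; if the overlap exceeds the
            # threshold, set box_j's objectness to 0 so that it is never selected.
            for j in range(i + 1, len(boxes)):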
                box_j = boxes[sortIds[j]]
                if bbox_iou(box_i, box_j, x1y1x2y2=False) > nms_thresh:
                    # print(box_i, box_j, bbox_iou(box_i, box_j, x1y1x2y2=False))
                    box_j[4] = 0
    return out_boxes
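
nms() relies on bbox_iou() with x1y1x2y2=False, i.e. on boxes given as centre plus size. The following dependency-free sketch shows that IoU computation together with the same greedy suppression. bbox_iou_center and nms_sketch are hypothetical names written for illustration and are not the project's implementations:

```python
def bbox_iou_center(box1, box2):
    """IoU of two boxes given as [cx, cy, w, h, ...]."""
    left = max(box1[0] - box1[2] / 2.0, box2[0] - box2[2] / 2.0)
    right = min(box1[0] + box1[2] / 2.0, box2[0] + box2[2] / 2.0)
    top = max(box1[1] - box1[3] / 2.0, box2[1] - box2[3] / 2.0)
    bottom = min(box1[1] + box1[3] / 2.0, box2[1] + box2[3] / 2.0)
    inter_w, inter_h = right - left, bottom - top
    if inter_w <= 0 or inter_h <= 0:
        return 0.0                         # the boxes do not overlap
    inter = inter_w * inter_h
    union = box1[2] * box1[3] + box2[2] * box2[3] - inter
    return inter / union


def nms_sketch(boxes, nms_thresh):
    """Greedy NMS on [cx, cy, w, h, objectness, ...] boxes, highest objectness first."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][4], reverse=True)
    suppressed = [False] * len(boxes)
    kept = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        kept.append(boxes[i])
        for j in order[pos + 1:]:
            if not suppressed[j] and bbox_iou_center(boxes[i], boxes[j]) > nms_thresh:
                suppressed[j] = True
    return kept


a = [0.50, 0.5, 0.2, 0.2, 0.9]
b = [0.52, 0.5, 0.2, 0.2, 0.8]   # overlaps heavily with a
c = [0.10, 0.1, 0.1, 0.1, 0.7]   # far away from both
print(nms_sketch([b, c, a], 0.4))
```

Unlike the listing above, this sketch tracks suppression in a separate flag list instead of zeroing the objectness in place, so the input boxes are left unmodified.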

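Addendum:

The Mish activation function mentioned in the paper

Its formula is (where x is the input): Mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))

Its plot is: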

 

## PyTorch implementation:
import torch

class Mish(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = x * (torch.tanh(torch.nn.functional.softplus(x)))
        return x

#--------------------------------------------------------------#
# TensorFlow implementation:
import tensorflow as tf
from tensorflow.keras.layers import Activation
from tensorflow.keras.utils import get_custom_objects
class Mish(Activation):
    def __init__(self, activation, **kwargs):
        super(Mish, self).__init__(activation, **kwargs)
        self.__name__ = 'Mish'

def mish(inputs):
    return inputs * tf.math.tanh(tf.math.softplus(inputs))

get_custom_objects().update({'Mish': Mish(mish)})

# Usage
x = Activation('Mish')(x)
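
As a quick numerical check of the expression x * tanh(softplus(x)) used by both implementations, here is a dependency-free sketch using only the math module. The stable form of softplus is my own choice for the example, not something taken from the project:

```python
import math


def softplus(x):
    # Numerically stable ln(1 + e^x): avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    return x * math.tanh(softplus(x))


for value in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(value, mish(value))
```

The printed values show the characteristic behaviour: Mish is close to the identity for large positive inputs, passes through zero at the origin, and stays slightly negative but bounded for negative inputs.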

The SPP structure mentioned in the paper looks roughly like this:
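
In code, the idea is to run several stride-1 max-pooling layers with different kernel sizes over the same feature map and concatenate the results with the input along the channel axis. The dependency-free sketch below uses kernel sizes 5, 9 and 13, which is my reading of the SPP section of yolov4.cfg; the concatenation order and both function names are assumptions made for illustration:

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with k // 2 padding, so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clipping the window at the border is equivalent to padding with -inf.
            window = [fmap[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(channels, kernel_sizes=(5, 9, 13)):
    """channels: list of 2-D maps. Returns the pooled maps followed by the input maps."""
    out = []
    for k in sorted(kernel_sizes, reverse=True):
        out += [max_pool_same(fmap, k) for fmap in channels]
    return out + list(channels)


# One 7x7 channel with a single peak in the centre.
fmap = [[0] * 7 for _ in range(7)]
fmap[3][3] = 9
result = spp([fmap])
print(len(result))      # number of output channels for 1 input channel
print(result[2][0])     # first row of the 5x5-pooled map
```

The spatial size is unchanged while the channel count is multiplied by four, so each cell sees context at several receptive-field sizes.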

 

For how to select which GPU ID PyTorch runs on, see https://www.cnblogs.com/jfdwd/p/11434332.html

 

 

 

 
