Fine-tuning SlowFastNet (SlowFast)

SlowFastNet on GitHub (released recently):
https://github.com/facebookresearch/SlowFast

Environment requirements:
https://github.com/facebookresearch/SlowFast/blob/master/INSTALL.md
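Two of the packages listed there, PyAV and fvcore, are awkward to install.
fvcore: the fvcore GitHub page recommends pip install 'git+https://github.com/facebookresearch/fvcore', but git did not work on my machine because of an encryption system issue. I had to download the source archive, extract it, enter the folder, and build and install it with python setup.py install.
PyAV: the recommended conda install av -c conda-forge ended in a segmentation fault. For the fix, see my other post https://blog.csdn.net/weixin_42388228/article/details/102882607. You can also download the source and run python setup.py install here, but that install reports errors; according to the issue list on the PyAV GitHub page, this is because some dependencies are missing. For details, see my other post https://blog.csdn.net/weixin_42388228/article/details/102817959.
With that, the installation is done.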

Weight files:
https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md
I downloaded SLOWFAST_8x8_R50, the third-to-last entry in the Kinetics list; the last two were not available yet.

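Pitfall 1:
When using the yaml file that corresponds to the weight file, build the config from …/SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml, but also update that first yaml file according to …/SlowFast-master/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml. (There is one change, to kernel_size.)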
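One way to locate the difference between the two yaml files is to load both into nested dicts and compare them leaf by leaf. The following is a minimal sketch using only the standard library. The helper name diff_config is hypothetical, and the two dicts hold made-up placeholder values, not the contents of the real SlowFast configs.

```python
def diff_config(a, b, prefix=""):
    """Return {dotted_key: (value_in_a, value_in_b)} for every leaf that differs."""
    changes = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Both sides are sections: recurse and extend the dotted path.
            changes.update(diff_config(va, vb, prefix=path + "."))
        elif va != vb:
            changes[path] = (va, vb)
    return changes


# Placeholder configs (hypothetical section name and values).
pytorch_cfg = {"SECTION_A": {"kernel_size": 1, "depth": 50}, "NUM_GPUS": 8}
caffe2_cfg = {"SECTION_A": {"kernel_size": 2, "depth": 50}, "NUM_GPUS": 8}

changes = diff_config(pytorch_cfg, caffe2_cfg)
print(changes)
```

In practice the two dicts would come from loading the two yaml files, for example with yaml.safe_load.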

Pitfall 2:
The SlowFast input is a list. The first element has shape [batch_size,3,8,224,224] and the second element has shape [batch_size,3,32,224,224].
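The two shapes differ only in the temporal dimension (8 frames versus 32 frames). The sketch below illustrates one way to relate them, assuming the 8 slow-pathway frames are an evenly spaced subset of the 32 fast-pathway frames. That sampling scheme is an assumption made for illustration and is not necessarily how the data in this post is prepared; slow_indices and pathway_shapes are hypothetical helper names.

```python
def slow_indices(num_fast_frames, num_slow_frames):
    """Evenly spaced frame indices from the first to the last frame (inclusive)."""
    if num_slow_frames == 1:
        return [0]
    # Integer arithmetic avoids floating-point rounding at the last index.
    return [
        i * (num_fast_frames - 1) // (num_slow_frames - 1)
        for i in range(num_slow_frames)
    ]


def pathway_shapes(batch_size, slow_frames=8, fast_frames=32, size=224):
    """Shapes of the two list elements, slow pathway first, fast pathway second."""
    return [
        [batch_size, 3, slow_frames, size, size],
        [batch_size, 3, fast_frames, size, size],
    ]


indices = slow_indices(32, 8)
shapes = pathway_shapes(batch_size=1)
print(indices)
print(shapes)
```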

Pitfall 3:
In my implementation I set the number of GPUs to 1 in the yaml file.
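Instead of editing the yaml file by hand, the same override could also be applied in code after loading. The following is a minimal sketch with plain dicts standing in for the config. merge_config is a hypothetical helper, the default values are placeholders, and only the key name NUM_GPUS is taken from the code in this post.

```python
import copy


def merge_config(base, override):
    """Return a copy of base with the values from override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Placeholder defaults (hypothetical values).
defaults = {"NUM_GPUS": 8, "TRAIN": {"BATCH_SIZE": 64, "ENABLE": True}}
single_gpu = merge_config(defaults, {"NUM_GPUS": 1})
print(single_gpu)
```

Because the merge works on a deep copy, the original defaults stay untouched and can be reused for a multi-GPU run.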

import sys
sys.path.append('.../SlowFast-master/slowfast/config/')
sys.path.append('.../SlowFast-master/slowfast/models/')
sys.path.append('.../SlowFast-master/slowfast/utils/')
import slowfast.models.optimizer as optim
import slowfast.utils.checkpoint as cu
from defaults import _C
from model_builder import _MODEL_TYPES
from slowfast.models import model_builder
from slowfast.utils.c2_model_loading import get_name_convert_func
import torch
import torch.nn as nn
import os
import cv2
import numpy as np
import pickle
import yaml
torch.cuda.set_device(7)
###########################################         data preparation         ###########################################
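# data4file is my own data-loading helper (not shown in this post).
# As the reshape calls in the training loop show, each data1 sample holds 32 frames
# (second list element) and each data2 sample holds 8 frames (first list element).
data1,label=data4file(batch_size=32,stride=70)
data2,_=data4file(batch_size=8,stride=70)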
data1=torch.from_numpy(data1).float()
data2=torch.from_numpy(data2).float()
label=torch.from_numpy(label).long()
###########################################     customized config file       ###########################################
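with open('.../SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml') as f1:
    # yaml.load() without a Loader argument fails on PyYAML >= 6; safe_load is enough here.
    d1=yaml.safe_load(f1)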
for i in d1.keys():
    if not isinstance(d1[i],dict):
        _C[i]=d1[i]
    else:
        for j in d1[i].keys():
            _C[i][j]=d1[i][j]
################################################     model finetune     ################################################
model=model_builder.build_model(_C)
print('Model built.')
# print(*list(model.children())[-1:])
optimizer = optim.construct_optimizer(model, _C)
cu.load_checkpoint('.../SlowFast-master/SLOWFAST_8x8_R50.pkl', model, data_parallel=False, optimizer=optimizer, inflation=False, convert_from_caffe2=True,)
print('Model loaded.')
num_pairs=len(data1)
for epoch in range(10):
    indicies = list(range(num_pairs))
    np.random.shuffle(indicies)
    for j in np.arange(num_pairs):
        images = [data2[indicies[j]].reshape(1,3,8,224,224).cuda(non_blocking=True),data1[indicies[j]].reshape(1,3,32,224,224).cuda(non_blocking=True)]
        labels = label[indicies[j]].reshape(1).cuda()

        # Forward pass
        preds = model(images)
        loss = nn.CrossEntropyLoss(reduction="mean")(preds, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print('success')
    # Save the weights once the first epoch has finished.
    if epoch==0:
        torch.save(model.state_dict(),'.../slowfast_weight.pkl')
        

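----------------------------------------------------------Update 2019.11.14----------------------------------------------------------------
This update is mainly about training SlowFast on multiple GPUs. The model_builder.py file in SlowFast-master has problems when used with multiple GPUs; the author did not finish that part, which causes the problem described in this post of mine:
https://blog.csdn.net/weixin_42388228/article/details/103067973
The changes are in the build_model function of model_builder.py, as follows (my own changes, so they may be fairly simplistic):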

"""Model construction functions."""

import torch

from slowfast.models.video_model_builder import ResNetModel, SlowFastModel
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '1,2'
device = torch.device('cuda:0')
_MODEL_TYPES = {
    "slowfast": SlowFastModel,
    "slowonly": ResNetModel,
    "c2d": ResNetModel,
    "i3d": ResNetModel,
}

def build_model(cfg):
    assert (
        cfg.MODEL.ARCH in _MODEL_TYPES.keys()
    ), "Model type '{}' not supported".format(cfg.MODEL.ARCH)
    assert (
        cfg.NUM_GPUS <= torch.cuda.device_count()
    ), "Cannot use more GPU devices than available"
    model = _MODEL_TYPES[cfg.MODEL.ARCH](cfg)
    if cfg.NUM_GPUS > 1:
        torch.distributed.init_process_group('nccl',init_method='file:///home/.../my_file',world_size=1,rank=0)
        model = torch.nn.parallel.DistributedDataParallel(module=model.to(device),find_unused_parameters=True)
    return model

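I changed quite a lot, so it is best to keep a copy of the original model_builder.py for other cases, such as basic single-GPU training or multi-machine, multi-GPU distributed training.
For the meaning of the DistributedDataParallel arguments, see: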
https://github.com/pytorch/examples/tree/master/imagenet
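
One detail of the modified file that is easy to miss: once CUDA_VISIBLE_DEVICES is set to '1,2', the process only sees those two GPUs, renumbered from zero, so cuda:0 refers to physical GPU 1. The sketch below illustrates that mapping in plain Python without touching CUDA. physical_gpu is a hypothetical helper, and it assumes the variable holds a comma-separated list of integer indices.

```python
def physical_gpu(logical_index, visible_devices):
    """Map a logical device index (as in 'cuda:<n>') to the physical GPU index."""
    ids = [int(x) for x in visible_devices.split(",") if x.strip()]
    if not 0 <= logical_index < len(ids):
        raise IndexError(
            f"logical index {logical_index} is out of range for {len(ids)} visible GPUs"
        )
    return ids[logical_index]


visible = "1,2"  # same value as in the modified model_builder.py
print(physical_gpu(0, visible))
print(physical_gpu(1, visible))
```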
