Real-Time Action Recognition Based on Motion Vectors

Paper: Real-time Action Recognition with Enhanced Motion Vector CNNs

Github: https://github.com/zbwglory/MV-release

2016 CVPR

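The paper builds on the deep two-stream architecture and proposes using motion vectors in place of optical flow, which makes the pipeline much faster. However, simply substituting motion vectors for optical flow lowers overall accuracy by 7%. To recover the lost accuracy, the paper proposes three knowledge-transfer strategies (initialization transfer, supervision transfer, and their combination), a form of knowledge distillation in which an optical-flow teacher network (OF-CNN) initializes and guides the training of a motion-vector student network (MV-CNN).

The resulting method runs at 390.7 FPS on UCF101 and 403 FPS on THUMOS14, a 27x speedup over the conventional two-stream approach.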

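Motion vectors vs. optical flow:

Motion vectors are produced per macroblock, whereas optical flow is computed for every pixel. Motion vectors are therefore noisier and coarser-grained, while optical flow is finer-grained. In exchange, motion vectors are fast to obtain because they are already stored in the compressed video stream, while optical flow is slow to compute.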
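To make the granularity difference concrete, here is a minimal sketch of how block-level motion vectors can be expanded into a dense per-pixel field. The `BlockMV` and `FlowField` types and the `rasterize` function are illustrative names of my own, not part of FFmpeg or the paper's code.

```cpp
#include <vector>

// One motion vector as a codec might store it: a block of pixels sharing one displacement.
// (Field names are illustrative, not taken from FFmpeg or the paper's code.)
struct BlockMV {
    int x, y;        // top-left corner of the block, in pixels
    int w, h;        // block size, e.g. 16x16 for a macroblock
    float dx, dy;    // displacement shared by every pixel in the block
};

// Dense per-pixel field, row-major, the same layout an optical-flow map would use.
struct FlowField {
    int width, height;
    std::vector<float> dx, dy;
    FlowField(int w, int h) : width(w), height(h), dx(w * h, 0.0f), dy(w * h, 0.0f) {}
};

// Copy each block's single vector to all pixels it covers; pixels outside the frame are skipped.
FlowField rasterize(const std::vector<BlockMV>& mvs, int width, int height) {
    FlowField f(width, height);
    for (const BlockMV& mv : mvs) {
        for (int r = mv.y; r < mv.y + mv.h; ++r) {
            for (int c = mv.x; c < mv.x + mv.w; ++c) {
                if (r < 0 || c < 0 || r >= height || c >= width) continue;
                f.dx[r * width + c] = mv.dx;
                f.dy[r * width + c] = mv.dy;
            }
        }
    }
    return f;
}
```

Every pixel inside a block ends up with the same value, which is why a motion-vector image looks blocky next to an optical-flow image of the same frame.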

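Contributions:

Building on the two-stream model, the paper proposes a CNN-based recognition method that achieves state-of-the-art results.
It is the first to propose using motion vectors in place of optical flow.
It proposes a knowledge-distillation approach that substantially improves accuracy.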

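Network architecture:

Not every frame carries motion vectors. A compressed video is organized into groups of pictures (GOPs), and each GOP contains three types of frames: I-frames, P-frames, and B-frames. An I-frame is intra-coded and contains no motion vectors. A P-frame is predicted from an earlier frame and contains motion vectors. A B-frame is bidirectionally predicted from earlier and later frames and also contains motion vectors.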

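In practice, I-frames, which contain no motion vectors, lower training accuracy. To work around this, the motion vectors of the preceding frame that has them are used in place of the I-frame.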
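The replacement can be sketched as a single forward pass over the frame sequence. This is only an illustration under my own assumptions: `FrameMV` and `fill_missing_from_previous` are hypothetical names, and the paper's actual data pipeline may handle this differently.

```cpp
#include <cstddef>
#include <vector>

// Per-frame motion data; `has_mv` is false for I-frames, which are intra-coded.
// (A hypothetical container for illustration, not the paper's data structure.)
struct FrameMV {
    bool has_mv;
    std::vector<float> field;  // flattened motion-vector image for the frame
};

// Give every frame lacking motion vectors a copy of the nearest earlier frame
// that has them. Leading frames with no such predecessor are left empty.
void fill_missing_from_previous(std::vector<FrameMV>& frames) {
    const FrameMV* last_valid = nullptr;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].has_mv) {
            last_valid = &frames[i];
        } else if (last_valid != nullptr) {
            frames[i].field = last_valid->field;
            // has_mv stays false so callers can still tell the frame was filled in
        }
    }
}
```

A video normally starts with an I-frame, so the first frame has nothing to copy from and stays empty in this sketch; a real pipeline would need to decide what to do with it.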

[Figure: detailed network architecture]

The three knowledge-distillation strategies:

Teacher Initialization

The teacher and student networks share the same architecture, so the teacher network's weights are used to initialize the student network.

Supervision Transfer

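The teacher network takes optical-flow images as input, while the student network takes motion-vector images. The inputs differ, but the two networks should produce the same output.

The output of the teacher network's last fully connected layer is divided by a temperature (Temp) to obtain a softened output, and the student network's last fully connected layer is treated the same way. A softmax cross-entropy loss then pulls the student's softened output distribution toward the teacher's.

The second loss is the softmax cross-entropy loss between the student network's output and the ground truth.

The final loss is the weighted sum of these two losses.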
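The two losses described above can be sketched as follows. This is a minimal illustration rather than the paper's training code: the function names are my own, and placing the weight on the teacher term is an assumption about how the weighted sum is formed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax of logits divided by a temperature; larger temperatures give softer distributions.
std::vector<double> softmax_with_temperature(const std::vector<double>& logits, double temp) {
    double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> p(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - max_logit) / temp);  // subtract max for numerical stability
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Cross-entropy H(target, pred) = -sum_i target_i * log(pred_i).
double cross_entropy(const std::vector<double>& target, const std::vector<double>& pred) {
    double loss = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        if (target[i] > 0.0) loss -= target[i] * std::log(pred[i]);
    }
    return loss;
}

// Weighted sum of the teacher-matching loss and the ground-truth loss.
// `temp` and `weight` are hyperparameters; this post does not give their values.
// Assumption: the weight multiplies the teacher term only.
double distillation_loss(const std::vector<double>& teacher_logits,
                         const std::vector<double>& student_logits,
                         std::size_t label, double temp, double weight) {
    std::vector<double> soft_teacher = softmax_with_temperature(teacher_logits, temp);
    std::vector<double> soft_student = softmax_with_temperature(student_logits, temp);
    double teacher_loss = cross_entropy(soft_teacher, soft_student);

    std::vector<double> hard_student = softmax_with_temperature(student_logits, 1.0);
    double label_loss = -std::log(hard_student[label]);

    return weight * teacher_loss + label_loss;
}
```

For a given teacher and label, a student whose logits agree with the teacher's gets a lower loss than one whose logits favor a different class, which is the behavior the supervision-transfer step relies on.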

Combination

Teacher Initialization and Supervision Transfer are combined, and both are used together to train the student network.

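Experimental results:

Data augmentation during training:

Random cropping: 224×224, 196×196, 168×168
Random scale jittering with ratios 1, 0.875, 0.75 (196 = 0.875 × 224 and 168 = 0.75 × 224)
Random horizontal flipping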
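One possible reading of the augmentation list is that each sample draws a scale ratio, crops a square of the corresponding size, and is optionally mirrored. The sketch below only samples those parameters; `CropParams` and `sample_crop` are hypothetical names, and resizing the crop back to the network input size is an assumption not stated in this post.

```cpp
#include <random>

// Parameters of one augmented training sample. Names are illustrative.
struct CropParams {
    int size;      // side of the square crop, before any resize for the network
    int x, y;      // top-left corner of the crop inside the source image
    bool flip;     // whether to mirror the crop horizontally
};

// Draw a scale ratio, a crop position, and a flip decision at random.
// Assumes the image is at least base_size pixels in each dimension.
CropParams sample_crop(int img_w, int img_h, std::mt19937& rng, int base_size = 224) {
    static const double ratios[] = {1.0, 0.875, 0.75};
    std::uniform_int_distribution<int> pick(0, 2);
    CropParams p;
    p.size = static_cast<int>(base_size * ratios[pick(rng)]);  // 224, 196 or 168
    std::uniform_int_distribution<int> pos_x(0, img_w - p.size);
    std::uniform_int_distribution<int> pos_y(0, img_h - p.size);
    p.x = pos_x(rng);
    p.y = pos_y(rng);
    p.flip = std::bernoulli_distribution(0.5)(rng);
    return p;
}
```

With the 340×256 images written by the extraction code in this post, a call such as `sample_crop(340, 256, rng)` always yields a crop that lies fully inside the image.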

Code guide:

CMakeLists.txt:

cmake_minimum_required(VERSION 2.8)

project(draw_flow)
set(CMAKE_CXX_FLAGS   "-std=c++11")

FIND_PACKAGE(OpenCV REQUIRED)

include_directories(${OpenCV_INCLUDE_DIRS})
include_directories("/usr/local/include/")

LINK_DIRECTORIES("/usr/local/lib")
add_executable(draw_flow draw_flow.cpp)
target_link_libraries(draw_flow ${OpenCV_LIBS})

mpegflow code (draw_flow.cpp):

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <opencv2/opencv.hpp>


#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace cv;
using namespace std;



static void convertFlowToImage(const Mat &flow_x, const Mat &flow_y, Mat &img_x, Mat &img_y,
       double lowerBound, double higherBound) {
    #define CAST(v, L, H) ((v) > (H) ? 255 : (v) < (L) ? 0 : cvRound(255*((v) - (L))/((H)-(L))))
    for (int i = 0; i < flow_x.rows; ++i) {
        for (int j = 0; j < flow_x.cols; ++j) {
            float x = flow_x.at<float>(i,j);
            float y = flow_y.at<float>(i,j);
            img_x.at<uchar>(i,j) = CAST(x, lowerBound, higherBound);
            img_y.at<uchar>(i,j) = CAST(y, lowerBound, higherBound);
        }
    }
    #undef CAST
}


int main(int argc, char** argv){
    // IO operation
    const char* keys =
        {
            "{ f  | vidFile      | dump | filename of optical flow}"
            "{ x  | xFlowFile    | flow_x | filename of flow x component }"
            "{ y  | yFlowFile    | flow_y | filename of flow y component }"
            "{ b  | bound | 15 | specify the maximum of optical flow}"
        };



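    // Command-line parsing is disabled, so the values below are hard-coded and
    // any arguments passed on the command line are ignored.
    //CommandLineParser cmd(argc, argv, keys);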
    string vidFile = "dump.mvs0";//cmd.get<string>("vidFile");
    string xFlowFile = "v_ApplyEyeMakeup_g01_c01/flow_x";//cmd.get<string>("xFlowFile");
    string yFlowFile = "v_ApplyEyeMakeup_g01_c01/flow_y"; //cmd.get<string>("yFlowFile");
    //string imgFile = cmd.get<string>("imgFile");
    int bound = 20;//cmd.get<int>("bound");

    int video_width = 320;
    int video_height = 240;

    int frame_num = 0;
    Mat image, prev_image, prev_grey, grey, frame;

    ifstream fin;
    cout << vidFile << endl;
    fin.open(vidFile.data());
    if (!fin) {
        cout << "error in opening file";
        return -1;
    }


    int frame_prev = 0;
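    // Each frame record starts with its motion-vector count. Testing the read
    // itself avoids processing a bogus extra frame when the stream hits EOF
    // (since C++11 a failed extraction writes 0, not -1, into the variable).
    int mv_per_frame = -1;
    while (fin >> mv_per_frame) {
        // Output optical flow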
        int forback, blockx,blocky,srcx,srcy,dstx,dsty,minx,miny;
        Mat flow_x(video_height,video_width,CV_32F,Scalar(0));
        Mat flow_y(video_height,video_width,CV_32F,Scalar(0));
        for (int i=0; i<mv_per_frame; i++) {
            fin >> frame_num >> forback >> blockx >> blocky >> srcx >> srcy >> dstx >> dsty >> minx >> miny;
            for (int x=0; x<blockx; x++) {
                for (int y=0; y<blocky; y++) {
                    if ((dstx-blockx/2+x < 0) || (dsty-blocky/2+y < 0) || (dstx-blockx/2+x > video_width-1) || (dsty-blocky/2+y > video_height-1) || (forback > 0))
                        continue;
                    flow_x.at<float>(dsty-blocky/2+y,dstx-blockx/2+x) = (float)minx;
                    flow_y.at<float>(dsty-blocky/2+y,dstx-blockx/2+x) = (float)miny;
                }
            }
        }
        frame_num = frame_num-1;

        cv::Mat imgX(flow_x.size(),CV_8UC1);
        cv::Mat imgY(flow_y.size(),CV_8UC1);
        convertFlowToImage(flow_x,flow_y, imgX, imgY, -bound, bound);
        char tmp[20];
        sprintf(tmp,"_%04d.jpg",int(frame_num));

        cv::Mat imgX_, imgY_, imgX_small, imgY_small;
        cv::resize(imgX,imgX_, cv::Size(340,256));
        cv::resize(imgY,imgY_, cv::Size(340,256));

        cv::imwrite(xFlowFile + tmp,imgX_);
        cv::imwrite(yFlowFile + tmp,imgY_);


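        // Frame numbers missing from the dump (frames without motion vectors)
        // are written out with the current frame's images.
        while (frame_prev < frame_num-1) {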
            frame_prev ++ ;
            char tmp1[20];
            sprintf(tmp1,"_%04d.jpg",int(frame_prev));
            cout << tmp1 << endl;
            cv::imwrite(xFlowFile + tmp1,imgX_);
            cv::imwrite(yFlowFile + tmp1,imgY_);
        }
        frame_prev = frame_num;

    }
    return 0;
}
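The `CAST` macro in `convertFlowToImage` is what turns displacements into gray levels. As a standalone illustration that needs no OpenCV, the same mapping can be sketched as below. `quantize_flow` and `dequantize_flow` are my own names, and `std::lround` stands in for `cvRound`, so exact .5 ties may round differently.

```cpp
#include <cmath>

// Map a displacement in [-bound, bound] to an 8-bit gray level, clamping values outside the range.
// Uses std::lround where the original code uses cvRound.
unsigned char quantize_flow(double v, double bound) {
    if (v > bound) return 255;
    if (v < -bound) return 0;
    return static_cast<unsigned char>(std::lround(255.0 * (v + bound) / (2.0 * bound)));
}

// Approximate inverse, for reading the stored images back as displacements.
double dequantize_flow(unsigned char g, double bound) {
    return g / 255.0 * 2.0 * bound - bound;
}
```

With `bound = 20`, zero motion maps to mid-gray (128) and anything beyond ±20 saturates at 0 or 255, so a larger bound keeps more of the large motions at the cost of coarser steps for small ones.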

Commands to extract motion vectors and convert them to grayscale images (extract_mvs_sample.sh):

tar xvf ffmpeg-2.7.2.tar
mkdir v_ApplyEyeMakeup_g01_c01

gcc -o ffmpeg-2.7.2/doc/examples/extract_mvs extract_mvs.c -L /usr/local/lib/ -lavcodec -lavdevice -lavfilter -lavformat -lavutil

ffmpeg-2.7.2/doc/examples/extract_mvs v_ApplyEyeMakeup_g01_c01.avi > dump.mvs0

./MV-code-release/build/draw_flow -f dump.mvs0 -x v_ApplyEyeMakeup_g01_c01/flow_x -y v_ApplyEyeMakeup_g01_c01/flow_y -b 20

[Figure: sample contents of dump.mvs0]

[Figure: motion vectors rendered as images]
