C++ Implementation of DPM/LatentSVM, Full Code Download --- Part 4

    The purpose of this post is to explain the FastDPM workflow.


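    Some readers who are interested in the FastDPM code I released (see my other posts) and want to read it have emailed me asking how it works. I answered those emails individually, and I am posting one of those exchanges here.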

    Incidentally, the sender appears to be from outside China, and the email was written in English.

1. The inquiry email from Maxwell:

<span style="font-family:Arial;font-size:18px;">Hi yuxianguo,

My name is Maxwell i appreciate a lot the tool that you implemented
"Fast DPM" Now i'm working in person identification and the latentsvm
algorithm is one of the stuff that i should deal, i don't want your
source code but if you can give some orientation in terms of workflow
or specifications to implement my own latentSVM in C/C++

I'm student in Computer science: Computer Vision

Cordially
Maxwell
skype: ***********</span>

2. My reply:

<span style="font-family:Arial;font-size:18px;">Hi Maxwell,

Well, it has been a long time since I finished Fast-DPM, so I can only tell you what I still remember. Some of it may be wrong.
DPM works well, especially for pedestrian detection. It uses a root template and several part templates to detect the whole object and its parts simultaneously. Cues from the root response and the part responses are aggregated to give the final proposals. The workflow of the detection algorithm is as follows:
(1) Features: It computes a feature pyramid of the image.
(2) Filtering: It slides all templates (root & parts) over the feature pyramid to get the response of every template at every position. This is the most time-consuming part and takes more than 90% of the total time.
(3) Integration: It integrates the root response and the part responses using two rules: the deformation rule and the structural rule. The grammar model makes this look difficult to understand, but it is actually very simple. The structural rule says that if there is an object at (x,y), then there should be a part at (x+delta_x,y+delta_y), so we add the root response at (x,y) and the corresponding part responses. In DPM training, it is assumed that object parts may be better detected at twice the resolution of the object. The deformation rule then says that if a part should be at (x+delta_x,y+delta_y), we had better also try the neighboring positions of (x+delta_x,y+delta_y), because the object may deform.
(4) NMS: After integration, we get a final score map in which each score represents the likelihood of an object being located at that position. In practice, we get a pyramid of score maps so that we can search across different scales. Non-maximum suppression is used to select proposals from the score maps.
(5) Note that to capture different poses of objects, the model training stage splits the samples into several sub-categories, each representing one pose. A DPM model is then trained for each sub-category. Thus we have several DPMs in a model file, each called a component model. In the detection procedure, all component models are used to find object proposals separately, and for every position the object proposal (score) is selected as the maximum across all components.

--
YU</span>
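
To make step (3) of the reply more concrete, here is a minimal C++ sketch of the integration for a single pyramid level. It is not taken from the FastDPM source. The `Map` and `Part` types, the function names, the brute-force neighborhood search, and the quadratic penalty `deformCost * (dx*dx + dy*dy)` are all assumptions made for illustration; the email does not say how the deformation cost is defined or how the search is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Dense row-major score map. Hypothetical helper type for this sketch.
struct Map {
    int rows, cols;
    std::vector<float> data;
    Map(int r, int c, float fill = 0.0f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, fill) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
    bool inside(int y, int x) const { return y >= 0 && y < rows && x >= 0 && x < cols; }
};

// One part template: its filter response plus where it is expected to sit.
struct Part {
    Map response;      // part response, computed at twice the root resolution
    int anchorX;       // delta_x, measured in part-resolution cells
    int anchorY;       // delta_y, measured in part-resolution cells
    float deformCost;  // assumed weight of the quadratic displacement penalty
};

// Deformation rule: try every position within `radius` of the expected
// location (cx, cy) and keep the best response minus displacement penalty.
// Returns -infinity if no candidate position lies inside the part map.
float bestPartScore(const Part& p, int cx, int cy, int radius) {
    float best = -std::numeric_limits<float>::infinity();
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int x = cx + dx;
            const int y = cy + dy;
            if (!p.response.inside(y, x)) continue;
            const float penalty = p.deformCost * static_cast<float>(dx * dx + dy * dy);
            const float s = p.response.at(y, x) - penalty;
            if (s > best) best = s;
        }
    }
    return best;
}

// Structural rule: the score at root cell (x, y) is the root response plus
// the best score of every part around its anchor. Root cell (x, y) maps to
// (2x, 2y) in the part map because parts live at twice the resolution.
Map integrate(const Map& root, const std::vector<Part>& parts, int radius) {
    assert(radius >= 0);
    Map score = root;
    for (int y = 0; y < root.rows; ++y) {
        for (int x = 0; x < root.cols; ++x) {
            for (const Part& p : parts) {
                score.at(y, x) += bestPartScore(p, 2 * x + p.anchorX, 2 * y + p.anchorY, radius);
            }
        }
    }
    return score;
}
```

Calling `integrate(root, parts, radius)` once per pyramid level would yield the score maps that step (4) searches. The exhaustive search costs on the order of `(2 * radius + 1)^2` lookups per part per cell, which is acceptable for a sketch but is likely not what an optimized detector would do.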

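    The workflow of DPM, and of my implementation, is roughly as described in the email. After reading it a few times it starts to look quite simple. The essence of DPM is that training uses LatentSVM to mine the best part representation; the detection side has nothing particularly fancy.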
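
Steps (4) and (5) of the reply can be sketched in the same spirit. The code below is an illustration only, not the FastDPM code: the `Detection` struct, the function names, and the use of intersection-over-union with a `maxOverlap` threshold as the suppression criterion are assumptions, since the email only says that non-maximum suppression is applied to the score maps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate box with its score. Hypothetical type for this sketch.
struct Detection {
    float x1, y1, x2, y2;  // corners, with x1 < x2 and y1 < y2
    float score;
};

// Step (5): for every position, keep the maximum score over all component
// models. Each inner vector is one component's flattened score map.
std::vector<float> maxAcrossComponents(const std::vector<std::vector<float>>& maps) {
    assert(!maps.empty());
    std::vector<float> best = maps[0];
    for (std::size_t c = 1; c < maps.size(); ++c) {
        assert(maps[c].size() == best.size());
        for (std::size_t i = 0; i < best.size(); ++i) {
            best[i] = std::max(best[i], maps[c][i]);
        }
    }
    return best;
}

// Intersection-over-union of two boxes (assumed overlap measure).
float iou(const Detection& a, const Detection& b) {
    const float w = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float h = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (w <= 0.0f || h <= 0.0f) return 0.0f;
    const float inter = w * h;
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Step (4): greedy non-maximum suppression. Visit detections from highest
// to lowest score and drop any that overlaps an already kept one too much.
std::vector<Detection> nms(std::vector<Detection> dets, float maxOverlap) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (iou(d, k) > maxOverlap) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

In a full detector, the positions that survive thresholding at each pyramid level would first be converted to boxes in image coordinates, and `nms` would then run once over the boxes from all levels.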

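     By the way, some of the emails I get from readers look like QQ chat messages: no salutation, no signature, and no paragraph breaks, which comes across as rather unfriendly. Anyone unsure how to write an email can learn from Maxwell above. At the very least, include a salutation.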

     That's all...
