Introduction

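In object detection, the detector predicts region proposals from an image on top of a set of predefined anchors, and during training these anchors are matched to ground-truth objects. The matching is done by IoU rather than learned: an anchor is matched to an object if their IoU exceeds a given threshold, and is left unmatched otherwise. IoU-based matching has a drawback: features extracted from a spatially well-aligned region proposal are not necessarily the best ones for predicting the object's class and location. The paper gives the example of slender or off-center objects such as a toothbrush: a spatially aligned region proposal contains a lot of background, which weakens the features, whereas the most representative part of a toothbrush is its head. Many papers address this problem by removing anchors altogether; these are called anchor-free methods. The method in this paper keeps the anchors but lets objects match anchors flexibly, so that the network learns the features that are most representative for classification and localization.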

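What makes a good matching scheme? First, the algorithm must have high recall: for every object, at least one anchor's prediction must be close to the ground truth (gt). Put differently, every gt box needs at least one anchor whose proposal matches it. Second, the algorithm must have high precision: the detector has to classify poorly localized proposals as background. Poorly localized proposals generally give inaccurate predictions, so removing these false positives improves precision. Third, the anchor predictions must be compatible with the final NMS step: the higher the classification score, the more accurate the localization should be. Otherwise a proposal with accurate localization but a low classification score is discarded by NMS.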

The paper formulates object-anchor matching as a maximum likelihood estimation (MLE) problem, described in detail below.

Detector Training as Maximum Likelihood Estimation

Training a detector can be cast as a maximum likelihood estimation problem, as follows.

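Take a one-stage CNN-based detector as an example. Given an image $I$ with ground-truth set $B$ , each gt box $b_i \in B$ has a class label $b_i^{cls}$ and a location $b_i^{loc}$ . In the network, each anchor $a_j \in A$ produces a class prediction $a_j^{cls} \in \mathcal{R}^k$ and a location prediction $a_j^{loc} \in \mathcal{R}^4$ , where $k$ is the number of classes.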

During training, a hand-crafted IoU criterion assigns anchors to objects, as shown in the figure below.

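A matrix $C_{ij} \in \{0, 1\}$ records whether object $b_i$ is matched to anchor $a_j$ . When the IoU of $b_i$ and $a_j$ exceeds a threshold, $b_i$ and $a_j$ are matched and $C_{ij}=1$ ; otherwise $C_{ij}=0$ . In particular, when several objects exceed the threshold for the same anchor, the object with the largest IoU gets the anchor, so each anchor is paired with at most one object, i.e. $\sum_{i}C_{ij} \in \{0, 1\}, \forall a_j \in A$ .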

Suppose there are 3 gt boxes $b_1, b_2, b_3$ and 4 anchors $a_1, a_2, a_3, a_4$ , with the pairwise IoU values below.

	$a_1$	$a_2$	$a_3$	$a_4$
$b_1$	0.3	0.2	0.7	0.1
$b_2$	0.8	0.4	0.2	0.7
$b_3$	0.1	0.3	0.2	0.9

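With an IoU threshold of 0.5, some candidate matches can be ruled out: every entry that does not exceed the threshold is set to 0, and the column for $a_2$ , which has no entry above the threshold, is dropped.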

	$a_1$	$a_3$	$a_4$
$b_1$	0	0.7	0
$b_2$	0.8	0	0.7
$b_3$	0	0	0.9

In the $a_4$ column two objects remain, so only the best match, $b_3$ , is kept.

	$a_1$	$a_3$	$a_4$
$b_1$	0	1	0
$b_2$	1	0	0
$b_3$	0	0	1

Hence $\sum_{i}C_{ij} \in \{0, 1\}, \forall a_j \in A$ . Define $A_{+} \subseteq A$ as $\{a_j | \sum_i C_{ij} = 1\}$ and $A_{-} \subseteq A$ as $\{a_j | \sum_i C_{ij} = 0\}$ . In the example above, $a_1, a_3, a_4 \in A_{+}$ and $a_2 \in A_{-}$ .
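The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the thresholded, best-object-per-anchor rule described above; the function name `assign_anchors` and the 0-based indexing (row 0 is $b_1$ , column 0 is $a_1$ ) are choices made here for illustration, not taken from the paper.

```python
def assign_anchors(iou, threshold=0.5):
    """Build the 0/1 matching matrix C from an objects-by-anchors IoU table."""
    n_obj, n_anchor = len(iou), len(iou[0])
    C = [[0] * n_anchor for _ in range(n_obj)]
    for j in range(n_anchor):
        # Pick the object with the highest IoU for anchor j ...
        best_i = max(range(n_obj), key=lambda i: iou[i][j])
        # ... and keep the pair only if that IoU exceeds the threshold.
        if iou[best_i][j] > threshold:
            C[best_i][j] = 1
    return C


# IoU table from the example: rows b_1..b_3, columns a_1..a_4.
iou = [
    [0.3, 0.2, 0.7, 0.1],
    [0.8, 0.4, 0.2, 0.7],
    [0.1, 0.3, 0.2, 0.9],
]
C = assign_anchors(iou)

# A_+ holds anchors with exactly one matched object, A_- those with none.
positives = [j for j in range(4) if sum(row[j] for row in C) == 1]
negatives = [j for j in range(4) if sum(row[j] for row in C) == 0]

print(C)          # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
print(positives)  # [0, 2, 3]  -> a_1, a_3, a_4
print(negatives)  # [1]        -> a_2
```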

The loss function of the (one-stage) detector is
$\mathcal{L}(\theta) = \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{cls} + \beta \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{loc} + \sum_{a_j \in A_-} \mathcal{L}(\theta)_j^{bg}$
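The three terms are the classification loss for objects, the localization loss for objects, and the classification loss for the background class. Every anchor that is not matched to an object is assigned to the background class. $\theta$ denotes the model parameters. $\mathcal{L}(\theta)_{ij}^{cls} = BCE(a_j^{cls}, b_i^{cls}, \theta)$ , $\mathcal{L}(\theta)_{ij}^{loc} = SmoothL1(a_j^{loc}, b_i^{loc}, \theta)$ , $\mathcal{L}(\theta)_{j}^{bg} = BCE(a_j^{cls}, \vec{0}, \theta)$ . BCE is the binary cross-entropy loss and $\beta$ is a regularization factor.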

Minimizing the loss $\mathcal{L}(\theta)$ is equivalent to maximizing a likelihood:
$\begin{aligned} \mathcal{P}(\theta) & = e^{-\mathcal{L}(\theta)} \\ & = e^{-\sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{cls} - \beta \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{loc} - \sum_{a_j \in A_-} \mathcal{L}(\theta)_j^{bg} } \\ & = e^{-\sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{cls}} e^{- \beta \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{loc} } e^{ -\sum_{a_j \in A_-} \mathcal{L}(\theta)_j^{bg} } \\ & = \prod_{a_j \in A_+} e^{-\sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{cls}} \prod_{a_j \in A_+} e^{-\beta \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{loc}} \prod_{a_j \in A_-} e^{- \mathcal{L}(\theta)_{j}^{bg}} \end{aligned}$
Because $C_{ij} \in \{0, 1\}$ and every $a_j \in A_{+}$ satisfies $\sum_i C_{ij} = 1$ , exactly one term of each inner sum is nonzero, so $C_{ij}$ can be moved outside the exponential:
$\begin{aligned} \mathcal{P}(\theta) & = e^{-\mathcal{L}(\theta)} \\ & = \prod_{a_j \in A_+} e^{-\sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{cls}} \prod_{a_j \in A_+} e^{-\beta \sum_{b_i \in B} C_{ij} \mathcal{L}(\theta)_{ij}^{loc}} \prod_{a_j \in A_-} e^{- \mathcal{L}(\theta)_{j}^{bg}} \\ & = \prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} e^{-\mathcal{L}(\theta)_{ij}^{cls}} )\prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} e^{-\beta \mathcal{L}(\theta)_{ij}^{loc}}) \prod_{a_j \in A_-} e^{- \mathcal{L}(\theta)_{j}^{bg}} \end{aligned}$
Consider $a_1$ from the example above, which is matched to $b_2$ , so $C_{11} = 0$ , $C_{21} = 1$ , $C_{31} = 0$ :
$\begin{aligned} & e^{-C_{11} \mathcal{L}(\theta)_{11}^{cls} -C_{21} \mathcal{L}(\theta)_{21}^{cls} -C_{31} \mathcal{L}(\theta)_{31}^{cls}} \\ &= e^{-0 \cdot \mathcal{L}(\theta)_{11}^{cls} -1 \cdot \mathcal{L}(\theta)_{21}^{cls} -0 \cdot \mathcal{L}(\theta)_{31}^{cls}} \\ &= e^{-\mathcal{L}(\theta)_{21}^{cls}} \\ &= 0 \cdot e^{-\mathcal{L}(\theta)_{11}^{cls}} + 1 \cdot e^{-\mathcal{L}(\theta)_{21}^{cls}} + 0 \cdot e^{-\mathcal{L}(\theta)_{31}^{cls}} \\ & = \sum_{b_i \in B} C_{i1} e^{-\mathcal{L}(\theta)_{i1}^{cls}} \end{aligned}$

Let $\mathcal{P}(\theta)_{ij}^{cls} = e^{-\mathcal{L}(\theta)_{ij}^{cls}}$ be the classification confidence. The BCE loss contains a log, which cancels against the outer exponential and leaves the class confidence (a probability). Applying the same construction to the localization loss, $\mathcal{P}(\theta)_{ij}^{loc} = e^{-\beta \mathcal{L}(\theta)_{ij}^{loc}}$ is the localization confidence, and likewise $\mathcal{P}(\theta)_{j}^{bg} = e^{-\mathcal{L}(\theta)_{j}^{bg}}$ for the background class. Then
$\begin{aligned} \mathcal{P}(\theta) & = e^{-\mathcal{L}(\theta)} \\ & = \prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} e^{-\mathcal{L}(\theta)_{ij}^{cls}} )\prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} e^{-\beta \mathcal{L}(\theta)_{ij}^{loc}}) \prod_{a_j \in A_-} e^{- \mathcal{L}(\theta)_{j}^{bg}} \\ & = \prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} \mathcal{P}(\theta)_{ij}^{cls}) \prod_{a_j \in A_+} (\sum_{b_i \in B} C_{ij} \mathcal{P}(\theta)_{ij}^{loc}) \prod_{a_j \in A_-} \mathcal{P}(\theta)_{j}^{bg} \end{aligned}$
This is the likelihood to be maximized: training the detector has become a maximum likelihood estimation problem.
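A quick numerical check can make the equivalence concrete. The sketch below reuses the matching matrix from the worked example and compares $e^{-\mathcal{L}(\theta)}$ with the product form, taking the localization confidence as $e^{-\beta \mathcal{L}^{loc}}$ . All loss values and the value of `beta` are made up for illustration; only the matrix `C` comes from the text.

```python
import math

# Matching matrix from the worked example (rows b_1..b_3, columns a_1..a_4).
C = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
beta = 0.75  # illustrative value, not from the paper

# Made-up per-pair losses, used only to check the algebra.
cls_loss = [[0.9, 1.2, 0.3, 1.5], [0.2, 0.8, 1.1, 0.6], [1.4, 0.7, 1.0, 0.1]]
loc_loss = [[0.8, 0.9, 0.2, 1.0], [0.1, 0.5, 0.9, 0.4], [1.1, 0.6, 0.7, 0.05]]
bg_loss = [2.0, 0.15, 1.8, 2.2]  # one background loss per anchor

n_obj, n_anchor = 3, 4
pos = [j for j in range(n_anchor) if sum(C[i][j] for i in range(n_obj)) == 1]
neg = [j for j in range(n_anchor) if j not in pos]

# Loss form: classification + beta * localization + background.
total_loss = (
    sum(C[i][j] * cls_loss[i][j] for j in pos for i in range(n_obj))
    + beta * sum(C[i][j] * loc_loss[i][j] for j in pos for i in range(n_obj))
    + sum(bg_loss[j] for j in neg)
)

# Product form: C_ij sits outside the exponentials.
likelihood = 1.0
for j in pos:
    likelihood *= sum(C[i][j] * math.exp(-cls_loss[i][j]) for i in range(n_obj))
    likelihood *= sum(C[i][j] * math.exp(-beta * loc_loss[i][j]) for i in range(n_obj))
for j in neg:
    likelihood *= math.exp(-bg_loss[j])

# The two numbers agree up to floating-point rounding.
print(math.exp(-total_loss), likelihood)
```

The agreement relies on each positive anchor having exactly one nonzero $C_{ij}$ ; with soft or multiple assignments the step that moves $C_{ij}$ outside the exponential would no longer hold.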

The next question is how to drop the matching matrix $C_{ij}$ produced by the hand-crafted IoU rule and learn the matching instead.

Detection Customized Likelihood

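The goal now is to let the network learn the best object-anchor matching by itself, keeping recall and precision high while staying compatible with the NMS applied after prediction. The paper first constructs a bag of candidate anchors for each object $b_i$ by selecting the $n$ top-ranked anchors $A_i \subset A$ in terms of their IoU with the object. It then learns the best-matching anchor by maximizing a customized likelihood.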

To optimize recall, for each object $b_i \in B$ there must be at least one anchor $a_j \in A_i$ whose predictions ( $a_j^{cls}$ and $a_j^{loc}$ ) are close to the gt. This objective is expressed by
$\mathcal{P}(\theta)_{recall} = \prod_i \max_{a_j \in A_i} (\mathcal{P}(\theta)_{ij}^{cls} \mathcal{P}(\theta)_{ij}^{loc})$
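The bag construction and the recall likelihood can be sketched as follows. The helper names `build_bags` and `recall_likelihood`, the bag size `n=2`, and the confidence values are assumptions chosen for illustration; the IoU table is the one from the earlier example.

```python
def build_bags(iou, n):
    """For each object, return the indices of its n anchors with the highest IoU."""
    bags = []
    for row in iou:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        bags.append(ranked[:n])
    return bags


def recall_likelihood(p_cls, p_loc, bags):
    """Product over objects of the best cls*loc confidence inside each bag."""
    result = 1.0
    for i, bag in enumerate(bags):
        result *= max(p_cls[i][j] * p_loc[i][j] for j in bag)
    return result


# IoU table from the earlier example: rows b_1..b_3, columns a_1..a_4.
iou = [
    [0.3, 0.2, 0.7, 0.1],
    [0.8, 0.4, 0.2, 0.7],
    [0.1, 0.3, 0.2, 0.9],
]
bags = build_bags(iou, n=2)

# Made-up classification and localization confidences per (object, anchor) pair.
p_cls = [[0.2, 0.1, 0.9, 0.1], [0.5, 0.3, 0.1, 0.8], [0.1, 0.4, 0.2, 0.6]]
p_loc = [[0.5, 0.2, 0.8, 0.1], [0.9, 0.4, 0.2, 0.6], [0.1, 0.5, 0.2, 0.9]]

print(bags)  # [[2, 0], [0, 3], [3, 1]]
print(recall_likelihood(p_cls, p_loc, bags))
```

With these made-up numbers, the best anchor for $b_2$ inside its bag is $a_4$ (0.8 × 0.6) rather than $a_1$ (0.5 × 0.9), even though $a_1$ has the higher IoU. That is the kind of flexibility a fixed IoU assignment cannot express.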

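To improve precision, the detector must classify poorly localized anchors as background. Let $P\{a_j \to b_i\}$ be the probability that anchor $a_j$ correctly predicts object $b_i$ . Anchor $a_j$ is finally matched to the object $b_i$ with $i = \arg \max_i P\{a_j \to b_i\}$ . The probability that $a_j$ is matched to some object is therefore $\max_i P\{a_j \to b_i\}$ , and $P\{a_j \in A_-\} = 1 - \max_i P\{a_j \to b_i\}$ is the probability that $a_j$ matches no object. If $a_j$ matches no object, its class is background and its predicted background confidence should be high. The paper's formula is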
$\mathcal{P}(\theta)_{precision} = \prod_j (1 - P\{a_j \in A_-\}(1-\mathcal{P}(\theta)_j^{bg}))$

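My reading is this: an anchor matched to background should have a low probability of being predicted as a non-background class. The formula targets precision, defined as precision = TP/(TP + FP), and raises it by reducing FP. A false positive here is an anchor that matches no gt box but still receives a high probability for some non-background class. Once that probability is pushed down, which is what the formula does, these unmatched anchors are no longer FPs.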
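A small sketch shows how the precision term reacts to the background confidence of an unmatched anchor. The function name `precision_likelihood` and all probabilities are made up for illustration; `p_unmatched[j]` stands for $P\{a_j \in A_-\}$ and `p_bg[j]` for $\mathcal{P}(\theta)_j^{bg}$ .

```python
def precision_likelihood(p_unmatched, p_bg):
    """Product over anchors of 1 - P{a_j in A_-} * (1 - P_bg_j)."""
    result = 1.0
    for u, b in zip(p_unmatched, p_bg):
        result *= 1.0 - u * (1.0 - b)
    return result


# Only the second anchor is unmatched (P{a_j in A_-} = 1); the rest are matched.
p_unmatched = [0.0, 1.0, 0.0, 0.0]

# Case 1: the unmatched anchor is confidently predicted as background.
confident_bg = [0.1, 0.95, 0.2, 0.1]
# Case 2: the unmatched anchor still looks like a foreground class.
likely_false_positive = [0.1, 0.30, 0.2, 0.1]

good = precision_likelihood(p_unmatched, confident_bg)
bad = precision_likelihood(p_unmatched, likely_false_positive)
print(good, bad)  # the first value is larger than the second
```

Matched anchors contribute a factor of 1 regardless of their background confidence, so only anchors that are likely to be unmatched are pushed toward the background class.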

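$P\{a_j \to b_i\}$ should have the following properties. (1) $P\{a_j \to b_i\}$ is a monotonically increasing function of the IoU between $a_j^{loc}$ and $b_i$ , written $IoU_{ij}^{loc}$ . (2) When $IoU_{ij}^{loc}$ is below a threshold $t$ , $P\{a_j \to b_i\}$ is close to 0. (3) For each object $b_i$ , there exists one and only one $a_j$ with $P\{a_j \to b_i\} = 1$ . A function that satisfies these properties is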
$\text{Saturated linear}(x, t_1, t_2) = \begin{cases} 0, & x \le t_1 \\ \frac{x - t_1}{t_2 - t_1}, & t_1 \lt x \lt t_2 \\ 1, & x \ge t_2 \end{cases}$
as shown in the figure below.

Therefore $P\{a_j \to b_i\} = \text{Saturated linear}(IoU_{ij}^{loc}, t, \max_j (IoU_{ij}^{loc}))$ .
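The saturated linear function and the resulting probabilities can be sketched in plain Python. The helper names, the threshold `t=0.5`, and the reuse of the earlier IoU table as stand-in values for $IoU_{ij}^{loc}$ are assumptions made for illustration.

```python
def saturated_linear(x, t1, t2):
    """0 at or below t1, 1 at or above t2, linear in between."""
    if x <= t1:
        return 0.0
    if x >= t2:
        return 1.0
    return (x - t1) / (t2 - t1)


def match_probabilities(loc_iou, t):
    """P{a_j -> b_i}: each object's row is saturated at that row's largest IoU."""
    probs = []
    for row in loc_iou:
        best = max(row)
        probs.append([saturated_linear(x, t, best) for x in row])
    return probs


def unmatched_probabilities(probs):
    """P{a_j in A_-} = 1 - max_i P{a_j -> b_i}, one value per anchor."""
    n_anchor = len(probs[0])
    return [1.0 - max(row[j] for row in probs) for j in range(n_anchor)]


# Stand-in values for the IoU between predicted boxes and objects
# (rows b_1..b_3, columns a_1..a_4).
loc_iou = [
    [0.3, 0.2, 0.7, 0.1],
    [0.8, 0.4, 0.2, 0.7],
    [0.1, 0.3, 0.2, 0.9],
]
probs = match_probabilities(loc_iou, t=0.5)
p_unmatched = unmatched_probabilities(probs)

print(probs[1])      # b_2 row: a_1 gets 1.0, a_4 gets about 0.667
print(p_unmatched)   # [0.0, 1.0, 0.0, 0.0]
```

Because the upper knee is the per-object maximum, the best-localized anchor of each object always receives probability 1 (property 3), provided that maximum exceeds $t$ .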

Finally, the customized likelihood for object detection is defined as
$\mathcal{P}(\theta) = \mathcal{P}(\theta)_{recall} \times \mathcal{P}(\theta)_{precision}$

Anchor Matching Mechanism

Taking the negative log turns the customized likelihood above into the loss function
$\begin{aligned} \mathcal{L} &= -\log \mathcal{P}(\theta) \\ &= -\sum_{i} \log (\max_{a_j \in A_i}(\mathcal{P}(\theta)_{ij}^{cls} \mathcal{P}(\theta)_{ij}^{loc})) - \sum_j \log (1 - P\{a_j \in A_-\}(1 - \mathcal{P}(\theta)_j^{bg})) \end{aligned}$
where the $\max$ function selects the best anchor for each object.

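Early in training, with randomly initialized network parameters, all anchors have low confidence, so the single highest-confidence anchor is not sufficient for training the detector. The paper therefore proposes the Mean-max function
$\text{Mean-max}(X) = \frac{\sum_{x_j \in X} \frac{x_j}{1 - x_j}}{\sum_{x_j \in X} \frac{1}{1 - x_j}}$
to select anchors. When training is insufficient, Mean-max behaves like the mean function, which means all anchors in the bag are used for training. As training proceeds and the confidence of some anchors increases, Mean-max behaves more like the max function, as shown in the figure below.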
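The two regimes are easy to see numerically. The sketch below implements the Mean-max formula directly; the two sets of confidences are made up to mimic an early and a late stage of training, and inputs are assumed to lie strictly below 1 so that the denominators stay finite.

```python
def mean_max(xs):
    """Mean-max: weighted average of xs with weights 1 / (1 - x)."""
    numerator = sum(x / (1.0 - x) for x in xs)
    denominator = sum(1.0 / (1.0 - x) for x in xs)
    return numerator / denominator


# Early training: every anchor in the bag has a low confidence.
early = [0.01, 0.02, 0.03]
# Later: one anchor has become confident.
late = [0.10, 0.20, 0.99]

print(mean_max(early), sum(early) / len(early))  # close to the mean
print(mean_max(late), max(late))                 # close to the max
```

Each confidence is weighted by $1/(1 - x_j)$ , so values near 0 get nearly equal weights (mean-like), while a value near 1 dominates the sum (max-like).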

The loss function then becomes
$\mathcal{L}'(\theta) = - w_1 \sum_{i} \log(\text{Mean-max}(X_i)) + w_2 \sum_j \text{FL\_}(P\{a_j \in A_-\}(1 - \mathcal{P}(\theta)_j^{bg}))$
where $X_i = \{\mathcal{P}(\theta)_{ij}^{cls}\mathcal{P}(\theta)_{ij}^{loc} | a_j \in A_i\}$ is the set of confidences in the bag of object $b_i$ . The parameters $\alpha$ and $\gamma$ are inherited from focal loss, with $w_1 = \frac{\alpha}{\lVert B \rVert}$ and $w_2 = \frac{1 - \alpha}{n\lVert B \rVert}$ , where $\lVert B \rVert$ is the number of objects and $n$ is the bag size. $\text{FL\_}(p)=-p^{\gamma} \log (1-p)$ .
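Putting the pieces together, the final loss can be sketched as below. This is a plain-Python illustration under stated assumptions, not the authors' implementation: the function names, the default values `alpha=0.5` and `gamma=2.0`, the clamp inside `fl_` (a numerical safeguard that is not part of the formula), and every input value are chosen here for illustration.

```python
import math


def mean_max(xs):
    """Mean-max over the confidences in one bag (each value assumed < 1)."""
    numerator = sum(x / (1.0 - x) for x in xs)
    denominator = sum(1.0 / (1.0 - x) for x in xs)
    return numerator / denominator


def fl_(p, gamma):
    """FL_(p) = -p^gamma * log(1 - p), clamped to avoid log(0)."""
    p = min(p, 1.0 - 1e-12)
    return -(p ** gamma) * math.log(1.0 - p)


def free_anchor_loss(bag_confidences, p_unmatched, p_bg, n, alpha=0.5, gamma=2.0):
    """bag_confidences[i] is X_i; p_unmatched[j] is P{a_j in A_-}; p_bg[j] is P_bg_j."""
    num_objects = len(bag_confidences)
    w1 = alpha / num_objects
    w2 = (1.0 - alpha) / (n * num_objects)
    positive = -w1 * sum(math.log(mean_max(x)) for x in bag_confidences)
    negative = w2 * sum(
        fl_(u * (1.0 - b), gamma) for u, b in zip(p_unmatched, p_bg)
    )
    return positive + negative


# Three objects with bags of size n = 2 (made-up cls*loc confidences).
bags = [[0.72, 0.10], [0.45, 0.48], [0.54, 0.20]]
p_unmatched = [0.0, 1.0, 0.0, 0.0]

loss_good = free_anchor_loss(bags, p_unmatched, [0.1, 0.95, 0.2, 0.1], n=2)
loss_bad = free_anchor_loss(bags, p_unmatched, [0.1, 0.30, 0.2, 0.1], n=2)
print(loss_good, loss_bad)  # the first value is smaller than the second
```

The only difference between the two calls is the background confidence of the unmatched anchor, so the gap between the two losses comes entirely from the focal-style negative term.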

Experiments

The experiments use the COCO dataset. Results improve for non-centered object categories.

Comparison with other models:

Notes on "FreeAnchor: Learning to Match Anchors for Visual Object Detection"
