MyDLNote-Enhancement: [2020 CVPR] Domain Adaptation for Image Dehazing

2020 CVPR: Domain Adaptation for Image Dehazing

[paper] : http://export.arxiv.org/pdf/2005.04668

 

This paper made it into CVPR because it squarely hits a weak spot of current deep-learning-based dehazing: hazy images synthesized with the atmospheric scattering model are not the same as hazy images captured in the real world. In other words, the widely used physical model is only an approximation of how hazy images form, not the real imaging process. A dehazing model trained on such synthetic datasets naturally cannot be expected to dehaze real hazy images with high quality.

This is my first contact with Domain Adaptation (DA) research; DA turns out to be a very fitting approach to the problem above.

種豆南山下's Zhihu column gives a fairly comprehensive introduction to DA and is strongly recommended: https://zhuanlan.zhihu.com/p/53359505

 

Abstract

Image dehazing using learning-based methods has achieved state-of-the-art performance in recent years. However, most existing methods train a dehazing model on synthetic hazy images, which are less able to generalize well to real hazy images due to domain shift.

An acknowledgment and a criticism of existing deep-learning methods: the domain shift problem.

Overview: To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules.

Details: Specifically, we first apply a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another. And then, we use images before and after translation to train the proposed two image dehazing networks with a consistency constraint (the consistency loss described later). In this phase, we incorporate the real hazy image into the dehazing training via exploiting the properties of the clear image (e.g., dark channel prior and image gradient smoothing) to further improve the domain adaptivity. By training image translation and dehazing network in an end-to-end manner, we can obtain better effects of both image translation and dehazing.

The abstract introduces the method in an overview-then-details pattern.

The architecture has two parts: a bidirectional translation network, and two dehazing networks (one dehazes in the synthetic-haze domain, the other in the real-haze domain reached by translation).

Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.

Experimental results.

 

Introduction

Single image dehazing aims to recover the clean image from a hazy input, which is essential for subsequent high-level tasks, such as object recognition and scene understanding. Thus, it has received significant attention in the vision community over the past few years. According to the physical scattering models [21, 23, 18], the hazing process is usually formulated as

I(x) = J(x) t(x) + A (1 - t(x)),     (1)

where I(x) and J(x) denote the hazy image and the clean image, A is the global atmospheric light, and t(x) is the transmission map. The transmission map can be represented as t(x) = e^{-\beta d(x)}, where d(x) and \beta denote the scene depth and the atmosphere scattering parameter, respectively. Given a hazy image I(x), most dehazing algorithms try to estimate t(x) and A.

Background: this paragraph explains what dehazing is. Equation (1) is the common starting point, and also the root of the problem, for traditional dehazing algorithms, whether prior-based or learning-based. A minimal synthesis sketch follows.
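To make Eq. (1) concrete, here is a minimal NumPy sketch of how a synthetic hazy image can be rendered from a clean image and its depth map; the function name and default parameter values are illustrative, not from the paper:

```python
import numpy as np

def synthesize_haze(clean, depth, beta=1.0, atmo_light=0.8):
    """Render a hazy image via I(x) = J(x) t(x) + A (1 - t(x)).

    clean: H x W x 3 float array in [0, 1], the clean image J
    depth: H x W float array, the scene depth d(x)
    beta:  atmosphere scattering parameter (illustrative default)
    atmo_light: global atmospheric light A (illustrative default)
    """
    t = np.exp(-beta * depth)[..., None]       # t(x) = e^{-beta d(x)}, broadcast to RGB
    hazy = clean * t + atmo_light * (1.0 - t)  # atmospheric scattering model, Eq. (1)
    return np.clip(hazy, 0.0, 1.0)
```

Synthetic dehazing datasets such as those discussed in the paper are built this way, which is exactly why their haze statistics differ from real captured haze.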

However, estimating the transmission map from a hazy image is generally an ill-posed problem. Early prior-based methods try to estimate the transmission map by exploiting the statistical properties of clear images, such as the dark channel prior [9] and the color-line prior [8]. Unfortunately, these image priors are easily violated in practice, which may lead to inaccurate transmission approximations. Thus, the quality of the restored images is often unsatisfactory.

Problem with traditional prior-based dehazing: these image priors are easily violated in practice, which leads to inaccurate transmission map estimates and therefore unsatisfactory restored images.

To deal with this problem, convolutional neural networks (CNNs) have been employed to estimate transmissions [4, 26, 35] or predict clear images directly [12, 27, 16, 25]. These methods are effective and outperform prior-based algorithms by significant margins. However, deep learning-based approaches rely on a large amount of real hazy images and their haze-free counterparts for training. In general, it is impractical to acquire large quantities of ground-truth images in the real world. Therefore, most dehazing models resort to training on synthetic hazy datasets. However, due to the domain shift problem, the models learned from synthetic data often fail to generalize well to real data.

Problem with learning-based dehazing: these methods need large numbers of real hazy images paired with haze-free counterparts for training, which are impractical to collect in the real world. Most dehazing models therefore train on synthetic hazy datasets, but due to domain shift (the gap between the synthetic-haze domain and the real-haze domain), models learned from synthetic data often fail to generalize well to real data.

To address this issue, we propose a domain adaptation framework for single image dehazing. The proposed framework includes two parts, namely an image translation module and two domain-related dehazing modules (one for synthetic domain and another for real domain).

To reduce the discrepancy between domains, our method first employs the bidirectional image translation network to translate images from one domain to another. Since haze is a nonuniform degradation that depends strongly on the scene depth, we incorporate the depth information into the translation network to guide the translation from synthetic to real hazy images.

Then, the domain-related dehazing network takes images of this domain, including the original and translated images, as inputs to perform image dehazing. Moreover, we use a consistency loss to ensure that the two dehazing networks generate consistent results. In this training phase, to further improve the generalization of the network in the real domain, we incorporate the real hazy images into the training. We hope that the dehazing results of the real hazy image can have some properties of the clear images, such as dark channel prior and image gradient smoothing. We train the image translation network and dehazing networks in an end-to-end manner so that they can improve each other. As shown in Figure 1, our model produces a cleaner image when compared with recent dehazing work of EPDN [25].

Method: the goal is to solve the domain shift problem. The framework has two parts, an image translation module and domain-related dehazing modules.

1. Bidirectional image translation network: reduces the discrepancy between the two domains.

2. Domain-related dehazing networks: each takes the images of its own domain as input, including original images (synthetic domain) and translated images (translated into the real domain). Two extra techniques are used: 1) a consistency loss ensures the two dehazing networks produce consistent results; 2) real hazy images are incorporated into training to further improve generalization in the real domain.

 Figure 1

We summarize the contributions of our work as follows:

• We propose an end-to-end domain adaptation framework for image dehazing, which effectively bridges the gap between the synthetic and real-world hazy images.

• We show that incorporating real hazy images into the training process can improve the dehazing performance.

• We conduct extensive experiments on both synthetic datasets and real-world hazy images, which demonstrate that the proposed method performs favorably against the state-of-the-art dehazing approaches.

Contributions:

An end-to-end domain adaptation framework for image dehazing that effectively bridges the gap between synthetic and real hazy images.

Incorporating real hazy images into the training process improves dehazing performance.

Extensive experiments on synthetic datasets and real hazy images show that the method performs favorably against state-of-the-art dehazing approaches.

 

Related Work

Domain Adaptation

Domain adaptation aims to reduce the discrepancy between different domains [1, 6, 20]. Existing work performs either feature-level or pixel-level adaptation. Feature-level adaptation methods aim at aligning the feature distributions between the source and target domains by minimizing the maximum mean discrepancy [19], or by applying adversarial learning strategies [32, 31] on the feature space. Another line of research focuses on pixel-level adaptation [3, 28, 7]. These approaches deal with the domain shift problem by applying image-to-image translation [3, 28] or style transfer [7] methods to increase the data in the target domain.

This section introduces DA.

DA: aims to reduce the discrepancy between different domains. There are two kinds, feature-level and pixel-level adaptation.

Feature-level: aligns the feature distributions of the source and target domains, by minimizing the maximum mean discrepancy or by applying adversarial learning on the feature space.

Pixel-level: increases the data in the target domain, through image-to-image translation learning or style transfer.

Most recently, many methods perform feature-level and pixel-level adaptation jointly in many visual tasks, e.g., image classification [10], semantic segmentation [5], and depth prediction [37]. These methods [5, 37] translate images from one domain to another with pixel-level adaptation via image-to-image translation networks, e.g., the CycleGAN [38]. The translated images are then fed to the task network with feature-level alignment. In this work, we take advantage of CycleGAN to adapt the real hazy images to our dehazing model trained on synthetic data. Moreover, since the depth information is closely related to the formation of image haze, we incorporate the depth information into the translation network to better guide the real hazy image translation.

Joint feature-level and pixel-level adaptation: first, images are translated from one domain to another with pixel-level adaptation via an image-to-image translation network (e.g., CycleGAN); the translated images are then fed to the task network with feature-level alignment.

This paper uses CycleGAN to adapt real hazy images to the dehazing model trained on synthetic data.

In addition, since depth information is closely tied to haze formation, it is fused into the translation network to better guide the translation of real hazy images.

 

Proposed Method

Method Overview

Given a synthetic dataset X_S = \{x_s, y_s\}^{N_l}_{s=1} and a real hazy image set X_R = \{x_r\}^{N_r}_{r=1}, where N_l and N_r denote the numbers of synthetic and real hazy images, respectively, we aim to learn a single image dehazing model that can accurately predict the clear image from a real hazy image. Due to the domain shift, a dehazing model trained only on synthetic data cannot generalize well to real hazy images.

Dataset definitions, the goal, and the problem statement.

Goal: learn a single image dehazing model that can accurately predict the clear image from a real hazy image.

Problem: due to domain shift, a dehazing model trained purely on synthetic data does not transfer well to real hazy images.

To deal with this problem, we present a domain adaptation framework, which consists of two main parts: the image translation networks G_{S\rightarrow R} and G_{R\rightarrow S}, and two dehazing networks G_S and G_R. The image translation networks translate images from one domain to the other to bridge the gap between them. Then the dehazing networks perform image dehazing using both translated and source images (synthetic or real).

Overall idea: the translation networks G_{S\rightarrow R} and G_{R\rightarrow S} convert between synthetic images and real images;

G_S and G_R dehaze synthetic images and real images, respectively.

 

As illustrated in Figure 2, the proposed model takes a real hazy image x_r and a synthetic image x_s along with its corresponding depth image d_s as input. We first obtain the corresponding translated images x_{s\rightarrow r}=G_{S\rightarrow R}(x_s, d_s) and x_{r\rightarrow s}=G_{R\rightarrow S}(x_r) using the two image translators. And then, we pass x_s and x_{r\rightarrow s} to G_S, and x_r and x_{s\rightarrow r} to G_R to perform image dehazing.

Figure 2. Architecture of the proposed domain adaptation framework for image dehazing. The framework consists of two parts, an image translation module and two image dehazing modules. The image translation module translates images from one domain to another to reduce the domain discrepancy. The image dehazing modules perform image dehazing on both the synthetic and real domains.

The overall network architecture is shown in Figure 2; a minimal sketch of the data flow follows.
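Here is a rough Python sketch of that forward data flow; G_s2r, G_r2s, G_S and G_R are placeholder callables standing in for the four networks, not the authors' code:

```python
def forward_step(G_s2r, G_r2s, G_S, G_R, x_s, d_s, x_r):
    """One forward pass of the framework (placeholder modules)."""
    x_s2r = G_s2r(x_s, d_s)  # synthetic -> real style, guided by depth d_s
    x_r2s = G_r2s(x_r)       # real -> synthetic style
    j_s = G_S(x_s)           # dehaze the original synthetic image
    j_r2s = G_S(x_r2s)       # dehaze the translated real image (synthetic domain)
    j_r = G_R(x_r)           # dehaze the original real image
    j_s2r = G_R(x_s2r)       # dehaze the translated synthetic image (real domain)
    return x_s2r, x_r2s, j_s, j_r2s, j_r, j_s2r
```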

 

Image Translation Module

The image translation module includes two translators: the synthetic-to-real network G_{S\rightarrow R} and the real-to-synthetic network G_{R\rightarrow S}. The G_{S\rightarrow R} network takes (X_s, D_s) as inputs and generates translated images G_{S\rightarrow R}(X_s, D_s) with a style similar to the real hazy images. The other translator G_{R\rightarrow S} performs image translation in the inverse direction. Since the depth information is highly correlated with haze formation, we incorporate it into the generator G_{S\rightarrow R} to produce images whose haze distribution is similar to real cases.

Two translators: G_{S\rightarrow R}, synthetic hazy image to real hazy image; G_{R\rightarrow S}, real hazy image to synthetic hazy image.

G_{S\rightarrow R}: input, a synthetic hazy image plus its depth map; output, a translated real-style hazy image.

G_{R\rightarrow S}: input, a real hazy image; output, a translated synthetic-style hazy image.

 

We adopt the spatial feature transform (SFT) layer [33, 15] to incorporate the depth information into the translation network, which can effectively fuse features from the depth map and the synthetic image. As shown in Fig. 3, the SFT layer first applies three convolution layers to extract conditional maps φ from the depth map. The conditional maps are then fed to two further convolution layers to predict the modulation parameters γ and β, respectively. Finally, we obtain the shifted output features by:

F_{out} = \gamma \odot F + \beta,

where F denotes the input features and \odot is the element-wise multiplication. In the translator G_{S\rightarrow R}, we treat the depth map as the guidance and use the SFT layer to transform the features of the penultimate convolution layer. As shown in Fig. 4, the synthetic images are closer to real-world hazy images after the translation.

We show the detailed configurations of the translator G_{S\rightarrow R} in Table 1. We also employ the architectures provided by CycleGAN [38] for the generator G_{R\rightarrow S} and the discriminators (D^{img}_R and D^{img}_S).

This paragraph explains how depth information is injected when generating real-style images.

The SFT (spatial feature transform) layer: its structure is shown in Fig. 3, and a minimal PyTorch sketch is given after this note.

Papers on SFT:

[2018 CVPR] Recovering realistic texture in image super-resolution by deep spatial feature transform

[2020 TIP] Dynamic scene deblurring by depth guided model

The detailed structure of G_{S\rightarrow R} is given in Table 1.
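Below is a minimal PyTorch sketch of an SFT layer as Fig. 3 describes it (three convolutions for the conditional maps, two more for γ and β); the channel widths are assumptions here, and the actual configuration is the one in Table 1:

```python
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transform: shifts features F as gamma * F + beta,
    with gamma and beta predicted from the depth map (the condition)."""
    def __init__(self, feat_ch=64, cond_ch=1, hidden_ch=32):
        super().__init__()
        # three convolution layers extract conditional maps phi from the depth map
        self.condition = nn.Sequential(
            nn.Conv2d(cond_ch, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # two further convolution layers predict the modulation parameters
        self.to_gamma = nn.Conv2d(hidden_ch, feat_ch, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden_ch, feat_ch, 3, padding=1)

    def forward(self, feat, depth):
        phi = self.condition(depth)
        gamma, beta = self.to_gamma(phi), self.to_beta(phi)
        return gamma * feat + beta  # element-wise modulation of the features
```

The depth map must be spatially aligned with the modulated feature map, which is why the layer sits at the penultimate convolution of G_{S\rightarrow R}.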

Figure 3. Structure of the SFT layer. In the translator G_{S\rightarrow R}, we consider the depth map as the guidance to assist the image translation.

Figure 4. Translated results on two synthetic hazy images: (a) synthetic hazy image, (b) translated image, (c) real hazy image.

 

Table 1. Configurations of the image translation module. “Conv” denotes the convolution layer, “Res” denotes the residual block, “Upconv” denotes the up-sampling layer implemented by a transposed convolution, and “Tanh” denotes the non-linear Tanh layer.

 

Dehazing Module

Our method includes two dehazing modules G_S and G_R, which perform image dehazing on the synthetic and real domains, respectively. G_S takes the synthetic image x_s and the translated image x_{r\rightarrow s} as inputs to perform image dehazing, and G_R is trained on x_r and x_{s\rightarrow r}. For both image dehazing networks, we utilize a standard encoder-decoder architecture with skip connections and side outputs, as in [37]. The dehazing networks in the two domains share the same architecture but learn different parameters.

The dehazing networks are plain U-Net-style encoder-decoders; a minimal sketch follows.

Honestly, U-Net just works.
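A minimal U-Net-style encoder-decoder sketch in PyTorch; the depth, channel widths, and activations are illustrative, not the paper's exact architecture (which follows [37]):

```python
import torch
import torch.nn as nn

class TinyDehazeNet(nn.Module):
    """U-Net-style encoder-decoder with a skip connection (illustrative)."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                                  nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1),
                                  nn.ReLU(inplace=True))
        self.out = nn.Conv2d(ch * 2, 3, 3, padding=1)  # input: decoder + skip features

    def forward(self, x):
        e1 = self.enc1(x)                     # full-resolution features
        e2 = self.enc2(e1)                    # downsampled features
        d1 = self.dec1(e2)                    # upsample back to full resolution
        y = self.out(torch.cat([d1, e1], 1))  # skip connection from the encoder
        return torch.sigmoid(y)               # predicted clear image in [0, 1]
```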

 

Training Losses

In the domain adaptation framework, we adopt the following losses to train the network.

The translation network and the dehazing networks use different loss functions.

This paper has quite a few losses.

  • Image translation losses.

The aim of our translation module is to learn the translators G_{S\rightarrow R} and G_{R\rightarrow S} to reduce the discrepancy between the synthetic domain X_S and the real domain X_R. For the translator G_{S\rightarrow R}, we expect G_{S\rightarrow R}(x_s, d_s) to be indistinguishable from the real hazy image x_r. Thus, we employ an image-level discriminator D^{img}_R and a feature-level discriminator D^{feat}_R to play a min-max game in an adversarial learning manner. D^{img}_R aims at aligning the distributions of the real image x_r and the translated image G_{S\rightarrow R}(x_s, d_s). The discriminator D^{feat}_R helps align the distributions of the feature maps of x_r and G_{S\rightarrow R}(x_s, d_s).

The adversarial losses are defined in the standard GAN form:

L^{img}_{adv} = E_{x_r}[\log D^{img}_R(x_r)] + E_{x_s}[\log(1 - D^{img}_R(G_{S\rightarrow R}(x_s, d_s)))],

L^{feat}_{adv} = E_{x_r}[\log D^{feat}_R(\phi(x_r))] + E_{x_s}[\log(1 - D^{feat}_R(\phi(G_{S\rightarrow R}(x_s, d_s))))],

where \phi(\cdot) denotes the feature maps seen by the feature-level discriminator. Similar to G_{S\rightarrow R}, the translator G_{R\rightarrow S} has another image-level adversarial loss and feature-level adversarial loss, denoted as L^{img'}_{adv} and L^{feat'}_{adv}, respectively.

 

In addition, we utilize the cycle-consistency loss [38] to regularize the training of the translation network. Specifically, when passing an image x_s to G_{S\rightarrow R} and G_{R\rightarrow S} sequentially, we expect the output to be the same image, and vice versa for x_r. Namely, G_{R\rightarrow S}(G_{S\rightarrow R}(x_s, d_s)) \approx x_s and G_{S\rightarrow R}(G_{R\rightarrow S}(x_r)) \approx x_r.

The cycle consistency loss can be expressed as:

L_{cyc} = E[\|G_{R\rightarrow S}(G_{S\rightarrow R}(x_s, d_s)) - x_s\|_1] + E[\|G_{S\rightarrow R}(G_{R\rightarrow S}(x_r)) - x_r\|_1].

Finally, to encourage the generators to preserve content information between the input and output, we also utilize an identity mapping loss [38], which is denoted as:

L_{idt} = E[\|G_{R\rightarrow S}(x_s) - x_s\|_1] + E[\|G_{S\rightarrow R}(x_r) - x_r\|_1].

The full loss function for the translation module combines the above terms:

L_{tran} = L^{img}_{adv} + L^{feat}_{adv} + L^{img'}_{adv} + L^{feat'}_{adv} + \lambda_{cyc} L_{cyc} + \lambda_{idt} L_{idt},

where \lambda_{cyc} and \lambda_{idt} are trade-off weights.

  • Image translation losses:

Image-level and feature-level adversarial losses: for the synthetic-to-real translator, the adversarial losses compare the translated images with real hazy images (one at the image level, one at the feature level); likewise, the real-to-synthetic translator has its own two adversarial losses, giving four in total.

Cycle-consistency loss: a synthetic hazy image translated to the real domain and then translated back through the real-to-synthetic network should be close to the original synthetic image, measured with an L1 loss; likewise, a real hazy image translated to the synthetic domain and back should be close to the original real image.

Identity mapping loss: feeding a synthetic hazy image into the real-to-synthetic translator should reproduce the input (L1 loss); conversely, feeding a real hazy image into the synthetic-to-real translator should reproduce the input. A small sketch of the cycle and identity terms follows.
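A minimal PyTorch sketch of the cycle-consistency and identity terms; the depth input to G_{S\rightarrow R} is omitted for brevity, and the λ weights are CycleGAN-style defaults assumed here, not values from the paper:

```python
import torch.nn.functional as F

def translation_losses(G_s2r, G_r2s, x_s, x_r, lambda_cyc=10.0, lambda_idt=5.0):
    """Cycle-consistency and identity terms (depth guidance omitted)."""
    # cycle consistency: S -> R -> S and R -> S -> R should reproduce the inputs
    loss_cyc = (F.l1_loss(G_r2s(G_s2r(x_s)), x_s) +
                F.l1_loss(G_s2r(G_r2s(x_r)), x_r))
    # identity mapping: a target-domain input should pass through unchanged
    loss_idt = (F.l1_loss(G_r2s(x_s), x_s) +
                F.l1_loss(G_s2r(x_r), x_r))
    return lambda_cyc * loss_cyc + lambda_idt * loss_idt
```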

 

  • Image dehazing losses.

We can now feed the synthetic images X_S and the corresponding depth images D_S to the generator G_{S\rightarrow R} and obtain a new dataset X_{S\rightarrow R} = G_{S\rightarrow R}(X_S, D_S), which has a style similar to real hazy images. We then train an image dehazing network G_R on X_{S\rightarrow R} and X_R in a semi-supervised manner. For the supervised branch, we apply a mean squared loss to ensure the predicted images J_{S\rightarrow R} are close to the clean images Y_S, which can be defined as:

L^r_m = \|G_R(x_{s\rightarrow r}) - y_s\|^2_2.

In the unsupervised branch, we introduce the total variation and dark channel losses, which regularize the dehazing network to produce images with statistical characteristics similar to those of clear images. The total variation loss is an L1-regularization gradient prior on the predicted images J_R:

L^r_t = \|\partial_h J_R\|_1 + \|\partial_v J_R\|_1,

where \partial_h and \partial_v denote the horizontal and vertical gradient operators, respectively.

Furthermore, recent work [9] has proposed the concept of the dark channel, which can be expressed as:

D(I)(x) = \min_{y \in N(x)} (\min_{c \in \{r,g,b\}} I^c(y)),

where x and y are pixel coordinates of image I, I^c represents the c-th color channel of I, and N(x) denotes the local neighborhood centered at x. He et al. [9] have also shown that most intensities of the dark channel image are zero or close to zero. Therefore, we apply the following dark channel (DC) loss to ensure that the dark channel of the predicted images is consistent with that of clear images:

L^r_d = \|D(J_R)\|_1.

In addition, we also train a complementary image dehazing network G_S on X_S and X_{R\rightarrow S}. Similarly, we apply the same supervised loss and unsupervised losses to train the dehazing network G_S:

L^s_m = \|G_S(x_s) - y_s\|^2_2,  L^s_t = \|\partial_h J_S\|_1 + \|\partial_v J_S\|_1,  L^s_d = \|D(J_S)\|_1,

where J_S denotes the predictions of G_S.

Finally, considering that the outputs of the two dehazing networks should be consistent for real hazy images, i.e., G_R(X_R) ≈ G_S(G_{R\rightarrow S}(X_R)), we introduce the following consistency loss:

L_c = \|G_R(x_r) - G_S(G_{R\rightarrow S}(x_r))\|_1.

 

  • Overall loss function.

The overall loss function is defined as follows:

L = L_{tran} + \lambda_m (L^r_m + L^s_m) + \lambda_d (L^r_d + L^s_d) + \lambda_t (L^r_t + L^s_t) + \lambda_c L_c,

where \lambda_m, \lambda_d, \lambda_t and \lambda_c are trade-off weights.

  • Image dehazing losses:

Mean squared loss: translate a synthetic hazy image to the real domain, dehaze it, and compare the result against the ground truth with an L2 loss.

Total variation and dark channel losses: unsupervised regularizers on the dehazed real images.

Consistency loss: passing a real hazy image through the real-domain dehazing network should give a result similar to first translating it to the synthetic domain and then passing it through the synthetic-domain dehazing network. A sketch of these regularizers follows.
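A PyTorch sketch of the unsupervised regularizers and the consistency term, assuming BCHW tensors in [0, 1]; the 15-pixel patch follows the common dark-channel setting of [9] and is an assumption here, as are the placeholder network names:

```python
import torch.nn.functional as F

def total_variation_loss(j):
    """L1 gradient prior on a predicted image j of shape (B, C, H, W)."""
    dh = (j[:, :, :, 1:] - j[:, :, :, :-1]).abs().mean()  # horizontal gradients
    dv = (j[:, :, 1:, :] - j[:, :, :-1, :]).abs().mean()  # vertical gradients
    return dh + dv

def dark_channel_loss(j, patch=15):
    """Push the dark channel of j toward zero, following He et al. [9]."""
    dark = j.min(dim=1, keepdim=True).values  # min over the color channels
    # min over local patches, implemented by max-pooling the negated map
    dark = -F.max_pool2d(-dark, patch, stride=1, padding=patch // 2)
    return dark.abs().mean()

def consistency_loss(G_R, G_S, G_r2s, x_r):
    """The two dehazing networks should agree on real hazy images."""
    return F.l1_loss(G_R(x_r), G_S(G_r2s(x_r)))
```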

 
