A Hands-On Guide to Writing SGM Stereo Matching Code (C++, Updated in Sync on GitHub) (Part 3: Cost Aggregation)

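In the previous post we walked through the code for the initial cost computation and ran an experiment to check its output. The result was plainly bad, but it carries its own lesson: expecting a good disparity map from initial costs alone is wishful thinking. It is a reminder of how much the aggregation step matters to SGM, so let's take another close look at it and give it a round of applause!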

Enough chit-chat. Let's lift the veil on the mysterious cost aggregation step!

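(Note: I push the code to GitHub as each post is published, so the repository will be complete when the series is. Repository: https://github.com/ethan-li-coding/SemiGlobalMatching.git. If you are interested, give it a star and updates will show up in your notifications!)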

Disparity-Major Order

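Readers of the previous post have already met the idea of disparity-major order. I am repeating it here because it matters: it determines how I read and write values in the cost array.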

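Storage order usually brings to mind how a 2D array is laid out in memory. In row-major order, data is packed contiguously within each row, so (row 0, column 0) and (row 0, column 1) are adjacent elements. In column-major order, data is packed contiguously within each column, so (row 0, column 0) and (row 1, column 0) are adjacent elements. The storage order determines how an element's offset from the base address is computed from its row and column indices, and also which traversal order of the array is more efficient (because of caching).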
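To make the two layouts concrete, here is a minimal sketch of the offset computation for each. The function names RowMajorOffset and ColMajorOffset are made up for this illustration and are not part of the repository.

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (row, col) in a rows x cols array stored row-major:
// elements of the same row are adjacent in memory.
inline std::size_t RowMajorOffset(std::size_t row, std::size_t col,
                                  std::size_t rows, std::size_t cols) {
    assert(row < rows && col < cols);
    return row * cols + col;
}

// Offset of the same element when the array is stored column-major:
// elements of the same column are adjacent in memory.
inline std::size_t ColMajorOffset(std::size_t row, std::size_t col,
                                  std::size_t rows, std::size_t cols) {
    assert(row < rows && col < cols);
    return col * rows + row;
}
```

In a 3 x 4 array, moving one column to the right changes the row-major offset by 1 but the column-major offset by 3, which is why the inner loop should run over whichever index is contiguous.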

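The cost array has three dimensions: row, column and disparity. Disparity-major order means that the cost values of one pixel across all disparities are stored contiguously, so the elements of the cost array are ordered as: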

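cost values for all disparities of pixel (0,0);
cost values for all disparities of pixel (0,1);
…
…
cost values for all disparities of pixel (0,w-1);
cost values for all disparities of pixel (1,0);
cost values for all disparities of pixel (1,1);
…
…
cost values for all disparities of pixel (h-1,w-1);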

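The benefit of this layout is that all cost values of a single pixel sit next to each other, so memory access during aggregation is very efficient. This gives a clear speed advantage on large images, and also on platforms such as CUDA where memory access efficiency is critical.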

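In disparity-major order, the cost value at position (i, j, d) is obtained as follows (cost is the cost array, and d is the disparity index counted from 0 up to disp_range - 1):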

cost[i * width * disp_range + j*disp_range + d]
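The indexing expression above can be wrapped in a small helper. This is only a sketch: CostIndex is a hypothetical name, and d is assumed to be the index relative to the minimum disparity, as in the aggregation loops of this post.

```cpp
#include <cassert>
#include <cstddef>

// Index of the cost value for pixel (row i, column j) at disparity index d
// in a disparity-major cost array. CostIndex is a hypothetical helper name.
inline std::size_t CostIndex(int i, int j, int d, int width, int disp_range) {
    assert(i >= 0 && j >= 0 && j < width && d >= 0 && d < disp_range);
    // Pixels are row-major; each pixel owns disp_range consecutive values.
    return (static_cast<std::size_t>(i) * width + j) * disp_range + d;
}
```

With width = 4 and disp_range = 64, pixel (0,1) starts at index 64 and pixel (1,0) at index 256, so stepping to the next pixel in a row always skips exactly one block of disp_range values.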

With the storage order covered, we can move on to the main course!

Left-Right Path Aggregation

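In fact the aggregation of all paths could be implemented in a single loop, but for clarity we implement each path separately and then add up the aggregated values of all paths to get the final multi-path aggregated cost. This also makes it easy to pick any combination of paths and test the effect.
I allocated eight additional aggregated cost arrays, one per direction: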

// ↘ ↓ ↙   5  3  7
// →    ←	 1    2
// ↗ ↑ ↖   8  4  6
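/** \brief aggregated matching cost, direction 1	*/
uint8* cost_aggr_1_;
/** \brief aggregated matching cost, direction 2	*/
uint8* cost_aggr_2_;
/** \brief aggregated matching cost, direction 3	*/
uint8* cost_aggr_3_;
/** \brief aggregated matching cost, direction 4	*/
uint8* cost_aggr_4_;
/** \brief aggregated matching cost, direction 5	*/
uint8* cost_aggr_5_;
/** \brief aggregated matching cost, direction 6	*/
uint8* cost_aggr_6_;
/** \brief aggregated matching cost, direction 7	*/
uint8* cost_aggr_7_;
/** \brief aggregated matching cost, direction 8	*/
uint8* cost_aggr_8_;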

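The left-right paths aggregate within a single row, either from left to right or from right to left. These are directions 1 and 2 above, as shown in the figure: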

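Now let's look at the cost aggregation formula again:

L(p, d) = C(p, d) + min( L(p - r, d), L(p - r, d - 1) + P1, L(p - r, d + 1) + P1, min(L(p - r, i)) + P2 ) - min(L(p - r, i))

Path cost formula for pixel p along a path r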

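In the formula, p denotes a pixel and r a path. For the left-right paths, p - r is the neighbor immediately to the left of p (left-to-right aggregation) or immediately to the right (right-to-left aggregation): the two pixels share a row and their column indices differ by 1. L is the aggregated cost and C is the initial cost.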

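Let's break the formula down. A pixel's aggregated cost L equals its initial cost C, plus the minimum of four terms, minus another minimum (which already appears inside the four terms, so it does not need to be computed twice). What do the four terms mean? Here they are, together with the two penalties: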

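L(p - r, d): the aggregated cost of the previous pixel on the path at disparity d
L(p - r, d - 1): the aggregated cost of the previous pixel on the path at disparity d-1
L(p - r, d + 1): the aggregated cost of the previous pixel on the path at disparity d+1
min(L(p - r, i)): the minimum of the previous pixel's aggregated costs over all disparities
P1: the penalty term P1, an input parameter
P2: the penalty term P2, computed as P2_Init / (|Ip - Ip-r| + 1), where I is the gray value and P2_Init is an input parameter; the code adds 1 so that the denominator is never zero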

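All of these values can be computed directly. As you can see, aggregation is just simple additions and subtractions, with essentially no complicated operations.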
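As a quick sanity check of the formula, here is a minimal sketch that evaluates it once for a single pixel and a single disparity, using plain int instead of the 8-bit types. AdaptiveP2 and PathCost are hypothetical helper names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// P2 adapted to the gray difference between the current and previous pixel.
// The +1 keeps the denominator nonzero when the two gray values are equal.
inline int AdaptiveP2(int p2_init, int gray, int gray_last) {
    assert(p2_init >= 0);
    return p2_init / (std::abs(gray - gray_last) + 1);
}

// One evaluation of
//   L(p,d) = C(p,d) + min(L(p-r,d), L(p-r,d-1)+P1, L(p-r,d+1)+P1, min_i L(p-r,i)+P2)
//            - min_i L(p-r,i)
// c: initial cost C(p,d); l_d, l_dm1, l_dp1: previous pixel's aggregated cost
// at d, d-1, d+1; l_min: previous pixel's minimum over all disparities.
inline int PathCost(int c, int l_d, int l_dm1, int l_dp1, int l_min, int p1, int p2) {
    assert(l_min <= l_d && l_min <= l_dm1 && l_min <= l_dp1);
    const int best = std::min(std::min(l_d, l_dm1 + p1),
                              std::min(l_dp1 + p1, l_min + p2));
    return c + best - l_min;
}
```

For example, with C = 5, previous costs 20 (d), 12 (d-1), 30 (d+1), previous minimum 10, P1 = 10 and P2 = 150, the four candidates are 20, 22, 40 and 160, so the result is 5 + 20 - 10 = 15. Subtracting the previous minimum is what stops the values from growing without bound along the path.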

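First, initialization: the aggregated cost of the first pixel on the path is set equal to its initial cost (nothing to be done, it has no previous pixel on the path, so it takes one for the team). Aggregation proper then starts from the second pixel on the path.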

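Second, we allocate a temporary array that holds the aggregated costs of the previous pixel on the path. The benefit is that when aggregating the current pixel, the previous pixel's costs are read from a small local block of memory, which is faster (the global array is sooo big that wandering around it to find the right spot is naturally slow).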

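Finally, we keep a temporary minimum cost that records the smallest aggregated cost of the previous pixel on the path. While computing all the aggregated costs of the current pixel we can work out their minimum along the way and save it for the next pixel.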

Finally finally, we traverse every pixel on the path and apply the additions and subtractions of the formula, and that's it.
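The steps just listed can be put together in a stripped-down sketch that works on a single path stored as a flat vector. It is not the repository code: AggregatePath is a hypothetical name, costs are plain int, and P2 is held fixed instead of being adapted to the gray difference, so that the result is easy to check by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Aggregate n pixels along one path. cost_init holds n blocks of disp_range
// values (disparity-major). Returns the aggregated costs in the same layout.
std::vector<int> AggregatePath(const std::vector<int>& cost_init, int n,
                               int disp_range, int p1, int p2) {
    assert(n > 0 && disp_range > 0);
    assert(cost_init.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(disp_range));
    const int kBig = INT_MAX / 2;  // sentinel for the two padding slots
    std::vector<int> aggr(cost_init.size());
    // Previous pixel's costs, padded on both sides so d-1 and d+1 stay in range.
    std::vector<int> last(disp_range + 2, kBig);

    // Step 1: the first pixel's aggregated cost is its initial cost.
    std::copy(cost_init.begin(), cost_init.begin() + disp_range, aggr.begin());
    std::copy(aggr.begin(), aggr.begin() + disp_range, last.begin() + 1);
    int min_last = *std::min_element(last.begin(), last.end());

    // Steps 2-4: walk the path from the second pixel on.
    for (int j = 1; j < n; ++j) {
        int min_cur = kBig;
        for (int d = 0; d < disp_range; ++d) {
            const int l1 = last[d + 1];       // same disparity
            const int l2 = last[d] + p1;      // disparity d-1
            const int l3 = last[d + 2] + p1;  // disparity d+1
            const int l4 = min_last + p2;     // any other disparity
            const int v = cost_init[j * disp_range + d]
                        + std::min(std::min(l1, l2), std::min(l3, l4)) - min_last;
            aggr[j * disp_range + d] = v;
            min_cur = std::min(min_cur, v);
        }
        // Current pixel becomes the "previous pixel" for the next iteration.
        std::copy(aggr.begin() + j * disp_range, aggr.begin() + (j + 1) * disp_range,
                  last.begin() + 1);
        min_last = min_cur;
    }
    return aggr;
}
```

With two pixels, three disparities, initial costs {1, 5, 9} and {4, 4, 4}, P1 = 2 and P2 = 10, the second pixel comes out as {4, 6, 10}: the flat initial cost of the second pixel is pulled toward the disparity that was cheapest for its predecessor.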

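Finally finally finally, one more note: I put both directions of the left-right path into a single function and use the parameter is_forward to choose between left-to-right and right-to-left aggregation. The two move in exactly opposite directions and start from the two ends of the same row; put simply, one does column + 1 and the other does column - 1.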
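Here is a small sketch of what is_forward changes in practice: only the starting offset within the row and the sign of the per-pixel step. RowWalk and MakeRowWalk are hypothetical names used just for this illustration, and the offsets assume the disparity-major cost layout described earlier.

```cpp
#include <cassert>
#include <cstddef>

// Where a left-right pass starts in the cost array and how far it moves
// per pixel. Both values are element offsets, not byte offsets.
struct RowWalk {
    std::ptrdiff_t first;  // offset of the first visited pixel's cost block
    std::ptrdiff_t step;   // offset change when moving to the next pixel
};

inline RowWalk MakeRowWalk(int i, int width, int disp_range, bool is_forward) {
    assert(i >= 0 && width > 0 && disp_range > 0);
    const std::ptrdiff_t row_begin = static_cast<std::ptrdiff_t>(i) * width * disp_range;
    const std::ptrdiff_t last_pixel = static_cast<std::ptrdiff_t>(width - 1) * disp_range;
    const int direction = is_forward ? 1 : -1;
    // Forward passes start at the left end of the row, backward passes at the right end.
    const std::ptrdiff_t first = is_forward ? row_begin : row_begin + last_pixel;
    return {first, static_cast<std::ptrdiff_t>(direction) * disp_range};
}
```

For row 1 of a 4-pixel-wide image with 8 disparities, the forward pass starts at offset 32 and steps by +8, while the backward pass starts at offset 56 and steps by -8.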

The code is as follows:

void sgm_util::CostAggregateLeftRight(const uint8* img_data, const sint32& width, const sint32& height, const sint32& min_disparity, const sint32& max_disparity,
	const sint32& p1, const sint32& p2_init, const uint8* cost_init, uint8* cost_aggr, bool is_forward)
{
	assert(width > 0 && height > 0 && max_disparity > min_disparity);

	// disparity range
	const sint32 disp_range = max_disparity - min_disparity;

	// P1,P2
	const auto& P1 = p1;
	const auto& P2_Init = p2_init;

	// forward  (left -> right): is_forward = true ; direction = 1
	// backward (right -> left): is_forward = false; direction = -1
	const sint32 direction = is_forward ? 1 : -1;

	// aggregation, one row at a time
	for (sint32 i = 0; i < height; i++) {
		auto cost_init_row = (is_forward) ? (cost_init + i * width * disp_range) : (cost_init + i * width * disp_range + (width - 1) * disp_range);
		auto cost_aggr_row = (is_forward) ? (cost_aggr + i * width * disp_range) : (cost_aggr + i * width * disp_range + (width - 1) * disp_range);
		auto img_row = (is_forward) ? (img_data + i * width) : (img_data + i * width + width - 1);

		// costs of the previous pixel on the path; the two extra elements pad both ends so that d-1 and d+1 never go out of bounds
		std::vector<uint8> cost_last_path(disp_range + 2, UINT8_MAX);

		// initialization: the first pixel's aggregated cost equals its initial cost
		memcpy(cost_aggr_row, cost_init_row, disp_range * sizeof(uint8));
		memcpy(&cost_last_path[1], cost_aggr_row, disp_range * sizeof(uint8));
		cost_init_row += direction * disp_range;
		cost_aggr_row += direction * disp_range;
		img_row += direction;

		// minimum cost of the previous pixel on the path
		uint8 mincost_last_path = UINT8_MAX;
		for (auto cost : cost_last_path) {
			mincost_last_path = std::min(mincost_last_path, cost);
		}

		// aggregate in order, starting from the 2nd pixel along the direction
		const sint32 start = is_forward ? 1 : width - 2;
		const sint32 end = is_forward ? width : -1;
		for (sint32 j = start; j != end; j += direction) {
			const uint8 gray = *img_row;
			const uint8 gray_last = *(img_row - direction);
			uint8 min_cost = UINT8_MAX;
			for (sint32 d = 0; d < disp_range; d++){
				// Lr(p,d) = C(p,d) + min( Lr(p-r,d), Lr(p-r,d-1) + P1, Lr(p-r,d+1) + P1, min(Lr(p-r))+P2 ) - min(Lr(p-r))
				const uint8  cost = cost_init_row[d];
				const uint16 l1 = cost_last_path[d + 1];
				const uint16 l2 = cost_last_path[d] + P1;
				const uint16 l3 = cost_last_path[d + 2] + P1;
				const uint16 l4 = mincost_last_path + P2_Init / (abs(gray - gray_last) + 1);
				
				const uint8 cost_s = cost + static_cast<uint8>(std::min(std::min(l1, l2), std::min(l3, l4)) - mincost_last_path);
				
				cost_aggr_row[d] = cost_s;
				min_cost = std::min(min_cost, cost_s);
			}

			// update the previous pixel's minimum cost and cost array
			mincost_last_path = min_cost;
			memcpy(&cost_last_path[1], cost_aggr_row, disp_range * sizeof(uint8));

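			// next pixel (the image pointer must advance too, otherwise gray and gray_last never change)
			cost_init_row += direction * disp_range;
			cost_aggr_row += direction * disp_range;
			img_row += direction;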
		}
	}
}

Up-Down Path Aggregation

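Describing the left-right paths was exhausting, so I'll cut a corner for up-down: left-right stays within one row, up-down stays within one column; left-right is column +/- 1, up-down is row +/- 1.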

The code is as follows:

void sgm_util::CostAggregateUpDown(const uint8* img_data, const sint32& width, const sint32& height,
	const sint32& min_disparity, const sint32& max_disparity, const sint32& p1, const sint32& p2_init,
	const uint8* cost_init, uint8* cost_aggr, bool is_forward)
{
	assert(width > 0 && height > 0 && max_disparity > min_disparity);

	// disparity range
	const sint32 disp_range = max_disparity - min_disparity;

	// P1,P2
	const auto& P1 = p1;
	const auto& P2_Init = p2_init;

	// forward  (top -> bottom): is_forward = true ; direction = 1
	// backward (bottom -> top): is_forward = false; direction = -1
	const sint32 direction = is_forward ? 1 : -1;

	// aggregation, one column at a time
	for (sint32 j = 0; j < width; j++) {
		auto cost_init_col = (is_forward) ? (cost_init + j * disp_range) : (cost_init + (height - 1) * width * disp_range + j * disp_range);
		auto cost_aggr_col = (is_forward) ? (cost_aggr + j * disp_range) : (cost_aggr + (height - 1) * width * disp_range + j * disp_range);
		auto img_col = (is_forward) ? (img_data + j) : (img_data + (height - 1) * width + j);

		// costs of the previous pixel on the path; the two extra elements pad both ends so that d-1 and d+1 never go out of bounds
		std::vector<uint8> cost_last_path(disp_range + 2, UINT8_MAX);

		// initialization: the first pixel's aggregated cost equals its initial cost
		memcpy(cost_aggr_col, cost_init_col, disp_range * sizeof(uint8));
		memcpy(&cost_last_path[1], cost_aggr_col, disp_range * sizeof(uint8));
		cost_init_col += direction * width * disp_range;
		cost_aggr_col += direction * width * disp_range;
		img_col += direction * width;

		// minimum cost of the previous pixel on the path
		uint8 mincost_last_path = UINT8_MAX;
		for (auto cost : cost_last_path) {
			mincost_last_path = std::min(mincost_last_path, cost);
		}

		// aggregate in order, starting from the 2nd pixel along the direction
		const sint32 start = is_forward ? 1 : height - 2;
		const sint32 end = is_forward ? height : -1;
		for (sint32 i = start; i != end; i += direction) {
			const uint8 gray = *img_col;
			const uint8 gray_last = *(img_col - direction * width);
			uint8 min_cost = UINT8_MAX;
			for (sint32 d = 0; d < disp_range; d++) {
				// Lr(p,d) = C(p,d) + min( Lr(p-r,d), Lr(p-r,d-1) + P1, Lr(p-r,d+1) + P1, min(Lr(p-r))+P2 ) - min(Lr(p-r))
				const uint8  cost = cost_init_col[d];
				const uint16 l1 = cost_last_path[d + 1];
				const uint16 l2 = cost_last_path[d] + P1;
				const uint16 l3 = cost_last_path[d + 2] + P1;
				const uint16 l4 = mincost_last_path + P2_Init / (abs(gray - gray_last) + 1);

				const uint8 cost_s = cost + static_cast<uint8>(std::min(std::min(l1, l2), std::min(l3, l4)) - mincost_last_path);

				cost_aggr_col[d] = cost_s;
				min_cost = std::min(min_cost, cost_s);
			}

			// update the previous pixel's minimum cost and cost array
			mincost_last_path = min_cost;
			memcpy(&cost_last_path[1], cost_aggr_col, disp_range * sizeof(uint8));

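			// next pixel (the image pointer must advance too, otherwise gray and gray_last never change)
			cost_init_col += direction * width * disp_range;
			cost_aggr_col += direction * width * disp_range;
			img_col += direction * width;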
		}
	}
}
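Comparing the two functions above, the only structural difference is how far the pointers move per pixel: disp_range for left-right and width * disp_range for up-down. A minimal sketch of that idea, with a hypothetical helper name PathStep and a path direction written as (dx, dy) in columns and rows:

```cpp
#include <cassert>
#include <cstddef>

// Element offset between consecutive pixels of a path in a disparity-major
// cost array, for a path that moves dx columns and dy rows per step.
inline std::ptrdiff_t PathStep(int dx, int dy, int width, int disp_range) {
    assert(width > 0 && disp_range > 0);
    return (static_cast<std::ptrdiff_t>(dy) * width + dx) * disp_range;
}
```

For a 640-pixel-wide image with 64 disparities, the left-to-right step is 64 elements while the top-to-bottom step is 40960 elements, so the up-down passes jump much further through memory for every pixel they visit.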

Aggregating All Paths

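On the principle that overly long posts are a bad thing, and out of consideration for you readers (hehe), this post only covers the left-right and up-down paths, i.e. 4-path aggregation. This is in fact a commonly used configuration: some real applications care a great deal about speed and can give up some quality, and 4 paths is often a good choice.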

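Once the aggregated costs of the left-right and up-down paths have been computed, adding the four together gives the final multi-path aggregated cost. Our cost aggregation method is therefore implemented as follows: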

void SemiGlobalMatching::CostAggregation() const
{
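    // path aggregation
    // 1. left -> right / right -> left
    // 2. top -> bottom / bottom -> top
    // 3. top-left -> bottom-right / bottom-right -> top-left
    // 4. top-right -> bottom-left / bottom-left -> top-right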
    //
    // ↘ ↓ ↙   5  3  7
    // →    ←	 1    2
    // ↗ ↑ ↖   8  4  6
    //
    const auto& min_disparity = option_.min_disparity;
    const auto& max_disparity = option_.max_disparity;
    assert(max_disparity > min_disparity);

    const sint32 size = width_ * height_ * (max_disparity - min_disparity);
    if(size <= 0) {
        return;
    }

    const auto& P1 = option_.p1;
    const auto& P2_Int = option_.p2_init;

    // left-right aggregation
    sgm_util::CostAggregateLeftRight(img_left_, width_, height_, min_disparity, max_disparity, P1, P2_Int, cost_init_, cost_aggr_1_, true);
    sgm_util::CostAggregateLeftRight(img_left_, width_, height_, min_disparity, max_disparity, P1, P2_Int, cost_init_, cost_aggr_2_, false);
    // up-down aggregation
    sgm_util::CostAggregateUpDown(img_left_, width_, height_, min_disparity, max_disparity, P1, P2_Int, cost_init_, cost_aggr_3_, true);
    sgm_util::CostAggregateUpDown(img_left_, width_, height_, min_disparity, max_disparity, P1, P2_Int, cost_init_, cost_aggr_4_, false);


    // sum the 4 (or 8) directions
    for (sint32 i = 0; i < size; i++) {
    	cost_aggr_[i] = cost_aggr_1_[i] + cost_aggr_2_[i] + cost_aggr_3_[i] + cost_aggr_4_[i];
    	if (option_.num_paths == 8) {
            cost_aggr_[i] += cost_aggr_5_[i] + cost_aggr_6_[i] + cost_aggr_7_[i] + cost_aggr_8_[i];
        }
    }
}
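One thing to watch in the summation loop: the per-path buffers are uint8, and four values of up to 255 each can add up to more than a uint8 can hold. The declared type of cost_aggr_ is not shown in this post, so the following is only a sketch under the assumption that the total is kept in a wider type. SumPaths is a hypothetical helper, not a function from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum several per-path aggregated cost buffers element by element into a
// 16-bit total, so that 4 or 8 paths of 8-bit costs cannot overflow.
std::vector<std::uint16_t> SumPaths(const std::vector<const std::uint8_t*>& paths,
                                    std::size_t size) {
    std::vector<std::uint16_t> total(size, 0);
    for (const std::uint8_t* path : paths) {
        assert(path != nullptr);
        for (std::size_t k = 0; k < size; ++k) {
            total[k] = static_cast<std::uint16_t>(total[k] + path[k]);
        }
    }
    return total;
}
```

Eight paths of at most 255 each sum to at most 2040, which fits comfortably in 16 bits. Passing the buffers as a list also makes it easy to try different path combinations, which is the experiment this post set out to make convenient.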