A Dual-Microphone Speech Enhancement Algorithm Based on the Coherence Function

(A Dual-Microphone Speech Enhancement Algorithm Based on the Coherence Function) ¹


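1. System Block Diagram

The system block diagram is shown below: the two-channel input is split into frames and windowed, the coherence function is computed, and that yields the filter gain $G$. It looks simple enough, so let's walk through the details.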

A. Definition of Coherence Function

The noisy input signals are defined as:
$y_{i}(m)=x_{i}(m)+n_{i}(m),i=1,2 \tag{1}$

where $i$ is the microphone index, $m$ is the sample index, $x_i$ is the target speech and $n_i$ is the noise.

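Applying the $STFT$ takes this to the time-frequency domain:
$Y_{i}(\omega_{l},k)=X_{i}(\omega_{l},k)+N_{i}(\omega_{l},k),i=1,2 \tag{2}$
where $\omega_{l}$ is the angular frequency and $k$ is the frame index. The indices $l$ and $k$ are omitted below for clarity.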

The complex coherence function between the input signals $y_{1},y_{2}$ is defined as:
$\Gamma _{y_{1}y_{2}}(\omega ,k)=\frac{\phi_{y_{1}y_{2}}}{\sqrt{\phi_{y_{1}y_{1}}\phi_{y_{2}y_{2}}}} \tag{3}$
where $\phi_{uu}$ is the $PSD$ (power spectral density) and $\phi_{uv}$ is the $CSD$ (cross-power spectral density).
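As a rough illustration of Eq. (3), the sketch below estimates the coherence per frame and frequency bin with NumPy. The recursive smoothing of the PSD/CSD, the smoothing factor `alpha`, the frame length, the hop size, and the convention $\phi_{y_1y_2}=E[Y_1Y_2^*]$ are all assumptions made for this example; the post does not say how the spectra are averaged.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def estimate_coherence(Y1, Y2, alpha=0.6, eps=1e-12):
    """Complex coherence per frame and bin (Eq. 3) from smoothed PSD/CSD estimates.

    alpha is an assumed first-order smoothing factor, not a value from the paper.
    """
    n_bins = Y1.shape[1]
    phi11 = np.zeros(n_bins)
    phi22 = np.zeros(n_bins)
    phi12 = np.zeros(n_bins, dtype=complex)
    gamma = np.zeros(Y1.shape, dtype=complex)
    for k in range(Y1.shape[0]):
        # Recursive averages stand in for the expectation in the PSD/CSD definitions.
        phi11 = alpha * phi11 + (1 - alpha) * np.abs(Y1[k]) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * np.abs(Y2[k]) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * Y1[k] * np.conj(Y2[k])
        gamma[k] = phi12 / np.sqrt(phi11 * phi22 + eps)
    return gamma
```

Some averaging over frames is needed: computed from a single frame alone, the coherence magnitude is always 1 and carries no information.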
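Under the far-field model, when a single directional source arrives at the two microphones from angle $\theta$, the ideal coherence function of the signals received at the two microphones can be written as ²
$\Gamma _{u_{1}u_{2}}(\omega)=e^{j\omega f_s(d/c)\cos(\theta)} \tag{4}$
where $d$ is the microphone spacing, $c$ is the speed of sound and $f_s$ is the sampling rate. We can check this with a plot: for a signal arriving at the two microphones from $45^o$, the real and imaginary parts of the estimated coherence function are compared with the ideal curve (the equation above) in the figure below: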

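As the figure shows, Eq. (4) matches the measurement. Note, however, that this is a reverberation-free model (order=0); the heavier the reverberation, the further the estimated curve deviates from it.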
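For reference, Eq. (4) is easy to evaluate directly. The sketch below is only an illustration: the function name is made up, and the defaults (20 mm spacing, 16 kHz sampling rate, speed of sound 343 m/s) are assumptions chosen to match the setup discussed in this post.

```python
import numpy as np

def ideal_coherence(omega, theta_deg, d=0.02, fs=16000, c=343.0):
    """Ideal far-field coherence of Eq. (4).

    omega is the normalized angular frequency in rad/sample (0..pi);
    d, fs and c are assumed example values.
    """
    tau = fs * d / c
    return np.exp(1j * omega * tau * np.cos(np.deg2rad(theta_deg)))

omega = np.linspace(0.0, np.pi, 129)
gamma_45 = ideal_coherence(omega, 45.0)   # source at 45 degrees
gamma_90 = ideal_coherence(omega, 90.0)   # source at 90 degrees: purely real, equal to 1
```

The real and imaginary parts of `gamma_45` are the ideal curves that an estimated coherence would be compared against.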

B. Proposed Method Based on Coherence Function

Assuming the noise and the target signal are uncorrelated, the CSD of the received signal is the sum of the target CSD and the noise CSD:
$\phi _{y_{1}y_{2}}=\phi _{x_{1}x_{2}}+\phi _{n_{1}n_{2}} \tag{5}$
Dividing both sides by ${\sqrt{\phi_{y_{1}y_{1}}\phi_{y_{2}y_{2}}}}$ gives:
$\Gamma _{y_{1}y_{2}}(\omega ,k)=\frac{\phi_{x_{1}x_{2}}}{\sqrt{\phi_{y_{1}y_{1}}\phi_{y_{2}y_{2}}}}+\frac{\phi_{n_{1}n_{2}}}{\sqrt{\phi_{y_{1}y_{1}}\phi_{y_{2}y_{2}}}} \tag{6}$
Likewise, assume the PSD of the received signal is the sum of the target PSD and the noise PSD, and define:
$SNR_i = \frac{\phi_{x_{i}x_{i}}}{\phi_{n_{i}n_{i}}} \tag{7}$
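Because the microphone spacing is small, the $SNR$ at the two microphones can be taken as roughly the same. After a bit of algebra, the coherence function of the received signal becomes:
$\hat{\Gamma }_{y_{1}y_{2}}(\omega ,k)=\Gamma _{x_{1}x_{2}}\frac{\hat{SNR}}{1+\hat{SNR}}+\Gamma _{n_{1}n_{2}}\frac{1}{1+\hat{SNR}} \tag{8}$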
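From this formula, when the SNR is high ($\hat{SNR}\rightarrow +\infty$), $\hat{\Gamma }_{y_{1}y_{2}}(\omega ,k)$ is dominated by the coherence function of the target signal; when the SNR is low ($\hat{SNR}\rightarrow 0$), it is dominated by the noise. (Honestly, this conclusion is easy to see even without all that heavy algebra.)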
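The algebra between (6) and (8) can be filled in as follows. This is a reconstruction from (3), (6) and (7) under the assumptions already stated (additive PSDs, equal SNR at both microphones), not a quotation from the paper. First, for each microphone:

$\frac{\phi_{x_{i}x_{i}}}{\phi_{y_{i}y_{i}}}=\frac{\phi_{x_{i}x_{i}}}{\phi_{x_{i}x_{i}}+\phi_{n_{i}n_{i}}}=\frac{SNR_i}{1+SNR_i},\qquad \frac{\phi_{n_{i}n_{i}}}{\phi_{y_{i}y_{i}}}=\frac{1}{1+SNR_i}$

Writing $\phi_{x_{1}x_{2}}=\Gamma _{x_{1}x_{2}}\sqrt{\phi_{x_{1}x_{1}}\phi_{x_{2}x_{2}}}$, the first term of (6) becomes:

$\frac{\phi_{x_{1}x_{2}}}{\sqrt{\phi_{y_{1}y_{1}}\phi_{y_{2}y_{2}}}}=\Gamma _{x_{1}x_{2}}\sqrt{\frac{SNR_1}{1+SNR_1}\cdot\frac{SNR_2}{1+SNR_2}}$

With $SNR_1\approx SNR_2=\hat{SNR}$ the square root collapses to $\frac{\hat{SNR}}{1+\hat{SNR}}$. The noise term works the same way and gives $\Gamma _{n_{1}n_{2}}\frac{1}{1+\hat{SNR}}$.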
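Now for the most important formula of the analysis. Substituting the ideal coherence function (4) into (8) and expanding with Euler's formula gives the expression below. The target term carries no $cos\theta$ factor, i.e. the target is taken to arrive from $0^o$, while the noise arrives from angle $\theta$:
$\hat{\Gamma }_{y_{1}y_{2}}(\omega)=[cos(\omega \tau)+jsin(\omega \tau)]\frac{\hat{SNR}}{1+\hat{SNR}}+[cos(\omega \tau cos\theta)+jsin(\omega \tau cos\theta)]\frac{1}{1+\hat{SNR}} \tag{9}$
where $\tau =f_s (d/c)$. The gain functions below are designed from the properties of Eq. (9).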
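To get a feel for Eq. (9), the model can be evaluated numerically for a few SNR values. This is a minimal sketch; the function name and the example values of `tau`, `omega` and the noise angle are assumptions for illustration only.

```python
import numpy as np

def model_coherence(omega, snr, theta_deg, tau):
    """Noisy-signal coherence model of Eq. (9): target at 0 degrees, noise at theta_deg.

    snr is a linear (not dB) ratio; tau = fs * d / c.
    """
    w = snr / (1.0 + snr)                       # weight of the target term
    target = np.exp(1j * omega * tau)           # cos(wt) + j sin(wt)
    noise = np.exp(1j * omega * tau * np.cos(np.deg2rad(theta_deg)))
    return w * target + (1.0 - w) * noise       # 1 - w equals 1 / (1 + snr)

tau = 16000 * 0.02 / 343.0                      # assumed setup: 16 kHz, 20 mm, 343 m/s
omega = 1.0                                     # one example frequency in rad/sample
for snr in (0.0, 1.0, 100.0):
    print(snr, model_coherence(omega, snr, 120.0, tau))
```

As the SNR grows, the printed value moves from the noise coherence toward the target coherence, which is the behaviour described for Eq. (8).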

Now let's analyze what happens when the noise is at different positions:

$\theta=90^o$
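When the interferer is directly in front of the two microphones, Eq. (4) gives $cos(90^o)=0$, so the coherence function produced by the noise equals 1: it is purely real, with no imaginary part. From Eq. (9), $\hat{\Gamma }_{y_{1}y_{2}}(\omega)$ then has an imaginary part only when speech is present (this property doesn't seem to be put to use here?). In this case the gain function should therefore suppress components whose coherence has a real part of 1, which leads to the following gain function:
$G_1(\omega,k)=1-\left| real(\hat{\Gamma }_{y_{1}y_{2}}(\omega,k)) \right| ^{P(\omega)}\tag{10}$
The curve of this function is shown below³: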

From the curve, when the input is close to 1, $G_1$ is very small and acts as a suppressor, while the exponent $P$ controls the amount of attenuation.
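Eq. (10) is a one-liner in NumPy. The sketch below uses a made-up function name, and the exponents 2 and 16 are arbitrary example values, not the paper's tuned $P(\omega)$.

```python
import numpy as np

def gain_g1(gamma, p):
    """Gain of Eq. (10): small when the real part of the coherence is near 1."""
    return 1.0 - np.abs(np.real(gamma)) ** p

real_part = np.linspace(0.0, 1.0, 11)
g_p2 = gain_g1(real_part, 2.0)     # wide notch: attenuates moderately coherent bins too
g_p16 = gain_g1(real_part, 16.0)   # narrow notch: attenuates only when real part is near 1
```

A larger exponent keeps the gain close to 1 for most inputs and cuts only when the real part is very close to 1; a smaller exponent attenuates over a wider range.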

$90^o<\theta\leq 180^o$
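Above, for $\theta=90^o$, we looked at the real part of the coherence function. For $90^o<\theta\leq 180^o$ the interferer's coherence function also has an imaginary part, so let's see what pattern the imaginary part of the noisy-signal coherence follows.
From (9), the imaginary part of $\hat{\Gamma }_{y_{1}y_{2}}(\omega)$ can be written directly as:
$imag[\hat{\Gamma }_{y_{1}y_{2}}(\omega)]=sin(\omega \tau)\frac{\hat{SNR}}{1+\hat{SNR}}+sin(\omega \tau cos\theta)\frac{1}{1+\hat{SNR}} \tag{11}$
When the SNR is high ($\hat{SNR}\rightarrow +\infty$), the imaginary part is dominated by the coherence function of the target signal; when the SNR is low ($\hat{SNR}\rightarrow 0$), $imag[\hat{\Gamma }_{y_{1}y_{2}}(\omega ,k)]\approx sin(\omega \tau cos\theta)$, i.e. it is dominated by the noise.
With the assumptions $\omega<\pi$, $f_s=16000$ and a microphone spacing of about 20 mm, $\tau =f_s d/c$ is also less than 1. Under these conditions $\omega \tau cos\theta$ lies in $(-\pi,0)$, so $sin(\omega \tau cos\theta)$ is always negative. This reveals a pattern: when noise dominates, the imaginary part of the coherence function is more likely to be negative.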
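The sign claim is quick to check numerically. The sketch assumes a speed of sound of 343 m/s, which the post does not state, and samples the open ranges $0<\omega<\pi$ and $90^o<\theta\leq 180^o$ on a grid.

```python
import numpy as np

fs, d, c = 16000, 0.02, 343.0      # c = 343 m/s is an assumed value
tau = fs * d / c                   # about 0.93, so tau < 1

omega = np.linspace(0.01, np.pi - 0.01, 200)        # 0 < omega < pi
theta = np.deg2rad(np.linspace(91.0, 180.0, 90))    # 90 deg < theta <= 180 deg

# Noise-only imaginary part sin(omega * tau * cos(theta)) on the whole grid.
noise_imag = np.sin(omega[:, None] * tau * np.cos(theta)[None, :])
print(tau < 1.0, bool(np.all(noise_imag < 0.0)))
```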
Here are two extreme examples.
When $\theta=180^o$, if $imag[\hat{\Gamma }_{y_{1}y_{2}}]<0$, Eq. (11) gives $\hat{SNR}<1$ (0 dB).
When $\theta=90^o$, the case already discussed above, requiring $imag[\hat{\Gamma }_{y_{1}y_{2}}]<0$ would by Eq. (11) give $\hat{SNR}<0$. By definition (7), however, $\hat{SNR}$ is always positive, so this case indeed falls outside what is discussed here.
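The $\theta=180^o$ case can be written out explicitly. This is a worked step from Eq. (11), using $cos(180^o)=-1$:

$imag[\hat{\Gamma }_{y_{1}y_{2}}(\omega)]=sin(\omega \tau)\frac{\hat{SNR}}{1+\hat{SNR}}-sin(\omega \tau)\frac{1}{1+\hat{SNR}}=sin(\omega \tau)\frac{\hat{SNR}-1}{1+\hat{SNR}}$

Since $0<\omega \tau<\pi$ makes $sin(\omega \tau)>0$, the sign of the imaginary part is the sign of $\hat{SNR}-1$, so it is negative exactly when $\hat{SNR}<1$. For $\theta=90^o$ the noise term vanishes and the imaginary part is $sin(\omega \tau)\frac{\hat{SNR}}{1+\hat{SNR}}$, which cannot be negative for a positive $\hat{SNR}$.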
Putting this together, for $90^o<\theta\leq 180^o$ the gain function is designed as:
$G_2(\omega ,k) = \begin{cases} \mu, & imag[\hat{\Gamma }_{y_{1}y_{2}}(\omega)]<Q(\omega)\\ 1, & otherwise \end{cases}\tag{12}$
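Eq. (12) is a simple threshold on the imaginary part. In the sketch below the function name is made up, and the threshold `q = -0.3` and attenuation `mu = 0.05` are placeholder values for illustration, not the paper's settings.

```python
import numpy as np

def gain_g2(gamma, q, mu):
    """Gain of Eq. (12): attenuate by mu where imag(coherence) < q, else pass through."""
    return np.where(np.imag(gamma) < q, mu, 1.0)

gamma = np.array([0.2 - 0.5j, 0.2 + 0.5j, 0.9 - 0.1j])
print(gain_g2(gamma, q=-0.3, mu=0.05))   # only the first bin falls below the threshold
```

`q` may also be an array with one threshold per frequency bin, matching $Q(\omega)$.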

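The final gain function:
Having covered the two cases $\theta=90^o$ and $90^o<\theta\leq 180^o$, the final gain function is
$G(\omega,k)=G_1(\omega,k)\cdot G_2(\omega,k)$
When one filter is active the other is close to 1, so the two filters do not interfere with each other.
The actual implementation also processes different frequency bands separately according to their characteristics; see the analysis in the authors' paper for details.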
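Putting the two gains together might look like the sketch below. Everything numeric here is a placeholder: the band split at 1 kHz, the exponents, the threshold and the attenuation are illustrative values only, and applying the gain to a single reference channel is an assumption, since the post does not say which channel is filtered.

```python
import numpy as np

def coherence_gain(gamma, p, q, mu):
    """G = G1 * G2 per frequency bin; p and q may be scalars or per-bin arrays."""
    g1 = 1.0 - np.abs(np.real(gamma)) ** p
    g2 = np.where(np.imag(gamma) < q, mu, 1.0)
    return g1 * g2

# Hypothetical per-band parameters for a 129-bin spectrum at fs = 16 kHz.
freqs_hz = np.linspace(0.0, 8000.0, 129)
p = np.where(freqs_hz <= 1000.0, 16.0, 2.0)   # placeholder exponents per band
q = np.full(129, -0.3)                        # placeholder thresholds
mu = 0.05                                     # placeholder attenuation

def enhance_frame(Y_ref, gamma):
    """Apply the combined gain to one STFT frame of the (assumed) reference channel."""
    return coherence_gain(gamma, p, q, mu) * Y_ref
```

The enhanced frames would then go through an inverse STFT with overlap-add to get the output waveform.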

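Let's look at the difference before and after processing:

The output of the method in this paper has a somewhat lower amplitude, but zooming in shows that the interferer at $90^o$ is clearly suppressed.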
References:

1. Yousefian, N., & Loizou, P. (2011). A Dual-Microphone Speech Enhancement Algorithm Based on the Coherence Function. IEEE Transactions on Audio, Speech, and Language Processing.
2. M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications. Berlin, Germany: Springer Verlag, 2001 (p. 32).
3. N. Yousefian, K. Kokkinakis, and P. C. Loizou, "A coherence-based algorithm for noise reduction in dual-microphone applications," in Proc. Eur. Signal Process. Conf. (EUSIPCO'10), Aalborg, Denmark, Aug. 2010, pp. 1904–1908.
