Expectation Maximization (EM) Algorithm

A probabilistic model sometimes contains both observable variables and hidden or latent variables. If all the variables of a probabilistic model are observable, then, given the data, the model parameters can be estimated directly by maximum likelihood estimation or by Bayesian estimation.

The EM algorithm [Dempster, 1977] is an iterative algorithm for maximum likelihood estimation, or maximum a posteriori estimation, of the parameters of probabilistic models containing hidden variables. Each iteration of the EM algorithm consists of two steps: the E-step, which computes an expectation, and the M-step, which performs a maximization. Hence the algorithm is called the expectation maximization (EM) algorithm.

1 EM Algorithm

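The code below runs EM on the three-coin model: coin A lands heads with probability $\pi$; if A shows heads, coin B (head probability $p$) is tossed, otherwise coin C (head probability $q$); only the outcome $y \in \{0, 1\}$ of the second toss is recorded. Which coin produced each observation is the hidden variable, so the E-step computes $\mu_j$, the posterior probability that observation $y_j$ came from coin B, and the M-step re-estimates $(\pi, p, q)$ from these responsibilities.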

```python
import numpy as np

def e_step(y, pi, p, q):
    # E-step: posterior probability (responsibility) that each observation
    # was generated by coin B, given the current estimates (pi, p, q).
    mu_1 = pi * p ** y * (1 - p) ** (1 - y)
    mu_2 = (1 - pi) * q ** y * (1 - q) ** (1 - y)
    return mu_1 / (mu_1 + mu_2)

def m_step(y, mu):
    # M-step: closed-form parameter updates that maximize the Q function.
    n = len(y)
    pi = np.sum(mu) / n
    p = np.sum(y * mu) / np.sum(mu)
    q = np.sum(y * (1 - mu)) / np.sum(1 - mu)
    return pi, p, q

def diff(pi, p, q, pi_, p_, q_):
    # Total absolute parameter change between consecutive iterations.
    return np.sum(np.abs([pi - pi_, p - p_, q - q_]))

def em(y, pi, p, q):
    cnt = 1
    while True:
        print("-" * 10)
        print("iter %d:" % cnt)
        pi_, p_, q_ = pi, p, q

        mu = e_step(y, pi, p, q)
        print(mu)
        pi, p, q = m_step(y, mu)
        print(pi, p, q)

        # Stop once the parameters have essentially stopped moving.
        if diff(pi, p, q, pi_, p_, q_) < 0.001:
            break
        cnt += 1
    return pi, p, q

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])

# Three different initializations: EM is sensitive to the initial values.
print("*" * 10)
pi, p, q = 0.5, 0.5, 0.5
pi, p, q = em(y, pi, p, q)

print("*" * 10)
pi, p, q = 0.4, 0.6, 0.7
pi, p, q = em(y, pi, p, q)

print("*" * 10)
pi, p, q = 0.46, 0.55, 0.67
pi, p, q = em(y, pi, p, q)
```
```
**********
----------
iter 1:
[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
0.5 0.6 0.6
----------
iter 2:
[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
0.5 0.6 0.6
**********
----------
iter 1:
[0.36363636 0.36363636 0.47058824 0.36363636 0.47058824 0.47058824
 0.36363636 0.47058824 0.36363636 0.36363636]
0.40641711229946526 0.5368421052631579 0.6432432432432431
----------
iter 2:
[0.36363636 0.36363636 0.47058824 0.36363636 0.47058824 0.47058824
 0.36363636 0.47058824 0.36363636 0.36363636]
0.40641711229946526 0.5368421052631579 0.6432432432432431
**********
----------
iter 1:
[0.41151594 0.41151594 0.53738318 0.41151594 0.53738318 0.53738318
 0.41151594 0.53738318 0.41151594 0.41151594]
0.461862835113919 0.5345950037850112 0.6561346417857326
----------
iter 2:
[0.41151594 0.41151594 0.53738318 0.41151594 0.53738318 0.53738318
 0.41151594 0.53738318 0.41151594 0.41151594]
0.46186283511391907 0.5345950037850112 0.6561346417857326
```

In general, $Y$ denotes the data of the observed random variables and $Z$ denotes the data of the hidden random variables. When both $Y$ and $Z$ are known, the data are called complete-data; the observed data $Y$ alone are called incomplete-data. Suppose the probability distribution of the observed data $Y$ is $P(Y; \theta)$, where $\theta$ is the model parameter to be estimated; then the likelihood function of the incomplete data $Y$ is $P(Y; \theta)$, with log-likelihood function $L(\theta) = \log P(Y; \theta)$. Suppose the joint probability distribution of $Y$ and $Z$ is $P(Y, Z; \theta)$; then the log-likelihood function of the complete data is $\log P(Y, Z; \theta)$.

The EM algorithm iteratively computes the maximum likelihood estimate of $L(\theta) = \log P(Y; \theta)$. Each iteration consists of two steps: the E-step, computing the expectation, and the M-step, performing the maximization.

Algorithm 9.1 (EM algorithm)

Input: observed variable data $Y$, hidden variable data $Z$, joint distribution $P(Y, Z; \theta)$, conditional distribution $P(Z | Y; \theta)$;

Output: model parameter $\theta$.

  1. Choose an initial parameter value $\theta^{(0)}$ and start the iteration;

  2. E-step: let $\theta^{(i)}$ denote the estimate of $\theta$ at the $i$-th iteration. At the E-step of the $(i + 1)$-th iteration, compute

$$\begin{aligned} Q(\theta, \theta^{(i)}) & = \text{E}_{Z} \left[ \log P(Y, Z; \theta) | Y; \theta^{(i)} \right] \\ & = \sum_{Z} P(Z | Y; \theta^{(i)}) \log P(Y, Z; \theta) \end{aligned} \tag{9}$$

where $P(Z | Y; \theta^{(i)})$ is the conditional distribution of the hidden variable data $Z$ given the observed data $Y$ and the current parameter estimate $\theta^{(i)}$;

  3. M-step: find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$, which gives the parameter estimate of the $(i + 1)$-th iteration, $\theta^{(i + 1)}$:

$$\theta^{(i + 1)} = \argmax_{\theta} Q(\theta, \theta^{(i)}) \tag{10}$$

  4. Repeat steps 2 and 3 until convergence.

The function $Q(\theta, \theta^{(i)})$ in Equation (9) is the core of the EM algorithm and is called the $Q$ function.

Definition 9.1 ($Q$ function) The expectation of the complete-data log-likelihood function $\log P(Y, Z; \theta)$ with respect to the conditional distribution $P(Z | Y; \theta^{(i)})$ of the unobserved data $Z$, given the observed data $Y$ and the current parameter $\theta^{(i)}$, is called the $Q$ function:

$$Q(\theta, \theta^{(i)}) = \text{E}_{Z} \left[ \log P(Y, Z; \theta) | Y; \theta^{(i)} \right] \tag{11}$$
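
For the three-coin model in the code above, the $Q$ function can be written out explicitly: with responsibilities $\mu_j = P(\text{coin B} \mid y_j; \theta^{(i)})$ fixed by the E-step, it is the expected complete-data log-likelihood. A minimal sketch (the helper name `q_function` is ours, not part of the original code):

```python
import numpy as np

def q_function(y, mu, pi, p, q):
    # Q(theta, theta_i) for the three-coin model: the expected complete-data
    # log-likelihood, with responsibilities mu computed at the *old*
    # parameters and (pi, p, q) the *new* parameters being evaluated.
    log_b = np.log(pi) + y * np.log(p) + (1 - y) * np.log(1 - p)
    log_c = np.log(1 - pi) + y * np.log(q) + (1 - y) * np.log(1 - q)
    return np.sum(mu * log_b + (1 - mu) * log_c)
```

Maximizing this expression over $\pi$, $p$, and $q$ gives exactly the closed-form updates used in `m_step` above.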

A few remarks on the EM algorithm:

Step 1: The initial parameter value can be chosen arbitrarily, but the EM algorithm is sensitive to it;

Step 2: The E-step computes $Q(\theta, \theta^{(i)})$. In the $Q$ function, $Z$ is the unobserved data and $Y$ is the observed data. The first argument of $Q(\theta, \theta^{(i)})$ is the parameter to be maximized, and the second is the current parameter estimate. Each iteration in effect computes the $Q$ function and its maximum;

Step 3: The M-step maximizes $Q(\theta, \theta^{(i)})$ to obtain $\theta^{(i + 1)}$, completing one iteration $\theta^{(i)} \rightarrow \theta^{(i + 1)}$. Each iteration increases the likelihood function or reaches a local maximum;

Step 4: The stopping criterion is usually that, for given small positive numbers $\epsilon_{1}$ and $\epsilon_{2}$, the iteration stops once

$$\| \theta^{(i + 1)} - \theta^{(i)} \| \lt \epsilon_{1}$$

or

$$\| Q(\theta^{(i + 1)}, \theta^{(i)}) - Q(\theta^{(i)}, \theta^{(i)}) \| \lt \epsilon_{2}$$

holds.
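
A sketch of this stopping test (the helper name `converged` and the packing of parameters into arrays are ours; the `diff` check in the code above implements a variant of the first criterion):

```python
import numpy as np

def converged(theta_new, theta_old, q_new, q_old, eps1=1e-4, eps2=1e-4):
    # Criterion 1: the parameter estimate has essentially stopped moving.
    small_step = np.linalg.norm(np.asarray(theta_new) - np.asarray(theta_old)) < eps1
    # Criterion 2: the Q-function value has essentially stopped changing.
    small_q_change = abs(q_new - q_old) < eps2
    return small_step or small_q_change
```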

Derivation

The EM algorithm can be derived by approximately solving the problem of maximizing the log-likelihood function of the observed data. Consider a probabilistic model with hidden variables; the goal is to maximize the log-likelihood of the observed (incomplete) data $Y$ with respect to the parameter $\theta$, i.e. to maximize

$$L(\theta) = \log P(Y; \theta) = \log \sum_{Z} P(Y, Z; \theta) = \log \sum_{Z} P(Y | Z; \theta) P(Z; \theta) \tag{12}$$

Suppose the estimate of $\theta$ after the $i$-th iteration is $\theta^{(i)}$. The iterative scheme requires each new estimate of $\theta$ to increase $L(\theta)$, i.e. $L(\theta) \gt L(\theta^{(i)})$, and to gradually approach the maximum. Consider the difference between the two:

$$L(\theta) - L(\theta^{(i)}) = \log \sum_{Z} P(Y | Z; \theta) P(Z; \theta) - \log P(Y; \theta^{(i)})$$

By Jensen's inequality, a lower bound follows:

$$\begin{aligned} L(\theta) - L(\theta^{(i)}) & = \log \left( \sum_{Z} P(Z | Y; \theta^{(i)}) \frac{P(Y | Z; \theta) P(Z; \theta)}{P(Z | Y; \theta^{(i)})} \right) - \log P(Y; \theta^{(i)}) \\ & \geq \sum_{Z} P(Z | Y; \theta^{(i)}) \log \left( \frac{P(Y | Z; \theta) P(Z; \theta)}{P(Z | Y; \theta^{(i)})} \right) - \log P(Y; \theta^{(i)}) \\ & = \sum_{Z} P(Z | Y; \theta^{(i)}) \log \left( \frac{P(Y | Z; \theta) P(Z; \theta)}{P(Z | Y; \theta^{(i)}) P(Y; \theta^{(i)})} \right) \end{aligned}$$

Define

$$B(\theta, \theta^{(i)}) \triangleq L(\theta^{(i)}) + \sum_{Z} P(Z | Y; \theta^{(i)}) \log \left( \frac{P(Y | Z; \theta) P(Z; \theta)}{P(Z | Y; \theta^{(i)}) P(Y; \theta^{(i)})} \right) \tag{13}$$

Then

$$L(\theta) \geq B(\theta, \theta^{(i)}) \tag{14}$$

That is, the function $B(\theta, \theta^{(i)})$ is a lower bound of $L(\theta)$. Moreover, from Equation (13),

$$L(\theta^{(i)}) = B(\theta^{(i)}, \theta^{(i)}) \tag{15}$$

To make $L(\theta)$ grow as much as possible, $\theta^{(i + 1)}$ is chosen as

$$\theta^{(i + 1)} = \argmax_{\theta} B(\theta, \theta^{(i)}) \tag{16}$$

Combining Equations (10), (13), and (16), and dropping the terms that are constant with respect to $\theta$, we obtain

$$\begin{aligned} \theta^{(i + 1)} & = \argmax_{\theta} B(\theta, \theta^{(i)}) \\ & = \argmax_{\theta} \left( L(\theta^{(i)}) + \sum_{Z} P(Z | Y; \theta^{(i)}) \log \frac{ P(Y | Z; \theta) P(Z; \theta) }{ P(Z | Y; \theta^{(i)}) P(Y; \theta^{(i)}) } \right) \\ & = \argmax_{\theta} \left( \sum_{Z} P(Z | Y; \theta^{(i)}) \log P(Y | Z; \theta) P(Z; \theta) \right) \\ & = \argmax_{\theta} \left( \sum_{Z} P(Z | Y; \theta^{(i)}) \log P(Y, Z; \theta) \right) \\ & = \argmax_{\theta} Q(\theta, \theta^{(i)}) \\ \end{aligned} \tag{17}$$
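
As a numerical sanity check on this equivalence in the three-coin model, the closed-form M-step update can be compared against a brute-force grid maximization of the $Q$ function; the two should agree up to the grid resolution. A self-contained sketch (the parameter values are the second initialization used above):

```python
import numpy as np
from itertools import product

def q_function(y, mu, pi, p, q):
    # Expected complete-data log-likelihood (Q function) for the three-coin
    # model, with responsibilities mu held fixed at the old parameters.
    log_b = np.log(pi) + y * np.log(p) + (1 - y) * np.log(1 - p)
    log_c = np.log(1 - pi) + y * np.log(q) + (1 - y) * np.log(1 - q)
    return np.sum(mu * log_b + (1 - mu) * log_c)

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
pi0, p0, q0 = 0.4, 0.6, 0.7

# E-step at the current estimate: responsibilities for coin B.
mu_1 = pi0 * p0 ** y * (1 - p0) ** (1 - y)
mu_2 = (1 - pi0) * q0 ** y * (1 - q0) ** (1 - y)
mu = mu_1 / (mu_1 + mu_2)

# Closed-form M-step update.
pi_new = np.mean(mu)
p_new = np.sum(y * mu) / np.sum(mu)
q_new = np.sum(y * (1 - mu)) / np.sum(1 - mu)

# Brute-force maximization of Q over a coarse grid; it should land
# near the closed-form update, up to the grid resolution.
grid = np.linspace(0.01, 0.99, 50)
best = max(product(grid, grid, grid), key=lambda t: q_function(y, mu, *t))
print("closed form:", pi_new, p_new, q_new)
print("grid search:", best)
```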

An intuitive interpretation of the EM algorithm: in the figure, the upper curve is $L(\theta)$ and the lower curve is $B(\theta, \theta^{(i)})$. $B(\theta, \theta^{(i)})$ is a lower bound of $L(\theta)$, and by Equation (15) the two coincide at $\theta = \theta^{(i)}$. By Equations (16) and (17), the EM algorithm finds the next point $\theta^{(i + 1)}$ by maximizing $B(\theta, \theta^{(i)})$ (equivalently, $Q(\theta, \theta^{(i)})$). It then recomputes the $Q$ function at $\theta^{(i + 1)}$ and proceeds to the next iteration. Throughout the iterations the log-likelihood $L(\theta)$ keeps increasing, but the EM algorithm cannot guarantee finding a global optimum.

(Figure: the log-likelihood $L(\theta)$ and its lower bound $B(\theta, \theta^{(i)})$, which touch at $\theta = \theta^{(i)}$.)

Application to Unsupervised Learning

The EM algorithm can be used for unsupervised learning of generative models. A generative model is represented by the joint probability distribution $P(X, Y)$, and unsupervised training data can be regarded as data generated by this joint distribution, where $X$ is the observed data and $Y$ is the unobserved data.

2 Convergence of the EM Algorithm

The EM algorithm provides a method for approximately computing the maximum likelihood estimate of probabilistic models with hidden variables; its greatest advantages are simplicity and generality.

Theorem 9.1 Let $P(Y; \theta)$ be the likelihood function of the observed data, $\theta^{(i)}$ ($i = 1, 2, \cdots$) the sequence of parameter estimates produced by the EM algorithm, and $P(Y; \theta^{(i)})$ the corresponding sequence of likelihood values. Then $P(Y; \theta^{(i)})$ is monotonically increasing, i.e.

$$P(Y; \theta^{(i + 1)}) \geq P(Y; \theta^{(i)}) \tag{18}$$

Proof:

Since

$$P(Y; \theta) = \frac{P(Y, Z; \theta)}{P(Z | Y; \theta)}$$

taking logarithms gives

$$\log P(Y; \theta) = \log P(Y, Z; \theta) - \log P(Z | Y; \theta)$$

By Equation (11),

$$Q(\theta, \theta^{(i)}) = \sum_{Z} P(Z | Y; \theta^{(i)}) \log P(Y, Z; \theta)$$

Let

$$H(\theta, \theta^{(i)}) = \sum_{Z} P(Z | Y; \theta^{(i)}) \log P(Z | Y; \theta) \tag{19}$$

Multiplying both sides of the log identity above by $P(Z | Y; \theta^{(i)})$ and summing over $Z$ (the left side does not depend on $Z$, and $\sum_{Z} P(Z | Y; \theta^{(i)}) = 1$), the log-likelihood function can be rewritten as

$$\log P(Y; \theta) = Q(\theta, \theta^{(i)}) - H(\theta, \theta^{(i)}) \tag{20}$$

Taking the difference of Equation (20) evaluated at $\theta = \theta^{(i + 1)}$ and at $\theta = \theta^{(i)}$ gives

$$\log P(Y; \theta^{(i + 1)}) - \log P(Y; \theta^{(i)}) = \left[ Q(\theta^{(i + 1)}, \theta^{(i)}) - Q(\theta^{(i)}, \theta^{(i)}) \right] - \left[ H(\theta^{(i + 1)}, \theta^{(i)}) - H(\theta^{(i)}, \theta^{(i)}) \right] \tag{21}$$

It therefore suffices to show that the right-hand side of Equation (21) is non-negative. For the first term, since the M-step defines $\theta^{(i + 1)}$ as the maximizer of $Q(\theta, \theta^{(i)})$:

$$Q(\theta^{(i + 1)}, \theta^{(i)}) - Q(\theta^{(i)}, \theta^{(i)}) \geq 0 \tag{22}$$

For the second term, by Jensen's inequality:

$$\begin{aligned} H(\theta^{(i + 1)}, \theta^{(i)}) - H(\theta^{(i)}, \theta^{(i)}) & = \sum_{Z} P(Z | Y; \theta^{(i)}) \log \frac{P(Z | Y; \theta^{(i + 1)})}{P(Z | Y; \theta^{(i)})} \\ & \leq \log \left( \sum_{Z} P(Z | Y; \theta^{(i)}) \frac{P(Z | Y; \theta^{(i + 1)})}{P(Z | Y; \theta^{(i)})} \right) \\ & = \log \left( \sum_{Z} P(Z | Y; \theta^{(i + 1)}) \right) \\ & = 0 \end{aligned} \tag{23}$$

P(Y;θ(i+1))P(Y;θ(i))P(Y; \theta^{(i + 1)}) \geq P(Y; \theta^{(i)})

得證。
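
Theorem 9.1 can be checked numerically on the three-coin model by tracking the observed-data log-likelihood $\log P(Y; \theta) = \sum_{j} \log \left[ \pi p^{y_j} (1 - p)^{1 - y_j} + (1 - \pi) q^{y_j} (1 - q)^{1 - y_j} \right]$ across EM iterations; a minimal sketch reusing the data and update rules from the code above:

```python
import numpy as np

def log_likelihood(y, pi, p, q):
    # Observed-data (incomplete-data) log-likelihood log P(Y; theta)
    # for the three-coin model, summing out the hidden coin identity.
    lik = pi * p ** y * (1 - p) ** (1 - y) + (1 - pi) * q ** y * (1 - q) ** (1 - y)
    return np.sum(np.log(lik))

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
pi, p, q = 0.46, 0.55, 0.67

prev = -np.inf
for i in range(10):
    ll = log_likelihood(y, pi, p, q)
    assert ll >= prev - 1e-12  # Theorem 9.1: the sequence never decreases
    print("iter %d: log-likelihood %.6f" % (i, ll))
    prev = ll
    # One EM iteration: E-step (responsibilities), then M-step (updates).
    mu = (pi * p ** y * (1 - p) ** (1 - y)) / (
        pi * p ** y * (1 - p) ** (1 - y) + (1 - pi) * q ** y * (1 - q) ** (1 - y))
    pi = np.mean(mu)
    p = np.sum(y * mu) / np.sum(mu)
    q = np.sum(y * (1 - mu)) / np.sum(1 - mu)
```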

Theorem 9.2 Let $L(\theta) = \log P(Y; \theta)$ be the log-likelihood function of the observed data, $\theta^{(i)}$ ($i = 1, 2, \cdots$) the sequence of parameter estimates produced by the EM algorithm, and $L(\theta^{(i)})$ the corresponding sequence of log-likelihood values.

  1. If $P(Y; \theta)$ is bounded above, then $L(\theta^{(i)}) = \log P(Y; \theta^{(i)})$ converges to some value $L^{\ast}$;

  2. Under certain conditions on the functions $Q(\theta, \theta^{\prime})$ and $L(\theta)$, the limit $\theta^{\ast}$ of the parameter estimate sequence $\theta^{(i)}$ obtained by the EM algorithm is a stationary point of $L(\theta)$.

The conditions of Theorem 9.2 on the functions $Q(\theta, \theta^{\prime})$ and $L(\theta)$ are satisfied in most cases. Note that the convergence of the EM algorithm covers both the convergence of the log-likelihood sequence $L(\theta^{(i)})$ and the convergence of the parameter estimate sequence $\theta^{(i)}$, and the former does not imply the latter. Moreover, the theorem only guarantees that the parameter estimate sequence converges to a stationary point of the log-likelihood function, not to a maximum point. In practice, the choice of initial values is therefore important; a common approach is to run the iteration from several different initial values and then compare the resulting estimates, keeping the best one, as sketched below.
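
A sketch of this multi-start strategy for the three-coin model, selecting among random restarts by observed-data log-likelihood (the helper names `run_em` and `log_likelihood` are ours):

```python
import numpy as np

def log_likelihood(y, pi, p, q):
    # Observed-data log-likelihood for the three-coin model.
    lik = pi * p ** y * (1 - p) ** (1 - y) + (1 - pi) * q ** y * (1 - q) ** (1 - y)
    return np.sum(np.log(lik))

def run_em(y, pi, p, q, tol=1e-6, max_iter=1000):
    # Plain EM loop, stopped once the parameters essentially stop moving.
    for _ in range(max_iter):
        mu_1 = pi * p ** y * (1 - p) ** (1 - y)
        mu_2 = (1 - pi) * q ** y * (1 - q) ** (1 - y)
        mu = mu_1 / (mu_1 + mu_2)
        pi_new = np.mean(mu)
        p_new = np.sum(y * mu) / np.sum(mu)
        q_new = np.sum(y * (1 - mu)) / np.sum(1 - mu)
        if abs(pi_new - pi) + abs(p_new - p) + abs(q_new - q) < tol:
            return pi_new, p_new, q_new
        pi, p, q = pi_new, p_new, q_new
    return pi, p, q

rng = np.random.default_rng(0)
y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])

# Run EM from several random initializations and keep the estimate
# with the highest observed-data log-likelihood.
candidates = [run_em(y, *rng.uniform(0.1, 0.9, size=3)) for _ in range(5)]
best = max(candidates, key=lambda t: log_likelihood(y, *t))
print(best, log_likelihood(y, *best))
```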
