Chapter 10 Unsupervised Learning (2)



Representational Power, Layer Size and Depth


Most auto-encoders have only a single hidden layer, which is also the so-called code.

  • a single (sufficiently large) hidden layer can already represent any function to a given accuracy, e.g. Principal Components Analysis (PCA)

  • a deep (multi-layer) auto-encoder is harder to train, but if trained properly it can yield a much better representation.1

Stochastic Auto-Encoders

x → Q(h|x) → h → P(x|h) → output

The general structure of a stochastic auto-encoder. Neither the encoder nor the decoder is a simple function; both inject some noise, meaning that their outputs can be seen as samples from a distribution, Q(h|x) for the encoder and P(x|h) for the decoder. RBMs are a special case where P = Q.
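
To make the noise-injection idea concrete, here is a minimal NumPy sketch (the weights, dimensions and noise scales below are arbitrary choices for illustration, not taken from the text): encoding draws a sample from Q(h|x) and decoding draws a sample from P(x|h), each being a linear map plus Gaussian noise.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy parameters: 5-d input, 2-d code (randomly initialized).
    W_enc = rng.normal(size=(2, 5))      # encoder weights
    W_dec = rng.normal(size=(5, 2))      # decoder weights
    sigma_enc, sigma_dec = 0.1, 0.1      # noise scales of Q(h|x) and P(x|h)

    def encode(x):
        """Sample h ~ Q(h|x): a deterministic map plus injected Gaussian noise."""
        return W_enc @ x + sigma_enc * rng.normal(size=2)

    def decode(h):
        """Sample a reconstruction ~ P(x|h): again a map plus injected noise."""
        return W_dec @ h + sigma_dec * rng.normal(size=5)

    x = rng.normal(size=5)
    h = encode(x)        # stochastic code: a sample, not a fixed value
    x_hat = decode(h)    # stochastic reconstruction

Calling encode(x) twice on the same x gives different codes, which is exactly what distinguishes a stochastic auto-encoder from a deterministic one.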

Linear Factor Models

Assumptions about how the data was generated

  • sample the real-valued factors
    $h \sim P(h)$
  • sample the real-valued observable variables (see the sketch after this list)
    $x = Wh + b + \text{noise}$
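
A minimal NumPy sketch of this two-step generative process (the dimensions, W, b and noise scale below are made up for illustration), assuming a standard-normal prior P(h) and independent Gaussian noise:

    import numpy as np

    rng = np.random.default_rng(0)
    n_factors, n_obs = 3, 10

    # Hypothetical model parameters.
    W = rng.normal(size=(n_obs, n_factors))   # factor loadings
    b = rng.normal(size=n_obs)                # offset
    sigma = np.full(n_obs, 0.5)               # per-dimension noise std

    # 1) sample the real-valued factors h ~ P(h) = N(0, I)
    h = rng.normal(size=n_factors)
    # 2) sample the observables x = W h + b + noise
    x = W @ h + b + sigma * rng.normal(size=n_obs)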

Probabilistic PCA and Factor Analysis

Both are special cases of the equations above; they differ only in the choice of the prior and the noise distribution.

h ∼ P(h) → P(x|h) → x = Wh + b + noise

The general structure of a linear factor model: the observed data x is obtained as a linear combination of the latent factors h, plus some noise.

  • Factor analysis2, prior:
    $h \sim \mathcal{N}(0, I)$

    where the $x_i$ are assumed to be conditionally independent and the noise is drawn from a Gaussian with diagonal covariance matrix
    $\psi = \mathrm{diag}(\sigma^2)$
    where
    $\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots)$

    The role of $h$ is to capture the dependencies among the $x_i$:
    $x \sim \mathcal{N}(b,\; WW^\top + \psi)$
    where $x_i$ influences $\hat{h}_k = W_k x$ through $w_{ik}$ (for every $k$), and in turn $\hat{h}_k$ influences $x_j$ through $w_{kj}$.
  • To cast PCA in a probabilistic framework, make the conditional variances $\sigma_i^2$ equal to each other, so that $\psi$ reduces to $\sigma^2 I$ (see the short derivation after this list).
    In that case
    $x \sim \mathcal{N}(b,\; WW^\top + \sigma^2 I)$

    or equivalently
    $x = Wh + b + \sigma z$

    where $z \sim \mathcal{N}(0, I)$ is Gaussian white noise.
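
For completeness, a one-line derivation of the marginal covariance used in both bullets above, using $\operatorname{Cov}(h) = I$ and the independence of $h$ and the noise:

    $\operatorname{Cov}(x) = \operatorname{Cov}(Wh + b + \text{noise}) = W\,\operatorname{Cov}(h)\,W^\top + \psi = WW^\top + \psi$

This reduces to $WW^\top + \sigma^2 I$ in the probabilistic PCA case $\psi = \sigma^2 I$; and since $x$ is a linear function of jointly Gaussian variables plus a constant, its marginal is Gaussian with mean $b$ and this covariance.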

Probabilistic PCA

The covariance is mostly captured by the latent variables h, up to a small residual reconstruction error $\sigma^2$.

  • As $\sigma \to 0$, pPCA becomes PCA (see the numerical check below).
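
As a quick numerical sanity check of the pPCA model (a sketch with arbitrary made-up parameters, not code from the text): data generated as $x = Wh + b + \sigma z$ should have sample covariance close to $WW^\top + \sigma^2 I$, and as $\sigma \to 0$ this degenerates to the rank-deficient $WW^\top$ that ordinary PCA recovers.

    import numpy as np

    rng = np.random.default_rng(0)
    n_factors, n_obs, n_samples = 2, 4, 200_000

    # Hypothetical parameters.
    W = rng.normal(size=(n_obs, n_factors))
    b = rng.normal(size=n_obs)
    sigma = 0.3                                # shared noise std (the pPCA assumption)

    # Generate x = W h + b + sigma * z with h ~ N(0, I) and z ~ N(0, I).
    H = rng.normal(size=(n_samples, n_factors))
    Z = rng.normal(size=(n_samples, n_obs))
    X = H @ W.T + b + sigma * Z

    # The sample covariance should approach W W^T + sigma^2 I;
    # as sigma -> 0 it approaches W W^T alone (rank n_factors), i.e. plain PCA.
    emp_cov = np.cov(X, rowvar=False)
    model_cov = W @ W.T + sigma**2 * np.eye(n_obs)
    print(np.max(np.abs(emp_cov - model_cov)))   # small for large n_samples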



Representational Power, Layer Size and Depth


Generally, most trained auto-encoders have had a single hidden layer which is also the representation layer or code.

  • approximation abilities of single-hidden-layer neural networks: a sufficiently large hidden layer can represent any function to a given accuracy, e.g. Principal Components Analysis (PCA)

  • training a deep neural network, and in particular a deep auto-encoder (i.e. one with a deep encoder and a deep decoder), is more difficult than training a shallow one. If trained properly, such deep auto-encoders could yield much better compression than corresponding shallow or linear auto-encoders.3

Stochastic Auto-Encoders

x → Q(h|x) → h → P(x|h) → output

Basic scheme of a stochastic auto-encoder. Both the encoder and the decoder are not simple functions but instead involve some noise injection, meaning that their output can be seen as sampled from a distribution, Q(h|x) for the encoder and P(x|h) for the decoder. RBMs are a special case where P = Q, but in general these two distributions are not necessarily conditional distributions compatible with a unique joint distribution P(x, h).

Linear Factor Models

Assumptions about how the data was generated

  • sample the real-valued factors
    $h \sim P(h)$
  • sample the real-valued observable variables
    $x = Wh + b + \text{noise}$

Probabilistic PCA and Factor Analysis

Both are special cases of the above equations and only differ in the choices made for the prior and noise distributions.

h ∼ P(h) → P(x|h) → x = Wh + b + noise

Basic scheme of a linear factors model, in which it is assumed that an observed data vector x is obtained by a linear combination of latent factors h, plus some noise. Different models, such as probabilistic PCA, factor analysis or ICA, make different choices about the form of the noise and of the prior P(h).

  • factor analysis4, latent variable prior:
    $h \sim \mathcal{N}(0, I)$

    where the $x_i$ are assumed to be conditionally independent and the noise is assumed to come from a Gaussian distribution with diagonal covariance matrix
    $\psi = \mathrm{diag}(\sigma^2)$

    where
    $\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots)$

    The role of the latent variables is to capture the dependence among the $x_i$:
    $x \sim \mathcal{N}(b,\; WW^\top + \psi)$

    where
    $x_i$ influences $\hat{h}_k = W_k x$ via $w_{ik}$ (for every $k$) and $\hat{h}_k$ influences $x_j$ via $w_{kj}$
  • In order to cast PCA in a probabilistic framework, make the conditional variances $\sigma_i^2$ equal to each other, so that $\psi$ reduces to $\sigma^2 I$.
    In that case
    $x \sim \mathcal{N}(b,\; WW^\top + \sigma^2 I)$

    or equivalently
    $x = Wh + b + \sigma z$

    where $z \sim \mathcal{N}(0, I)$ is white noise.

Probabilistic PCA

The covariance is mostly captured by the latent variables h, up to some small residual reconstruction error $\sigma^2$.

  • if $\sigma \to 0$, pPCA becomes PCA.

  1. Bartholomew, 1987; Basilevsky, 1994
  2. Bartholomew, 1987; Basilevsky, 1994
  3. Bartholomew, 1987; Basilevsky, 1994
  4. Bartholomew, 1987; Basilevsky, 1994