Updating a Subset of Weights in Theano

I am new to Theano. The last section of the official tutorial covers updating only part of a weight matrix ("How to update a subset of weights?"). Following that tutorial I wrote my own example, but f = theano.function(…, updates=updates) raised an error. The message explained that updates = inc_subtensor(subset, g*lr) yields a plain tensor, while the updates argument expects pairs: "The updates parameter must be an OrderedDict/dict or a list of lists/tuples with 2 elements".
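
To make the failing and working forms concrete, here is a minimal self-contained sketch of the difference (the names W, subset, g, and lr are illustrative stand-ins, not from the tutorial):

import numpy
import theano
import theano.tensor as T

W = theano.shared(numpy.zeros((5, 1), dtype=theano.config.floatX), name='W')
subset = W[0:3, :]       # symbolic subtensor of the shared variable
g = T.ones_like(subset)  # stand-in for a gradient expression
lr = 0.1

# Wrong: this is just a tensor expression, not an update pair, so
# theano.function(..., updates=...) rejects it with the error above:
#   updates = T.inc_subtensor(subset, -lr * g)

# Right: pair the *whole* shared variable W with the expression that
# writes the modified subset back into it.
updates = [(W, T.inc_subtensor(subset, -lr * g))]
f = theano.function(inputs=[], outputs=[], updates=updates)
f()  # only W[0:3, :] changes; W[3:, :] stays at zero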

After several rounds of tweaking without success, I finally found on the ever-reliable Stack Overflow how inc_subtensor should be used inside updates. I tried it out on logistic regression and verified that the approach works:

import theano
import theano.tensor as T
import numpy

# toy data: 100 samples with 50 features each, random binary labels
x = theano.shared(numpy.random.rand(100, 50).astype(theano.config.floatX), name='x')
w = theano.shared(numpy.random.rand(50, 1).astype(theano.config.floatX), name='w')
y = theano.shared(numpy.random.randint(size=(100, 1), low=0, high=2).astype(theano.config.floatX), name='y')

part = 40                # only the first 40 weights are trained
wsubset = w[0:part, :]   # trainable subset of w
wrest = w[part:, :]      # frozen remainder of w
print("w before train:")
print(w.get_value().T)

p_1 = 1 / (1 + T.exp(-T.dot(x[:, 0:part], wsubset) - T.dot(x[:, part:], wrest)))  # 1/(1+exp(-w'x))
predict = p_1 > 0.5

crossEnt = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)  # cross-entropy loss
cost = crossEnt.mean() + 0.01 * (wsubset ** 2).sum()   # cost with L2 regularization on the trainable subset
#cost = T.sum((y - 1/(1 + T.exp(-T.dot(x[:, 0:part], wsubset) - T.dot(x[:, part:], wrest))))**2)  # squared-error alternative

wgrad = T.grad(cost, wsubset)                           # gradient w.r.t. wsubset only
update = (w, T.inc_subtensor(wsubset, -0.01 * wgrad))   # write the updated subset back into w

trainFn = theano.function(inputs=[], outputs=[predict], updates=[update])
predFn = theano.function(inputs=[], outputs=[predict])  # prediction function

for i in range(5000):
    trainFn()

print("w after train:")
print(w.get_value().T)
print("y is :")
print(y.get_value().T)
print("predict is:")
print((predFn()[0] * 1.0).T)

In the code above, gradient descent updates only the first 40 elements of the weight vector w; the remaining 10 elements stay untouched. The bias term b of 1/(1+exp(-(w'x + b))) is also omitted. It is a very naive example with no practical value, but it works fine as an exercise.
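
For completeness, here is a minimal sketch of how the omitted bias could be added back, reusing x, w, wsubset, wrest, part, and y from the listing above (the shared scalar b is my addition, not in the original code); w's subset and b each get their own update pair:

b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name='b')

p_1 = 1 / (1 + T.exp(-T.dot(x[:, 0:part], wsubset) - T.dot(x[:, part:], wrest) - b))
cost = (-y * T.log(p_1) - (1 - y) * T.log(1 - p_1)).mean() + 0.01 * (wsubset ** 2).sum()

wgrad, bgrad = T.grad(cost, [wsubset, b])
updates = [(w, T.inc_subtensor(wsubset, -0.01 * wgrad)),  # partial update of w
           (b, b - 0.01 * bgrad)]                         # ordinary full update of b
trainFn = theano.function(inputs=[], outputs=[p_1 > 0.5], updates=updates)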

Key code: the weight matrix has to be split up front, and both the gradient and the update build their graph from the weight subset wsubset that is actually being trained (T.grad differentiates directly with respect to the subtensor wsubset).

w = theano.shared(numpy.random.rand(50, 1).astype(theano.config.floatX), name='w')
wsubset = w[0:part, :]   # trainable subset
wrest = w[part:, :]      # frozen remainder

p_1 = 1 / (1 + T.exp(-T.dot(x[:, 0:part], wsubset) - T.dot(x[:, part:], wrest)))  # 1/(1+exp(-w'x))
cost = crossEnt.mean() + 0.01 * (wsubset ** 2).sum()
wgrad = T.grad(cost, wsubset)                           # gradient w.r.t. wsubset
update = (w, T.inc_subtensor(wsubset, -0.01 * wgrad))   # (shared variable, new value) pair

trainFn = theano.function(inputs=[], outputs=[predict], updates=[update])

Experimental results: the last 10 elements of w are unchanged by training, which is exactly what we wanted (a programmatic check follows the printout below).

w before train:
[[ 0.34600419  0.67398912  0.09942167  0.65765017  0.44213673  0.06654485
   0.39846805  0.3888059   0.83535087  0.87614214  0.1428479   0.69523871
   0.59748024  0.89421201  0.16198015  0.90665674  0.66680759  0.29132733
   0.97294956  0.34204745  0.28578022  0.005306    0.82625932  0.36869088
   0.61629105  0.58408296  0.54571205  0.83845872  0.38558939  0.66588008
   0.70807606  0.58614755  0.44821101  0.11765263  0.6195485   0.81328052
   0.74707526  0.84718859  0.10713185  0.16338864  0.39414939  0.39094746
   0.97880673  0.35624492  0.13801318  0.93115759  0.97082269  0.14509809
   0.96431786  0.16936433]]
w after train:
[[-0.09856597 -0.3590492   0.13191809 -0.39871272 -0.1845082  -0.20230994
   0.11841918  0.38574994 -1.26536679 -0.1392861  -0.00187506  0.12097881
  -0.14895041 -0.35272926  0.4578245  -0.85317516  0.09256358 -0.19773743
  -0.07583583 -0.21877731 -0.84497571 -0.63426024 -0.44498774  0.03201531
   0.00287166  0.03242523 -0.92445505 -0.12279754 -0.08953576  0.38422242
   0.29207328 -0.12609322  0.27217883  0.21954003  0.18007286  0.0674418
  -0.76156878 -0.10139606  0.04785168 -0.46169016  0.39414939  0.39094746
   0.97880673  0.35624492  0.13801318  0.93115759  0.97082269  0.14509809
   0.96431786  0.16936433]]
y is :
[[ 0.  1.  1.  1.  0.  1.  0.  1.  0.  1.  0.  1.  0.  0.  1.  0.  1.  0.
   0.  0.  1.  0.  1.  0.  1.  1.  0.  0.  1.  0.  0.  1.  1.  0.  0.  1.
   1.  0.  1.  1.  0.  1.  1.  1.  1.  0.  1.  0.  0.  0.  1.  1.  1.  1.
   0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  1.
   1.  0.  0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  1.
   0.  0.  0.  0.  1.  1.  0.  1.  0.  0.]]
predict is:
[[ 0.  0.  1.  1.  1.  0.  0.  1.  0.  1.  1.  1.  0.  1.  1.  0.  1.  0.
   0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.
   1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  0.  0.  1.  1.  1.  0.
   1.  1.  0.  0.  1.  0.  0.  0.  1.  0.  1.  1.  1.  0.  1.  1.  0.  1.
   1.  0.  0.  0.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.
   0.  0.  0.  0.  0.  0.  0.  1.  0.  1.]]
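
To check this without eyeballing the printout, one can snapshot the frozen slice before the training loop and compare afterwards; a minimal sketch, reusing w, part, and trainFn from the listing above:

w_frozen_before = w.get_value()[part:, :].copy()  # snapshot the frozen slice

for i in range(5000):
    trainFn()

# the untrained tail of w should be bit-for-bit identical
assert numpy.array_equal(w_frozen_before, w.get_value()[part:, :])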

Finally, you may notice that this LR model fits the data surprisingly poorly... Don't worry, the code is correct; try raising the effective dimensionality of w (the part variable in the code) to 400.
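
Note that part = 400 only makes sense if x and w are enlarged to match; a sketch of the adjusted setup (the 500-feature shapes are my assumption, the original only says to raise part to 400):

part = 400  # train the first 400 of 500 weights
x = theano.shared(numpy.random.rand(100, 500).astype(theano.config.floatX), name='x')
w = theano.shared(numpy.random.rand(500, 1).astype(theano.config.floatX), name='w')
# ...then rebuild wsubset, wrest, p_1, cost, and the functions exactly as above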

References:
Theano tutorial: http://deeplearning.net/software/theano/tutorial/faq_tutorial.html
Stack Overflow: http://stackoverflow.com/questions/15917849/how-can-i-assign-update-subset-of-tensor-shared-variable-in-theano
