Keras pitfalls:
1. Setting up early stopping
filepath = model_snapshot_directory + '/' + 'lstm_model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
model.fit(X_train, y_train, epochs=100, batch_size=128,
          verbose=1, callbacks=[checkpoint], validation_data=(X_test, y_test))
The checkpoint's monitored value is monitor='val_loss'; with save_best_only=True, the model is saved only when val_loss improves.
2. Using hyperas
best_run,best_model=optim.minimize(model=train,data=prepare_data,algo=tpe.suggest, max_evals=100,trials=Trials())
Here max_evals=100 means that 100 different hyperparameter combinations are evaluated during the search, each trial training a model with a different set of parameters. Set it according to how many hyperparameters you actually have; the larger it is, the more models may be trained.
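As a toy illustration of what max_evals controls, here is a hypothetical stand-in that scores every parameter combination and keeps the best one (hyperas/TPE samples combinations up to max_evals instead of enumerating a grid; the search space and objective below are made up):

```python
import itertools

# Hypothetical search space, analogous to {{choice([...])}} in hyperas
space = {'batch_size': [64, 128, 256], 'units': [32, 64]}

def objective(params):
    # stand-in for one training run's validation loss
    return abs(params['units'] - 64) + abs(params['batch_size'] - 128)

# Each evaluation tries a different combination; keep the best-scoring one
best = min(
    (dict(zip(space, combo)) for combo in itertools.product(*space.values())),
    key=objective,
)
# best == {'batch_size': 128, 'units': 64}
```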
3. Model evaluation
best_model.evaluate(X_test,y_test)
This evaluate call returns a tuple (score, acc), where score is the test loss and acc the test accuracy. In the hyperas objective function, the value returned as 'loss' is -acc (hyperopt minimizes this value).
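A minimal sketch of the sign convention: hyperopt minimizes whatever is returned as 'loss', so returning -acc makes the search maximize accuracy (the trial values below are made up):

```python
# Made-up accuracies from three hypothetical trials
trial_results = [{'acc': 0.91}, {'acc': 0.88}, {'acc': 0.95}]

# Each trial returns 'loss': -acc, as in the train() functions below
results = [{'loss': -t['acc'], 'acc': t['acc']} for t in trial_results]

# hyperopt keeps the trial with the smallest 'loss', i.e. the largest acc
best = min(results, key=lambda r: r['loss'])
# best['acc'] == 0.95
```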
4. The return value of model.fit
hist=model.fit(X_train, y_train, epochs=100, batch_size={{choice([64, 128, 256])}}, verbose=1,
callbacks=callback_list, validation_data=(X_test, y_test))
h1=hist.history
acc_=np.asarray(h1['acc'])
loss_=np.asarray(h1['loss'])
val_acc=np.asarray(h1['val_acc'])
val_loss=np.asarray(h1['val_loss'])
acc_and_loss=np.column_stack((acc_,loss_,val_acc,val_loss))
save_file_mlp = model_snapshot_directory+'/mlp_run_' + '_' + str(globalvars.globalVar) + '.txt'
with open(save_file_mlp, 'w') as f:
    np.savetxt(f, acc_and_loss, delimiter=" ")
fit() returns a History object (named history here) that records the loss, plus any other metrics specified when the model was compiled, at the end of each training epoch.
The model's performance can then be plotted with the Matplotlib library:
from matplotlib import pyplot
pyplot.plot(history.history['loss'])
pyplot.plot(history.history['val_loss'])
pyplot.title('model train vs validation loss')
pyplot.ylabel('loss')
pyplot.xlabel('epoch')
pyplot.legend(['train', 'validation'], loc='upper right')
pyplot.show()
5. Diagnosing overfitting and underfitting in LSTM models
https://baijiahao.baidu.com/s?id=1577431637601070077&wfr=spider&for=pc
6. Training on multiple GPUs and saving the model
Create a new .py file with the following content:
from keras.callbacks import ModelCheckpoint
class AltModelCheckpoint(ModelCheckpoint):
    def __init__(self, filepath, alternate_model, **kwargs):
        """
        Additional keyword args are passed to ModelCheckpoint; see those docs for information on what args are accepted.
        :param filepath:
        :param alternate_model: Keras model to save instead of the default. This is used especially when training
            multi-gpu models built with Keras multi_gpu_model(). In that case, you would pass the original
            "template model" to be saved each checkpoint.
        :param kwargs: Passed to ModelCheckpoint.
        """
        self.alternate_model = alternate_model
        super().__init__(filepath, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        model_before = self.model
        self.model = self.alternate_model
        super().on_epoch_end(epoch, logs)
        self.model = model_before
Then, in the training script:
from alt_model_checkpoint import AltModelCheckpoint
from keras.models import Model
from keras.utils import multi_gpu_model
base_model = Model(...)
gpu_model = multi_gpu_model(base_model,numbers_of_gpu)
gpu_model.compile(...)
gpu_model.fit(..., callbacks=[
AltModelCheckpoint('save/path/for/model.hdf5', base_model)
])
To add early stopping as well, modify the fit call, for example:
hist = gpu_model.fit(X_train, y_train, batch_size={{choice([64, 128, 256])}}, epochs=100, verbose=1,
                     callbacks=[AltModelCheckpoint(filepath, model, monitor='val_loss', verbose=1,
                                                   save_best_only=True, mode='min')],
                     validation_data=(X_test, y_test))
Because AltModelCheckpoint inherits from ModelCheckpoint, it can be passed to the callbacks list directly.
This is an example of using multiple GPUs:
- It mainly calls the multi_gpu_model function.
- During training, checkpoints are saved through a custom callback that saves base_model rather than gpu_model.
After the model has been saved, to load it:
- load the saved model file with load_model
- rebuild the parallel model:
  gpu_model = multi_gpu_model(base_model)
  gpu_model.compile()
- make predictions with gpu_model.predict()
https://github.com/keras-team/keras/issues/9342
In Keras's saving.py file, add the following:
# ... earlier get_json_type code
# NOTE: Hacky fix to serialize Dimension objects.
from tensorflow.python.framework.tensor_shape import Dimension
if type(obj) == Dimension:
return int(obj)
# original error raised here
raise TypeError('Not JSON Serializable:', obj)
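A self-contained sketch of what the patched branch does, using a hypothetical stand-in for TensorFlow's Dimension class rather than the real one:

```python
import json

class Dimension:
    # hypothetical stand-in for tensorflow's Dimension object
    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

def get_json_type(obj):
    # mirrors the patched branch: coerce Dimension objects to plain ints
    if type(obj) == Dimension:
        return int(obj)
    # original error raised here
    raise TypeError('Not JSON Serializable:', obj)

# Without the Dimension branch, json.dumps would raise the TypeError above
s = json.dumps({'units': Dimension(64)}, default=get_json_type)
# s == '{"units": 64}'
```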
An example of calling multiple GPUs from Keras: https://www.jianshu.com/p/d57595dac5a9
Tuning with multiple GPUs + early stopping + hyperas
# in the function train
def train():
    ...
    # first distribute GPUs according to the gpu which you possess
    gpu_model = multi_gpu_model(model, gpus=2)
    gpu_model.compile(loss=loss_fn, optimizer=optim, metrics=['accuracy'])
    # set earlystopping using ModelCheckPoint
    filepath = '...'
    early_stopping = EarlyStopping(monitor='val_loss', patience=20, mode='min')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='min')
    callback_list = [early_stopping, checkpointer]
    # train the model
    hist = gpu_model.fit(X_train, y_train, epochs=100, batch_size=128,
                         verbose=1, callbacks=callback_list, validation_data=(X_test, y_test))
    gpu_model.load_weights(filepath)
    score, acc = gpu_model.evaluate(X_test, y_test, verbose=0)
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': gpu_model}
Here gpu_model is used when saving, and gpu_model is also what is returned.
Note: a model trained on multiple GPUs must also use multiple GPUs for prediction. For example, if it was trained on node g-1-4, prediction must still run on g-1-4, with the same number of GPUs.
Notes:
1. When using multiple GPUs with ModelCheckpoint, you cannot save the full model; only with save_weights_only=True does training run and exit normally.
2. When using multiple GPUs + hyperas + ModelCheckpoint, since (per note 1) only weights can be saved, the best_model produced by hyperas's optimization cannot be returned normally; it raises "can not pickle the module". Even if it could be saved, prediction would require rebuilding the model with the same number of GPUs as during training, and there is no way to know which saved weights were the best, so prediction becomes impossible.
The workaround is:
def train():
    ...
    # first distribute GPUs according to the gpu which you possess
    gpu_model = multi_gpu_model(model, gpus=2)
    gpu_model.compile(loss=loss_fn, optimizer=optim, metrics=['accuracy'])
    # set earlystopping using ModelCheckPoint
    filepath = '...'
    early_stopping = EarlyStopping(monitor='val_loss', patience=20, mode='min')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='min')
    callback_list = [early_stopping, checkpointer]
    # train the model
    hist = gpu_model.fit(X_train, y_train, epochs=100, batch_size=128,
                         verbose=1, callbacks=callback_list, validation_data=(X_test, y_test))
    score, acc = gpu_model.evaluate(X_test, y_test, verbose=0)
    model.save(filepath)  # this must come after gpu_model.evaluate, otherwise it fails
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}
Note that here it is model (the template model) that is saved, and model is also the model that is returned.
Although model was never compiled, once fit finishes, model's weights are the same as gpu_model's weights.
Tested: this does yield a usable best_model, and prediction works on a single GPU.
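A toy, pure-Python illustration of why this works: multi_gpu_model shares the template model's weight variables with the wrapper instead of copying them, so updates made through the wrapper are visible from the base model (the lists below are stand-ins for weight tensors):

```python
# Stand-in for the template model's weight tensors
base_weights = [[0.0, 0.0, 0.0]]

# The gpu_model wrapper holds references to the same weights, not copies
wrapper_weights = base_weights

# "Training" through the wrapper mutates the shared weights in place
wrapper_weights[0][:] = [1.0, 1.0, 1.0]

# base_weights[0] is now [1.0, 1.0, 1.0] as well
```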
7. Multi-input Keras models (here using multiple GPUs and the hyperas tuning tool)
input_embed = Input(shape=(700,), name='input_embed')
input_extra = Input(shape=(700, 25,), name='input_extra')
embedded = Embedding(num_amino_acids, 50, input_length=700)(input_embed)
x = concatenate([embedded, input_extra], axis=2)
......
x = BatchNormalization()(x)
output = Activation(activation='sigmoid')(x)
model = Model(inputs=[input_embed, input_extra], outputs=output)
gpu_model = multi_gpu_model(model, 4)
gpu_model.compile(...)
callback_list = [early_stopping, checkpointer]
hist = gpu_model.fit(x={'input_embed': X_all, 'input_extra': X_extra},
                     y=y_all,
                     epochs=100, batch_size=256,
                     verbose=1, callbacks=callback_list,
                     class_weight=class_weights, validation_split=0.2)
Although validation_split is set, validation only runs at the end of each epoch, not after every batch. Also, for a multi-input model, I could not pass the multiple inputs through validation_data.
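For reference, a quick shape check for the concatenate in the model above (a sketch with hypothetical batch dimension; all axes except the concatenation axis must match):

```python
# Output shape of Embedding(num_amino_acids, 50, input_length=700)
embedded_shape = (None, 700, 50)
# Shape of the input_extra branch
extra_shape = (None, 700, 25)

axis = 2
# concatenate requires all non-concatenation axes to agree
assert embedded_shape[:axis] == extra_shape[:axis]

merged_shape = embedded_shape[:axis] + (embedded_shape[axis] + extra_shape[axis],)
# merged_shape == (None, 700, 75)
```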
Note: I hit this pitfall repeatedly, and it matters a lot. The names X_train, X_extra, y_train in the code below are important: they must be exactly the names returned by prepare_data(), otherwise an error is raised saying the name does not exist.
def prepare_data():
    ......
    return (X_train, X_extra, y_train)
if __name__ == "__main__":
    best_run, best_model = optim.minimize(model=train, data=prepare_data, algo=tpe.suggest,
                                          max_evals=30, trials=Trials())
    X_train, X_extra, y_train = prepare_data()
8. Setting the early stopping monitor to a custom value, such as the AUC
Method 1: in my experiments, this fails when tuning with hyperas, with an error saying auc_roc does not exist.
import tensorflow as tf
from tensorflow.contrib.metrics import streaming_auc

def auc_roc(y_true, y_pred):
    value, update_op = streaming_auc(y_pred, y_true)
    # find all variables created for this metric
    metric_vars = [i for i in tf.local_variables() if 'auc_roc' in i.name.split('/')[1]]
    # Add metric variables to GLOBAL_VARIABLES collection.
    # They will be initialized for new session.
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)
    # force to update metric values
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
    return value

def train():
    ......
    gpu_model.compile(loss=loss_fn, optimizer=adam, metrics=['accuracy', auc_roc])
    early_stopping = EarlyStopping(monitor='val_auc_roc', patience=20, mode='max')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_auc_roc', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='max')
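For sanity-checking a custom AUC metric offline, ROC AUC can also be computed in plain Python with the Mann-Whitney rank formula (a reference sketch assuming no tied scores; the sample labels and scores below are made up):

```python
def auc_roc_reference(y_true, y_score):
    # rank every score ascending (1-based ranks)
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    for rank, i in enumerate(order, start=1):
        ranks[i] = float(rank)
    # Mann-Whitney U: AUC from the rank sum of the positive class
    pos_ranks = [r for r, t in zip(ranks, y_true) if t == 1]
    n_pos = len(pos_ranks)
    n_neg = len(y_true) - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc = auc_roc_reference([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# auc == 0.75
```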