Keras pitfalls (hyperas + multi-GPU)

Keras pitfalls:

1. Setting up early stopping

from keras.callbacks import ModelCheckpoint

filepath = model_snapshot_directory + '/' + 'lstm_model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
model.fit(X_train, y_train, epochs=100, batch_size=128,
          verbose=1, callbacks=[checkpoint], validation_data=(X_test, y_test))

The value the checkpoint monitors is monitor='val_loss'; with save_best_only=True, a snapshot is only written when val_loss improves on the best value seen so far.
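To actually halt training early, rather than only snapshotting the best model, you can pair ModelCheckpoint with an EarlyStopping callback. A minimal sketch, reusing the variables above (patience=10 is an arbitrary choice):

from keras.callbacks import EarlyStopping, ModelCheckpoint

# stop once val_loss has not improved for 10 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=10, mode='min')
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, mode='min')
model.fit(X_train, y_train, epochs=100, batch_size=128, verbose=1,
          callbacks=[early_stopping, checkpoint],
          validation_data=(X_test, y_test))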

2. Using hyperas

from hyperopt import Trials, tpe
from hyperas import optim

best_run, best_model = optim.minimize(model=train, data=prepare_data, algo=tpe.suggest,
                                      max_evals=100, trials=Trials())

Here max_evals=100 means hyperas will evaluate 100 different hyperparameter combinations during the search, with each trial using a different set of parameters. Set it according to how many hyperparameters you are actually tuning; the larger it is, the more models get trained.
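For reference, a minimal sketch of the two functions that this optim.minimize call expects; the architecture and data handling here are illustrative only, not from the original post:

from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice
from keras.models import Sequential
from keras.layers import Dense

def prepare_data():
    # load and split your data here
    ......
    return X_train, y_train, X_test, y_test

def train(X_train, y_train, X_test, y_test):
    model = Sequential()
    # {{choice(...)}} marks a hyperparameter for hyperas to search over
    model.add(Dense({{choice([64, 128, 256])}}, activation='relu', input_shape=(X_train.shape[1],)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=10, batch_size={{choice([64, 128])}}, verbose=1)
    score, acc = model.evaluate(X_test, y_test, verbose=0)
    # hyperopt minimizes 'loss', so return -acc to maximize accuracy
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}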

3. Model evaluation

best_model.evaluate(X_test,y_test)

This evaluate call returns a tuple (score, acc): the first element is the test loss and the second the accuracy. (In the hyperas objective, the quantity minimized is conventionally -acc, as in the train() functions below.)
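For example, unpacking it directly:

score, acc = best_model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score)      # first element: the loss
print('Test accuracy:', acc)    # second element: the accuracy metric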

4. The return value of model.fit

import numpy as np

hist = model.fit(X_train, y_train, epochs=100, batch_size={{choice([64, 128, 256])}}, verbose=1,
                 callbacks=callback_list, validation_data=(X_test, y_test))
h1 = hist.history
acc_ = np.asarray(h1['acc'])
loss_ = np.asarray(h1['loss'])
val_acc = np.asarray(h1['val_acc'])
val_loss = np.asarray(h1['val_loss'])
# one row per epoch: train acc, train loss, val acc, val loss
acc_and_loss = np.column_stack((acc_, loss_, val_acc, val_loss))
save_file_mlp = model_snapshot_directory + '/mlp_run_' + '_' + str(globalvars.globalVar) + '.txt'
with open(save_file_mlp, 'w') as f:
    np.savetxt(f, acc_and_loss, delimiter=" ")   # write to the open handle, not the path again

 

The fit() function returns a history object that contains the loss trace, along with any other metrics specified when compiling the model; these scores are recorded at the end of each training epoch.

You can plot the model's training curves with Matplotlib:

from matplotlib import pyplot
pyplot.plot(history.history['loss'])
pyplot.plot(history.history['val_loss'])
pyplot.title('model train vs validation loss')
pyplot.ylabel('loss')
pyplot.xlabel('epoch')
pyplot.legend(['train', 'validation'], loc='upper right')
pyplot.show()

 

5. Diagnosing overfitting and underfitting in LSTM models

https://baijiahao.baidu.com/s?id=1577431637601070077&wfr=spider&for=pc

6. Running the model on multiple GPUs, and saving it

Create a new .py file with the following contents:

from keras.callbacks import ModelCheckpoint
class AltModelCheckpoint(ModelCheckpoint):
    def __init__(self, filepath, alternate_model, **kwargs):
        """
        Additional keyword args are passed to ModelCheckpoint; see those docs for information on what args are accepted.
        :param filepath:
        :param alternate_model: Keras model to save instead of the default. This is used especially when training multi-
                                gpu models built with Keras multi_gpu_model(). In that case, you would pass the original
                                "template model" to be saved each checkpoint.
        :param kwargs:          Passed to ModelCheckpoint.
        """

        self.alternate_model = alternate_model
        super().__init__(filepath, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        model_before = self.model
        self.model = self.alternate_model
        super().on_epoch_end(epoch, logs)
        self.model = model_before

Then, in the training script:

from alt_model_checkpoint import AltModelCheckpoint
from keras.models import Model
from keras.utils import multi_gpu_model

base_model = Model(...)
gpu_model = multi_gpu_model(base_model, gpus=numbers_of_gpu)
gpu_model.compile(...)
gpu_model.fit(..., callbacks=[
    AltModelCheckpoint('save/path/for/model.hdf5', base_model)
])

 

To add early stopping on top of this, change the fit call, for example:

hist = gpu_model.fit(X_train, y_train, batch_size={{choice([64, 128, 256])}}, epochs=100, verbose=1,
                     callbacks=[AltModelCheckpoint(filepath, base_model, monitor='val_loss', verbose=1,
                                                   save_best_only=True, mode='min')],
                     validation_data=(X_test, y_test))

 

Because AltModelCheckpoint inherits from ModelCheckpoint, these keyword arguments can be passed straight through.

This is the multi-GPU recipe:

  1. The key call is the multi_gpu_model() function.

  2. At training time, checkpoints go through the custom callback above, which saves base_model rather than gpu_model.

  3. Prediction is done with gpu_model.predict().

After the model has been saved, to load it again (see the sketch below this list):

  • Load the saved model file with load_model.
  • Rebuild gpu_model = multi_gpu_model(base_model) and call gpu_model.compile().
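A hedged sketch of that load-and-predict flow; the path, GPU count, and X_new are placeholders:

from keras.models import load_model
from keras.utils import multi_gpu_model

base_model = load_model('save/path/for/model.hdf5')           # the saved template model
gpu_model = multi_gpu_model(base_model, gpus=numbers_of_gpu)  # same GPU count as training
gpu_model.compile(...)                                        # recompile before evaluating
predictions = gpu_model.predict(X_new)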

https://github.com/keras-team/keras/issues/9342

In Keras's saving.py file, add the following (a hack for the Dimension-serialization error discussed in the issue above):

# ... earlier get_json_type code
# NOTE: Hacky fix to serialize Dimension objects.
from tensorflow.python.framework.tensor_shape import Dimension
if type(obj) == Dimension:
  return int(obj)
# original error raised here
raise TypeError('Not JSON Serializable:', obj)

An example of driving multiple GPUs from Keras: https://www.jianshu.com/p/d57595dac5a9

Multi-GPU + early stopping + hyperas tuning

# inside the hyperas train() function
def train():
    ...
    # first build the parallel model according to the GPUs you possess
    gpu_model = multi_gpu_model(model, gpus=2)
    gpu_model.compile(loss=loss_fn, optimizer=optim, metrics=['accuracy'])
    # set up early stopping and checkpointing
    filepath = '...'
    early_stopping = EarlyStopping(monitor='val_loss', patience=20, mode='min')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='min')
    callback_list = [early_stopping, checkpointer]
    # train the model, then reload the best weights before evaluating
    hist = gpu_model.fit(X_train, y_train, epochs=100, batch_size=128,
                         verbose=1, callbacks=callback_list, validation_data=(X_test, y_test))
    gpu_model.load_weights(filepath)
    score, acc = gpu_model.evaluate(X_test, y_test, verbose=0)
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': gpu_model}

Here the checkpoint saves gpu_model's weights, and gpu_model is also what gets returned.

Note: a model trained on multiple GPUs must also be run on multiple GPUs for prediction. For example, if it was trained on node g-1-4, you still have to predict on g-1-4, with the same number of GPUs.

1. When using multiple GPUs with ModelCheckpoint, you cannot checkpoint the full model; only with save_weights_only=True does training proceed normally and the run finish cleanly.

2. With multi-GPU + hyperas + ModelCheckpoint, point 1 means only weights can be checkpointed, and the best_model that hyperas returns cannot be saved normally: it fails with 'can not pickle the module'. Even if it could be saved, prediction would require rebuilding the model with the same number of GPUs used in training, and there is no way to tell which set of saved weights was the best, so prediction is effectively impossible.

The solution is:

def train():
    ...
    # first build the parallel model according to the GPUs you possess
    gpu_model = multi_gpu_model(model, gpus=2)
    gpu_model.compile(loss=loss_fn, optimizer=optim, metrics=['accuracy'])
    # set up early stopping and checkpointing
    filepath = '...'
    early_stopping = EarlyStopping(monitor='val_loss', patience=20, mode='min')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='min')
    callback_list = [early_stopping, checkpointer]
    # train the model
    hist = gpu_model.fit(X_train, y_train, epochs=100, batch_size=128,
                         verbose=1, callbacks=callback_list, validation_data=(X_test, y_test))
    score, acc = gpu_model.evaluate(X_test, y_test, verbose=0)
    model.save(filepath)    # this must come after gpu_model.evaluate, otherwise it errors
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}

Note that what gets saved here is model, and the model returned is also model.

Although model itself was never compiled, once fit finishes its weights are gpu_model's weights (the two share the same underlying layers).

Tested: best_model can be obtained this way, and prediction then works on a single GPU.
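A minimal sketch of single-GPU prediction with a model saved this way; filepath and X_new are placeholders. Since model was saved uncompiled, load_model may warn that no training configuration was found; predict still works, but compile again before calling evaluate:

from keras.models import load_model

best_model = load_model(filepath)        # the template model saved inside train()
predictions = best_model.predict(X_new)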

7. Multi-input Keras models (here combined with multi-GPU and the hyperas tuner)

input_embed = Input(shape=(700,), name='input_embed')
input_extra = Input(shape=(700, 25,), name='input_extra')
embedded = Embedding(num_amino_acids, 50, input_length=700)(input_embed)
x = concatenate([embedded, input_extra], axis=2)
......
x = BatchNormalization()(x)
output = Activation(activation='sigmoid')(x)
model = Model(inputs=[input_embed, input_extra], outputs=output)

gpu_model = multi_gpu_model(model, 4)
gpu_model.compile(...)
callback_list = [early_stopping, checkpointer]
hist = gpu_model.fit(x={'input_embed': X_all, 'input_extra': X_extra},
                     y=y_all,
                     epochs=100, batch_size=256,
                     verbose=1, callbacks=callback_list,
                     class_weight=class_weights, validation_split=0.2)

Although validation_split is set, validation only runs at the end of each epoch, not per batch; also, for a multi-input model you cannot pass the multiple inputs through validation_data here.
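Prediction with a multi-input model likewise takes a dict (or list) keyed by the input names; a sketch with placeholder arrays:

predictions = gpu_model.predict({'input_embed': X_embed_new,
                                 'input_extra': X_extra_new},
                                batch_size=256)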

Note: this tripped me up several times and is especially important. The names X_train, X_extra, and y_train in the code below matter: prepare_data() must return variables with exactly these names, otherwise hyperas raises an error that the name does not exist.

def prepare_data():
    ......
    return (X_train,X_extra,y_train)

if __name__ == "__main__":
    best_run,best_model=optim.minimize(model=train,data=prepare_data,algo=tpe.suggest,
                                       max_evals=30,trials=Trials())
    X_train,X_extra,y_train=prepare_data()

8. Setting the EarlyStopping monitor to a custom metric, such as the AUC

Approach 1 (tested: when tuning with hyperas this fails, complaining that auc_roc does not exist):

import tensorflow as tf
from tensorflow.contrib.metrics import streaming_auc

def auc_roc(y_true, y_pred):
    value, update_op = streaming_auc(y_pred, y_true)
    # find all variables created for this metric
    metric_vars = [i for i in tf.local_variables() if 'auc_roc' in i.name.split('/')[1]]

    # Add metric variables to GLOBAL_VARIABLES collection.
    # They will be initialized for new session.
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)

    # force to update metric values
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
        return value

def train():
    ......
    gpu_model.compile(loss=loss_fn, optimizer=adam, metrics=['accuracy', auc_roc])
    early_stopping = EarlyStopping(monitor='val_auc_roc', patience=20, mode='max')
    checkpointer = ModelCheckpoint(filepath=filepath, monitor='val_auc_roc', verbose=1,
                                   save_best_only=True, save_weights_only=True, mode='max')
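A common workaround (my own hedged sketch, not from the original post) is to compute the AUC with scikit-learn in a custom Callback at the end of each epoch and write it into logs; callbacks run in list order, so this callback must come before the ones that monitor 'val_auc_roc':

from keras.callbacks import Callback, EarlyStopping
from sklearn.metrics import roc_auc_score

class AucCallback(Callback):
    def __init__(self, validation_data):
        super().__init__()
        self.X_val, self.y_val = validation_data

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        y_pred = self.model.predict(self.X_val)
        # expose the AUC so later callbacks can monitor 'val_auc_roc'
        logs['val_auc_roc'] = roc_auc_score(self.y_val, y_pred)

auc_cb = AucCallback((X_test, y_test))
early_stopping = EarlyStopping(monitor='val_auc_roc', patience=20, mode='max')
callback_list = [auc_cb, early_stopping]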

 
