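I'm using the ModelCheckpoint callback to save the best epoch of the model I'm training. Saving works without any errors, but when I try to load the checkpoint I get the following error: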
2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
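I tried using an absolute (full) path, but that didn't help. I know I could use EarlyStopping instead, but I want to understand why this error occurs. Here is my code: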
from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics

(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)

train_images = train_images / 255
test_images = test_images / 255

train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]

'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
'''

start_time = datetime.datetime.now()

model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)
])

model.compile(loss='mean_absolute_error',
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['mean_absolute_error', 'mean_squared_error'])

train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)

model.fit(train_images, train_labels, epochs=100,
          callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error',
                                                     save_best_only=True, verbose=1)])

model.load_weights("cp.ckpt")

predictions = model.predict(test_images)

totalDifference = 0
for i in range(82):
    print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
    totalDifference += abs(test_labels[i] - predictions[i])

avgDifference = totalDifference / 8.2
print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)
Answer:
In short: you are saving the whole model, but then trying to load only its weights, and that doesn't work.
Explanation
Your model's fit call looks like this:
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)
Because save_weights_only=False is the default in ModelCheckpoint, you are saving the whole model to the .ckpt file, not just its weights.
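To make that default concrete, here is a minimal pure-Python sketch of the decision ModelCheckpoint makes at the end of each epoch. It is a simplified model of the callback's behaviour, not the Keras source: the class name MiniCheckpoint is hypothetical, it assumes mode="min" (which is what Keras picks automatically for an error metric such as mean_absolute_error), and instead of writing files it only records whether it would call model.save (whole model) or model.save_weights (weights only).

```python
import math


class MiniCheckpoint:
    """Hypothetical stand-in for the save logic of keras.callbacks.ModelCheckpoint.

    Assumes mode="min": a strictly lower monitored value counts as an improvement.
    """

    def __init__(self, filepath, save_best_only=True, save_weights_only=False):
        self.filepath = filepath
        self.save_best_only = save_best_only
        self.save_weights_only = save_weights_only  # False mirrors the Keras default
        self.best = math.inf
        self.log = []  # (epoch, action, filepath) tuples instead of real files

    def on_epoch_end(self, epoch, monitored_value):
        if self.save_best_only and monitored_value >= self.best:
            return  # no improvement: nothing is written this epoch
        self.best = min(self.best, monitored_value)
        # save_weights_only only changes WHICH save call is made.
        action = "save_weights" if self.save_weights_only else "save"
        self.log.append((epoch, action, self.filepath))


cp = MiniCheckpoint("cp.ckpt")  # same relevant arguments as in the question
for epoch, mae in enumerate([0.50, 0.40, 0.45, 0.30]):
    cp.on_epoch_end(epoch, mae)

print(cp.log)
```

With the question's settings every improvement leads to a full-model save into cp.ckpt (epochs 0, 1 and 3 here), so the file ends up holding the whole model rather than a weights-only checkpoint.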
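By the way, the file should be named with a .hdf5 or .h5 extension, because the format Keras writes is Hierarchical Data Format 5 (HDF5). Windows is not extension-agnostic, so you may run into problems on this OS if tensorflow/keras relies on the file extension.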
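The extension check itself is easy to sketch. As far as I know, tf.keras treats a path ending in .h5, .hdf5 or .keras as HDF5 and first tries to read anything else as a TensorFlow checkpoint, which is consistent with the tensor_slice_reader warning in the question. The function below is a hypothetical re-implementation of that rule for illustration only; it is not imported from TensorFlow, and the exact rule may differ between versions.

```python
def is_hdf5_filepath(filepath: str) -> bool:
    """Hypothetical mirror of tf.keras's extension-based format check."""
    # str.endswith accepts a tuple of suffixes.
    return filepath.endswith((".h5", ".hdf5", ".keras"))


for path in ["cp.ckpt", "model.hdf5", "weights.h5"]:
    fmt = "HDF5" if is_hdf5_filepath(path) else "TensorFlow checkpoint"
    print(f"{path}: {fmt}")
```

Under this rule cp.ckpt is routed to the TensorFlow checkpoint reader, even though its contents are HDF5.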
On the other hand, you are loading only the model's weights, even though the file contains the whole model:
model.load_weights("cp.ckpt")
TensorFlow's checkpoint mechanism (.ckpt) is different from Keras's (.hdf5), so keep that in mind (there are plans to integrate the two more closely, see here and here).
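If you are unsure which of the two formats a file on disk actually uses, you can look at its first bytes. HDF5 files written by h5py (which Keras uses for .hdf5 files) normally start with the 8-byte HDF5 format signature \x89HDF\r\n\x1a\n. The helper below is a small sketch under that assumption: it only checks offset 0, so it ignores the rare case of an HDF5 user block that moves the signature further into the file, and it does not positively identify TensorFlow checkpoints.

```python
import os
import tempfile

# Format signature defined by the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo with throwaway files instead of the real cp.ckpt.
with tempfile.TemporaryDirectory() as tmp:
    fake_hdf5 = os.path.join(tmp, "cp.ckpt")  # HDF5 bytes behind a .ckpt name
    with open(fake_hdf5, "wb") as f:
        f.write(HDF5_SIGNATURE + b"\x00" * 8)
    other = os.path.join(tmp, "notes.txt")
    with open(other, "wb") as f:
        f.write(b"plain text")
    print(looks_like_hdf5(fake_hdf5), looks_like_hdf5(other))
```

Running looks_like_hdf5 on the question's cp.ckpt would show whether it is really an HDF5 file that merely carries a checkpoint-style name.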
Solution
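So either keep your current callback (preferably with the filepath changed to "model.hdf5") and load the whole model with model = keras.models.load_model("model.hdf5"), or add the save_weights_only=True argument to ModelCheckpoint: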
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # add this argument
        )
    ],
)
After that, your model.load_weights("weights.hdf5") call will work.