After a long period of trial and error, I finally managed to save my model (see my question TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))). But now I have trouble loading the saved model. First, I got the following error when loading the model:
ValueError: You are trying to load a weight file containing 1 layers into a model with 0 layers.
After switching from the Sequential model to the functional API, I got the following error:
ValueError: Cannot assign to variable dense_features/NAME1W1_embedding/embedding_weights:0 due to variable shape (101, 15) and value shape (57218, 15) are incompatible
I tried different TensorFlow versions. With tf-nightly I get the error above; with version 2.1 I get a very similar one:
ValueError: Shapes (101, 15) and (57218, 15) are incompatible.
With versions 2.2 and 2.3 I cannot even save the model (as described in my earlier question).
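To narrow down which side actually has the wrong shape, one can compare the weight shapes stored in the saved H5 file with those of the freshly built model. A small diagnostic sketch (not from my original code; the file path is only an example):

import h5py

# Walk the 'model_weights' group of a Keras H5 file and print the shape of
# every stored weight array; groups have no shape attribute and are skipped.
with h5py.File('./saved_models/model.h5', 'r') as f:  # example path
    def printShape(name, obj):
        if hasattr(obj, 'shape'):
            print(name, obj.shape)
    f['model_weights'].visititems(printShape)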
Here is the relevant code for the functional API:
def __loadModel(args):
    filepath = args.loadModel
    model = tf.keras.models.load_model(filepath)

    print("Start preprocessing...")
    (_, _, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.batchSize)
    print("Preprocessing done")

    _, accuracy = model.evaluate(test_ds)
    print("Accuracy", accuracy)

def __trainModel(args):
    (train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.batchSize)

    for bucketSizeGEO in args.bucketSizeGEO:
        print("Start preprocessing...")
        feature_columns = preprocessing.getFutureColumns(args.data, args.zip, bucketSizeGEO, True)
        # TODO: compare trainable=False and trainable=True
        feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)
        print("Preprocessing done")

        feature_layer_inputs = preprocessing.getFeatureLayerInputs()
        feature_layer_outputs = feature_layer(feature_layer_inputs)
        output_layer = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(feature_layer_outputs)

        model = tf.keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=output_layer)
        model.compile(optimizer='sgd',
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

        paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)

        log_dir = "logs\\logR\\" + paramString + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

        model.fit(train_ds,
                  validation_data=val_ds,
                  epochs=args.epoch,
                  callbacks=[tensorboard_callback])

        model.summary()

        loss, accuracy = model.evaluate(test_ds)
        print("Accuracy", accuracy)

        paramString = paramString + "-a{:.4f}".format(accuracy)
        outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

        if args.saveModel:
            for i, w in enumerate(model.weights):
                print(i, w.name)
            path = './saved_models/' + outputName + '.h5'
            model.save(path, save_format='h5')
For the relevant preprocessing part, see the question linked at the beginning of this post. The loop

for i, w in enumerate(model.weights):
    print(i, w.name)

prints the following:
0 dense_features/NAME1W1_embedding/embedding_weights:0
1 dense_features/NAME1W2_embedding/embedding_weights:0
2 dense_features/STREETW_embedding/embedding_weights:0
3 dense_features/ZIP_embedding/embedding_weights:0
4 dense/kernel:0
5 dense/bias:0
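What this listing does not show are the shapes. Printing them next to the names (a small extension of the loop above, not in my original code) makes the mismatch visible, because the first dimension of each embedding table equals the num_buckets of its categorical column:

for i, w in enumerate(model.weights):
    # an embedding table has shape (num_buckets, embedding_dimension),
    # e.g. (57218, 15) in the error above
    print(i, w.name, w.shape)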
Answer:
I have fixed my mistake, and it was a rather silly one:
I was using the feature_column library to preprocess my data. Unfortunately, I had specified a fixed value for the num_buckets argument of categorical_column_with_identity instead of the actual vocabulary size. Wrong version:
street_voc = tf.feature_column.categorical_column_with_identity(
    key='STREETW', num_buckets=100)
Correct version:
street_voc = tf.feature_column.categorical_column_with_identity(
    key='STREETW', num_buckets=__getNumberOfWords(data, 'STREETPRO') + 1)
The function __getNumberOfWords(data, 'STREETPRO') returns the number of distinct words in the 'STREETPRO' column of the pandas DataFrame. The + 1 is needed because categorical_column_with_identity only accepts IDs in the range [0, num_buckets), so if the word IDs run from 1 to N, N + 1 buckets are required.
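The helper boils down to a distinct-value count; a minimal sketch of such a function, assuming plain pandas (my actual implementation may differ in details):

import pandas as pd

# Hypothetical implementation: the number of distinct values in one column.
def __getNumberOfWords(data: pd.DataFrame, column: str) -> int:
    return data[column].nunique()

With the bucket count derived from the data, the embedding variable gets the same shape when the model is rebuilt for loading, and the ValueError above disappears.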