For an NLP task, my input dataset has been converted into a list of lists of integers. The features and the labels are the same dataset.
    training_data = [[ 0 4 79 3179 11 44 8 1 11245 173 152 10 1 1138 1079]
                     [ 0 0 4 79 3179 11 44 8 11566 173 152 8 1 1138 1079]
                     [ 0 0 0 0 0 0 0 9 15 333 44 3 61 63 533]
                     [ 0 0 0 0 0 0 3 19 253 28 44 3 61 63 533]
                     [ 0 0 0 0 0 0 0 0 0 0 0 2 3 49 4395]
                     [ 0 0 0 0 0 0 0 0 0 0 0 0 75 65 4395]
                     [ 3 1 7128 3388 289 10 446 200 675 8 3320 14 32 82 234]
                     [ 7 74 268 577 23 49 31 5 1032 98 10 4270 5026 12 6570]
                     [ 0 0 0 0 0 0 0 2 3 39 7 27 155 29 4534]
                     [ 0 0 0 0 0 2 3 19 39 7 27 155 29 34 4534]]
The validation dataset is sliced from the main dataset and has the same format.
Then I call the fit() method (my model is vae):
    n_steps = (800000 / 2) / batch_size
    for counter in range(nb_epoch):
        print('-------epoch: ',counter,'--------')
        vae.fit(x=np.array(training_data), y=np.array(training_data),
                steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                validation_data=(data_1_val, data_1_val))
This raises the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
I also tried:
    vae.fit(x=training_data, y=training_data,
            steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
            validation_data=(data_1_val, data_1_val))
This produces the same error.
Any good solutions or hints on how to format the data for training are welcome, whether with lists, np.arrays, or generators.
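One way to rule out the input data as the source of the error is to confirm that what reaches fit() is a plain 2-D integer NumPy array and not a symbolic tensor. The sketch below uses made-up toy rows in place of training_data; the names rows and features are illustrative only. If the check passes on the real data, the symbolic value mentioned in the message presumably comes from somewhere else, such as the model definition, but that is an assumption and not something the post confirms.

```python
import numpy as np

# Toy rows standing in for training_data (values are made up).
rows = [[0, 0, 2, 3, 49], [0, 75, 65, 12, 7], [3, 1, 7, 28, 9]]

# np.asarray turns a rectangular list of lists into a 2-D array.
features = np.asarray(rows, dtype="int32")

# A concrete array reports a real type, dtype and shape.
print(type(features).__name__, features.dtype, features.shape)
```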
Edit: some code
    training_data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    len_val = int(np.floor(len(texts) * 0.2))  # num samples for validation
    data_1_val = data_1[-len_val:]  # select len_val sentences as validation data
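To make the data layout concrete, here is a pure-Python sketch of what the padding and hold-out steps produce. pad_left is a hypothetical stand-in that mimics the default left-padding and left-truncation of pad_sequences; it is not the Keras function itself, and the token ids are made up.

```python
def pad_left(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Toy token-id sequences (made up for illustration).
sequences = [
    [2, 3, 49, 4395],
    [75, 65, 4395],
    [9, 15, 333, 44, 3, 61],
    [7, 27],
    [3, 19, 39, 7, 27],
]
training_data = pad_left(sequences, maxlen=5)

# Hold out the last 20% of the rows for validation.
len_val = int(len(training_data) * 0.2)
data_val = training_data[-len_val:]

for row in training_data:
    print(row)
```

Note that if len_val works out to 0, the slice [-0:] returns every row, so very small datasets need a guard.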
Building and training the model:
    x = Input(batch_shape=(None, max_len))
    x_embed = Embedding(NB_WORDS, emb_dim, weights=[glove_embedding_matrix],
                        input_length=max_len, trainable=False)(x)
[…]
    loss_layer = CustomVariationalLayer()([x, x_decoded_mean])
    vae = Model(x, [loss_layer])
    opt = Adam(lr=0.01)  # SGD(lr=1e-2, decay=1e-6, momentum=0.9, nesterov=True)
    vae.compile(optimizer='adam', loss=[zero_loss])
    nb_epoch = 100
    n_steps = (800000 / 2) / batch_size
    for counter in range(nb_epoch):
        print('-------epoch: ',counter,'--------')
        vae.fit(training_data, training_data,
                steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                validation_data=(data_1_val, data_1_val))
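A side note on n_steps, separate from the error itself: in Python 3 the expression (800000 / 2) / batch_size evaluates to a float, whereas fit() documents steps_per_epoch as an integer (or None). A minimal sketch of an integer version follows; the batch_size value is assumed, since the post does not give it.

```python
import math

num_samples = 800000 // 2  # figure taken from the snippet above
batch_size = 100           # assumed value; not given in the post

n_steps_float = (800000 / 2) / batch_size      # true division gives a float
n_steps = math.ceil(num_samples / batch_size)  # rounded up to a whole number of steps

print(type(n_steps_float).__name__, type(n_steps).__name__)
```

The same applies to any batch_size / 2 passed on as a batch size; batch_size // 2 keeps it an integer.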
The original GitHub code used a generator as the training input, via Keras's deprecated fit_generator method:
    for counter in range(nb_epoch):
        print('-------epoch: ',counter,'--------')
        vae.fit_generator(sent_generator(TRAIN_DATA_FILE, batch_size/2),
                          steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                          validation_data=(data_1_val, data_1_val))
Since fit() also accepts a generator argument, I first tried:
    for counter in range(nb_epoch):
        print('-------epoch: ',counter,'--------')
        vae.fit(sent_generator(TRAIN_DATA_FILE, batch_size/2),
                steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                validation_data=(data_1_val, data_1_val))
This crashes with the same error as above.
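For reference, a Python generator handed to fit() is expected to yield (inputs, targets) tuples and to keep yielding for as long as training runs. The following is a minimal sketch of such a generator for the features-equal-labels case described here. batch_generator and the toy data are hypothetical, it is not the sent_generator from the GitHub code, and it does not by itself address the symbolic-tensor error.

```python
import numpy as np


def batch_generator(data, batch_size):
    """Yield (inputs, targets) batches forever; targets equal inputs."""
    data = np.asarray(data, dtype="int32")
    num_samples = len(data)
    while True:  # loop endlessly; the caller decides how many batches to draw
        for start in range(0, num_samples, batch_size):
            batch = data[start:start + batch_size]
            yield batch, batch


# Toy data standing in for the padded sequences (values are made up).
toy_data = [[0, 2, 3], [0, 75, 65], [9, 15, 333], [7, 27, 155], [3, 19, 39]]
gen = batch_generator(toy_data, batch_size=2)

x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)
```

The batch size has to be an integer here, because range() rejects a float step.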
Answer: