I'm trying to train a seq2seq model for language translation, copy-pasting code from this Kaggle notebook into Google Colab. The code runs fine on CPU and GPU, but I get an error when training on a TPU. The same question has been asked here before.
Here is my code:
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    model = create_model()
    model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')

model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                    steps_per_epoch = train_samples // batch_size,
                    epochs = epochs,
                    validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                    validation_steps = val_samples // batch_size)
The error traceback:
Epoch 1/2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-940fe0ee3c8b> in <module>()
      3                     epochs = epochs,
      4                     validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
----> 5                     validation_steps = val_samples // batch_size)

10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    992           except Exception as e:  # pylint:disable=broad-except
    993             if hasattr(e, "ag_error_metadata"):
--> 994               raise e.ag_error_metadata.to_exception(e)
    995             else:
    996               raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    ...

ValueError: None values not supported.
I can't figure out what's causing the error; I suspect it comes from this generate_batch function:
X, y = lines['english_sentence'], lines['hindi_sentence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 34)

def generate_batch(X = X_train, y = y_train, batch_size = 128):
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens), dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word]
                for t, word in enumerate(target_text.split()):
                    if t < len(target_text.split()) - 1:
                        decoder_input_data[i, t] = target_token_index[word]
                    if t > 0:
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)
My Colab notebook – here
Kaggle dataset – here
TensorFlow version – 2.6
Edit – Please don't tell me to downgrade TensorFlow/Keras to 1.x. I can downgrade to TensorFlow 2.0, 2.1 or 2.3, but not to 1.x; I don't understand TensorFlow 1.x, and it makes no sense to use a three-year-old version anyway.
Answer:
As the answer referenced in the link you provided says, the tensorflow.data API works better with TPUs. To adapt this to your case, try using return instead of yield in your generate_batch function:
def generate_batch(X = X_train, y = y_train, batch_size = 128):
    ...
    return encoder_input_data, decoder_input_data, decoder_target_data

encoder_input_data, decoder_input_data, decoder_target_data = generate_batch(X_train, y_train, batch_size=128)
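The ... above is left for you to fill in. For reference, here is one possible way it could look, a sketch on my part that reuses the tokenization logic from your original generator but encodes the whole split in one pass (batch_size is no longer used, since batching is handled by tf.data below; max_length_src, max_length_tar, num_decoder_tokens, input_token_index and target_token_index are the globals from your notebook):

import numpy as np

def generate_batch(X=X_train, y=y_train, batch_size=128):
    # Encode every sample of the split at once instead of yielding batches;
    # batching is done later by tf.data.
    n = len(X)
    encoder_input_data = np.zeros((n, max_length_src), dtype='float32')
    decoder_input_data = np.zeros((n, max_length_tar), dtype='float32')
    decoder_target_data = np.zeros((n, max_length_tar, num_decoder_tokens), dtype='float32')
    for i, (input_text, target_text) in enumerate(zip(X, y)):
        for t, word in enumerate(input_text.split()):
            encoder_input_data[i, t] = input_token_index[word]
        for t, word in enumerate(target_text.split()):
            if t < len(target_text.split()) - 1:
                decoder_input_data[i, t] = target_token_index[word]          # decoder input drops the last word
            if t > 0:
                decoder_target_data[i, t - 1, target_token_index[word]] = 1.  # target is shifted left by one
    return encoder_input_data, decoder_input_data, decoder_target_data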
Then build your data with tensorflow.data:
from tensorflow.data import Dataset

encoder_input_data = Dataset.from_tensor_slices(encoder_input_data)
decoder_input_data = Dataset.from_tensor_slices(decoder_input_data)
decoder_target_data = Dataset.from_tensor_slices(decoder_target_data)

ds = Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data)).map(map_fn).batch(1024)
where map_fn is defined as:
def map_fn(encoder_input, decoder_input, decoder_target):
    return (encoder_input, decoder_input), decoder_target
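Incidentally (an equivalent formulation, not something you have to change), from_tensor_slices also accepts nested structures, so the same ((encoder, decoder), target) layout can be built in a single call from the original NumPy arrays, with no Dataset.zip or map_fn step:

import tensorflow as tf

# Equivalent to the zip + map above: slice the nested structure directly.
ds = tf.data.Dataset.from_tensor_slices(
    ((encoder_input_data, decoder_input_data), decoder_target_data)
).batch(1024)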
Finally, use Model.fit instead of Model.fit_generator:
model.fit(x=ds, epochs=epochs)
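Putting it all together, the training cell could look roughly like the sketch below. Note that drop_remainder=True is my own addition rather than part of the steps above: TPUs want static batch shapes, so dropping the final partial batch is a common precaution. strategy, create_model, batch_size and epochs are assumed to be defined as in your question, and generate_batch is the modified return-based version.

import tensorflow as tf

# Full arrays from the modified (return-based) generate_batch.
train_enc, train_dec, train_tgt = generate_batch(X_train, y_train)
val_enc, val_dec, val_tgt = generate_batch(X_test, y_test)

# drop_remainder=True keeps every batch the same shape, which TPUs prefer.
train_ds = tf.data.Dataset.from_tensor_slices(
    ((train_enc, train_dec), train_tgt)).batch(batch_size, drop_remainder=True)
val_ds = tf.data.Dataset.from_tensor_slices(
    ((val_enc, val_dec), val_tgt)).batch(batch_size, drop_remainder=True)

with strategy.scope():
    model = create_model()
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

model.fit(train_ds, validation_data=val_ds, epochs=epochs)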