I am training my conditional GAN with code from the TensorFlow tutorial, which uses a training loop written from scratch:
def fit(train_ds, epochs, test_ds):
  for epoch in range(epochs):
    start = time.time()

    display.clear_output(wait=True)

    for example_input, example_target in test_ds.take(1):
      generate_images(generator, example_input, example_target)
    print("Epoch: ", epoch)

    # Train
    for n, (input_image, target) in train_ds.enumerate():
      print('.', end='')
      if (n+1) % 100 == 0:
        print()
      train_step(input_image, target, epoch)
    print()

    # Save (checkpoint) the model every 20 epochs
    if (epoch + 1) % 20 == 0:
      checkpoint.save(file_prefix=checkpoint_prefix)

    print('Time taken for epoch {} is {} sec\n'.format(epoch + 1, time.time()-start))
  checkpoint.save(file_prefix=checkpoint_prefix)
The training step is defined as follows:
@tf.function
def train_step(input_image, target, epoch):
  with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
    gen_output = generator(input_image, training=True)

    disc_real_output = discriminator([input_image, target], training=True)
    disc_generated_output = discriminator([input_image, gen_output], training=True)

    gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(disc_generated_output, gen_output, target)
    disc_loss = discriminator_loss(disc_real_output, disc_generated_output)

  generator_gradients = gen_tape.gradient(gen_total_loss, generator.trainable_variables)
  discriminator_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

  generator_optimizer.apply_gradients(zip(generator_gradients, generator.trainable_variables))
  discriminator_optimizer.apply_gradients(zip(discriminator_gradients, discriminator.trainable_variables))

  with summary_writer.as_default():
    tf.summary.scalar('gen_total_loss', gen_total_loss, step=epoch)
    tf.summary.scalar('gen_gan_loss', gen_gan_loss, step=epoch)
    tf.summary.scalar('gen_l1_loss', gen_l1_loss, step=epoch)
    tf.summary.scalar('disc_loss', disc_loss, step=epoch)
Now my question: does the summary writer record only the loss of a single batch, or the average loss over the whole dataset? If it is a single batch, which batch does it record? And if the batch sizes are not all equal, how do I get the average loss over the whole dataset? I had assumed it was the average, and since the code comes from a TensorFlow tutorial I trusted it, but on reflection I am not sure that is actually true.
Answer:
As written, train_step calls tf.summary.scalar once per batch with step=epoch, so every batch's loss is logged separately, all at the same step; nothing is averaged over the dataset. If you want TensorBoard to get one loss value per epoch, you need to write the value at the end of each epoch, not after each batch.
First, create an averaging metric for each epoch:
for epoch in range(epochs):
  mean_epoch_loss = tf.metrics.Mean()
  # etc...
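Note that because train_step below is wrapped in @tf.function, the traced function captures whatever object mean_epoch_loss refers to at trace time, so re-binding the name to a fresh Mean() each epoch may not be picked up inside the compiled step. One option is to create the metric once and clear it at the start of each epoch, for example:

mean_epoch_loss = tf.metrics.Mean()  # created once, before the loop

for epoch in range(epochs):
  mean_epoch_loss.reset_state()  # clears the running sum/count; reset_states() on TF < 2.5
  # etc...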
Then, inside train_step, update this metric with the loss you want to track:
@tf.function
def train_step(input_image, target, epoch):
  # etc...
  mean_epoch_loss.update_state(epoch_loss)  # e.g. gen_total_loss or disc_loss
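Regarding the unequal batch sizes mentioned in the question: tf.metrics.Mean weights every update_state call equally by default, so the code above computes a mean over batches, and a smaller final batch would be over-weighted. A small sketch of one way to get a true per-sample mean instead, assuming input_image is the batched input tensor from the question: weight each update by the batch size.

# Mean then computes sum(loss_i * n_i) / sum(n_i) over the epoch,
# which is the per-sample average even when batch sizes differ.
batch_size = tf.cast(tf.shape(input_image)[0], tf.float32)
mean_epoch_loss.update_state(gen_total_loss, sample_weight=batch_size)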
Finally, at the end of each epoch, write the accumulated value to TensorBoard:
for epoch in range(epochs):
  # etc...
  with summary_writer.as_default():
    tf.summary.scalar('mean_epoch_loss', mean_epoch_loss.result(), step=epoch)
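Putting the pieces together, a minimal sketch of the full loop might look as follows. It assumes the generator, discriminator, loss functions, optimizers, and summary_writer from the question; the epoch argument and the per-batch summary calls are dropped because logging now happens once per epoch.

# Created once so the tf.function-traced train_step keeps updating
# the same underlying metric variables across epochs.
mean_epoch_loss = tf.metrics.Mean()

@tf.function
def train_step(input_image, target):
  with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
    gen_output = generator(input_image, training=True)
    disc_real_output = discriminator([input_image, target], training=True)
    disc_generated_output = discriminator([input_image, gen_output], training=True)
    gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(
        disc_generated_output, gen_output, target)
    disc_loss = discriminator_loss(disc_real_output, disc_generated_output)

  generator_gradients = gen_tape.gradient(gen_total_loss, generator.trainable_variables)
  discriminator_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
  generator_optimizer.apply_gradients(zip(generator_gradients, generator.trainable_variables))
  discriminator_optimizer.apply_gradients(zip(discriminator_gradients, discriminator.trainable_variables))

  # Accumulate a running per-sample mean of the generator loss.
  batch_size = tf.cast(tf.shape(input_image)[0], tf.float32)
  mean_epoch_loss.update_state(gen_total_loss, sample_weight=batch_size)

def fit(train_ds, epochs):
  for epoch in range(epochs):
    mean_epoch_loss.reset_state()
    for input_image, target in train_ds:
      train_step(input_image, target)
    # One scalar per epoch, averaged over the whole dataset.
    with summary_writer.as_default():
      tf.summary.scalar('mean_epoch_loss', mean_epoch_loss.result(), step=epoch)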