I'm trying to implement a variational autoencoder (VAE) on the MNIST dataset in TensorFlow. First, I trained a VAE with an MLP-based encoder and decoder. Training went smoothly: the loss kept decreasing and the generated digits looked reasonable. Here is the code for the MLP-based decoder:
x = sampled_z
x = tf.layers.dense(x, 200, tf.nn.relu)
x = tf.layers.dense(x, 200, tf.nn.relu)
x = tf.layers.dense(x, np.prod(data_shape))
img = tf.reshape(x, [-1] + data_shape)
Next, I decided to add convolutional layers. Changing only the encoder worked fine, but as soon as I use transposed convolutions in the decoder (instead of fully connected layers), it stops training entirely: the loss never decreases and the output is always black. Here is the code for the deconvolutional decoder:
x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
x = tf.reshape(x, [-1, 7, 7, 64])
x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME', activation=tf.nn.sigmoid)
img = tf.reshape(x, [-1, 28, 28])
This seems very odd to me; the code looks fine as far as I can tell. I'm fairly sure the problem is in the decoder's transposed-convolution layers, something must be going wrong there. For example, if I add a fully connected layer after the last transposed convolution (even without a nonlinearity!), it works again. Here is that code:
x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
x = tf.reshape(x, [-1, 7, 7, 64])
x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME', activation=tf.nn.sigmoid)
x = tf.contrib.layers.flatten(x)
x = tf.layers.dense(x, 28 * 28)
img = tf.reshape(x, [-1, 28, 28])
I'm really stuck at this point. Does anyone know what might be going on here? I'm using tf 1.8.0 and the Adam optimizer with a learning rate of 1e-4.
Edit:
As @Agost pointed out, I should clarify my loss function and training procedure. I model the posterior as a Bernoulli distribution and maximize the ELBO as my objective, inspired by this post. Here is the full code for the encoder, decoder, and loss:
def make_prior():
    mu = tf.zeros(N_LATENT)
    sigma = tf.ones(N_LATENT)
    return tf.contrib.distributions.MultivariateNormalDiag(mu, sigma)


def make_encoder(x_input):
    x_input = tf.reshape(x_input, shape=[-1, 28, 28, 1])
    x = conv(x_input, 32, 3, 2)
    x = conv(x, 64, 3, 2)
    x = conv(x, 128, 3, 2)
    x = tf.contrib.layers.flatten(x)
    mu = dense(x, N_LATENT)
    sigma = dense(x, N_LATENT, activation=tf.nn.softplus)  # softplus is log(exp(x) + 1)
    return tf.contrib.distributions.MultivariateNormalDiag(mu, sigma)


def make_decoder(sampled_z):
    x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
    x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
    x = tf.reshape(x, [-1, 7, 7, 64])
    x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME')
    img = tf.reshape(x, [-1, 28, 28])
    img_distribution = tf.contrib.distributions.Bernoulli(img)
    img = img_distribution.probs
    img_distribution = tf.contrib.distributions.Independent(img_distribution, 2)
    return img, img_distribution


def main():
    mnist = input_data.read_data_sets(os.path.join(experiment_dir(EXPERIMENT), 'MNIST_data'))
    tf.reset_default_graph()
    batch_size = 128
    x_input = tf.placeholder(dtype=tf.float32, shape=[None, 28, 28], name='X')
    prior = make_prior()
    posterior = make_encoder(x_input)
    mu, sigma = posterior.mean(), posterior.stddev()
    z = posterior.sample()
    generated_img, output_distribution = make_decoder(z)
    likelihood = output_distribution.log_prob(x_input)
    divergence = tf.distributions.kl_divergence(posterior, prior)
    elbo = tf.reduce_mean(likelihood - divergence)
    loss = -elbo
    global_step = tf.train.get_or_create_global_step()
    optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)
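For reference, the loss in main() is the negative ELBO, i.e. elbo = E_{z ~ q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)): likelihood is the Bernoulli reconstruction term and divergence is the KL term against the standard-normal prior.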
Answer:
Could it be your use of a sigmoid in the final deconv layer, which restricts the output to the 0-1 range? You don't do that in the MLP-based autoencoder, nor when you add a fully connected layer after the deconvs, so you may have a data-range issue.
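If that is indeed the cause, here is a minimal sketch of the suggested change, applied to the make_decoder from the edit above. Nothing here is new API; the only difference is dropping the sigmoid on the last transposed convolution and making the logits argument to the Bernoulli explicit:

import tensorflow as tf  # TF 1.x, as in the question


def make_decoder(sampled_z):
    x = tf.layers.dense(sampled_z, 24, tf.nn.relu)
    x = tf.layers.dense(x, 7 * 7 * 64, tf.nn.relu)
    x = tf.reshape(x, [-1, 7, 7, 64])
    x = tf.layers.conv2d_transpose(x, 64, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 32, 3, 2, 'SAME', activation=tf.nn.relu)
    x = tf.layers.conv2d_transpose(x, 1, 3, 1, 'SAME')  # no sigmoid here
    logits = tf.reshape(x, [-1, 28, 28])
    # The Bernoulli is built from the unconstrained logits, not from values
    # already squashed into (0, 1).
    img_distribution = tf.contrib.distributions.Bernoulli(logits=logits)
    img = img_distribution.probs  # sigmoid of the logits, for visualization
    img_distribution = tf.contrib.distributions.Independent(img_distribution, 2)
    return img, img_distribution

Built from logits, Bernoulli.log_prob is computed via the numerically stable sigmoid cross-entropy, while .probs still returns the sigmoid-squashed reconstruction, so nothing else in main() needs to change.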