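GAN not converging. Discriminator loss keeps increasing

I am building a simple generative adversarial network on the MNIST dataset.

This is my implementation: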

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)

def noise(batch_size):
    return np.random.uniform(-1, 1, (batch_size, 100))

learning_rate = 0.001
batch_size = 128

input = tf.placeholder('float', [None, 100])
real_data = tf.placeholder('float', [None, 784])

def generator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([100, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 784]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([784]))
    }
    hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])
    ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))
    return ol

def discriminator(x):
    weights = {
        'hl1' : tf.Variable(tf.random_normal([784, 200])),
        'ol'  : tf.Variable(tf.random_normal([200, 1]))
    }
    biases = {
        'hl1' : tf.Variable(tf.random_normal([200])),
        'ol'  : tf.Variable(tf.random_normal([1]))
    }
    hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])
    ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))
    return ol

with tf.variable_scope("G"):
    G = generator(input)
with tf.variable_scope("D"):
    D_real = discriminator(real_data)
with tf.variable_scope("D", reuse = True):
    D_gen = discriminator(G)

generator_parameters = [x for x in tf.trainable_variables() if x.name.startswith('G/')]
discriminator_parameters = [x for x in tf.trainable_variables() if x.name.startswith('D/')]

G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_gen, labels=tf.ones_like(D_gen)))
D_real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real, labels=tf.ones_like(D_real)))
D_fake_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_gen, labels=tf.zeros_like(D_gen)))
D_total_loss = tf.add(D_fake_loss, D_real_loss)

G_train = tf.train.AdamOptimizer(learning_rate).minimize(G_loss,var_list=generator_parameters)
D_train = tf.train.AdamOptimizer(learning_rate).minimize(D_total_loss,var_list=discriminator_parameters)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

loss_g_function = []
loss_d_function = []

for epoch in range(200):
    for iteratiion in range(int(len(mnist.train.images)/batch_size)):
        real_batch, _ = mnist.train.next_batch(batch_size)
        _, d_err = sess.run([D_train, D_total_loss], feed_dict = {real_data : real_batch, input : noise(batch_size)})
        _, g_err = sess.run([G_train, G_loss], feed_dict = {input : noise(batch_size)})
    print("Epoch = ", epoch)
    print("D_loss = ", d_err)
    print("G_loss = ", g_err)
    loss_g_function.append(g_err)
    loss_d_function.append(d_err)

# Visualizing
import matplotlib.pyplot as plt

test_noise = noise(1)

plt.subplot(2, 2, 1)
plt.plot(test_noise[0])
plt.title("Noise")

plt.subplot(2, 2, 2)
plt.imshow(np.reshape(sess.run(G, feed_dict = {input : test_noise})[0], [28, 28]))
plt.title("Generated Image")

plt.subplot(2, 2, 3)
plt.plot(loss_d_function, 'r')
plt.xlabel("Epochs")
plt.ylabel("Discriminator Loss")
plt.title("D-Loss")

plt.subplot(2, 2, 4)
plt.plot(loss_g_function, 'b')
plt.xlabel("Epochs")
plt.ylabel("Generator Loss")
plt.title("G_Loss")

plt.show()

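I have tried lr = 0.001, lr = 0.0001, and lr = 0.00003.

These are my results: https://i.sstatic.net/NXA0H.jpg

What could be the cause? My weights are initialized with random draws from a normal distribution. Also, please check the loss functions: are they correct?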

Answer:

Issues:

Only a one-layer network:

hl1 = tf.add(tf.matmul(x, weights['hl1']), biases['hl1'])
ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']), biases['ol']))

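Neither the discriminator nor the generator defined above applies an activation function to the first layer. Without a nonlinearity between them, the two layers collapse into a single one: y = act(w2(x*w1+b1)+b2) = act(x*w+b), where w and b are the combined weights and bias.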
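To make the collapse concrete, here is a small NumPy sketch (not the TensorFlow code from the question). It reuses the generator's layer sizes, 100 -> 200 -> 784; the variable names and random values are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the generator in the question: 100 -> 200 -> 784.
x = rng.uniform(-1, 1, size=(4, 100))
w1, b1 = rng.normal(size=(100, 200)), rng.normal(size=200)
w2, b2 = rng.normal(size=(200, 784)), rng.normal(size=784)

# Two stacked linear layers with no activation in between.
two_layers = (x @ w1 + b1) @ w2 + b2

# The same map written as one linear layer with combined parameters.
w = w1 @ w2
b = b1 @ w2 + b2
one_layer = x @ w + b

# True when both formulations agree up to floating-point error.
layers_are_equivalent = np.allclose(two_layers, one_layer)
print(layers_are_equivalent)
```

Putting a nonlinearity (for example ReLU or tanh) on `hl1` breaks this equivalence, which is what gives the hidden layer any extra capacity.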

Sigmoid applied twice:

ol = tf.nn.sigmoid(tf.add(tf.matmul(hl1, weights['ol']) ...
D_real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(...)

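As mentioned in the comments, the activation function is applied twice: once on the network output, and again inside sigmoid_cross_entropy_with_logits, which expects raw logits and applies the sigmoid itself.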
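The effect of the double sigmoid can be checked numerically. The sketch below is plain NumPy, not TensorFlow: `bce_with_logits` is a hypothetical helper that follows the stable formula given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`, and the logit values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(logits, labels):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)),
    # where x are the logits and z the labels.
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

# Made-up pre-activation outputs of the discriminator on fake images.
raw_logits = np.array([-20.0, -5.0, 0.0, 5.0, 20.0])
fake_labels = np.zeros_like(raw_logits)

# Intended usage: pass the raw logits to the loss.
loss_from_logits = bce_with_logits(raw_logits, fake_labels)

# As in the question: the network output has already gone through a sigmoid.
loss_double_sigmoid = bce_with_logits(sigmoid(raw_logits), fake_labels)

print(loss_from_logits.min())
print(loss_double_sigmoid.min())
```

Because the already-squashed output lies in (0, 1), the loss treats it as a logit in that narrow range. The fake-image loss then can never drop below ln 2, however confident the discriminator is. Returning the pre-sigmoid value from `discriminator` avoids this.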

Weight initialization:

tf.Variable(tf.random_normal([784, 200]))

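With a sigmoid activation, large weights push the unit into saturation, where the gradient is very small, so the weights barely change (a large w plus a very small delta(w)). This may be why the loss does not seem to change much when you run the code above. It is better to follow standard practice and use an initializer such as xavier_initializer().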
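A rough NumPy illustration of the saturation argument follows. It assumes a single 784 -> 200 layer followed by a sigmoid, uses uniform random inputs as a stand-in for MNIST pixels, and implements the Xavier (Glorot) uniform rule by hand. It is a sketch of the idea, not the initializer's actual source.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 200

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_sigmoid_gradient(weights, inputs):
    # Average derivative of the sigmoid, s * (1 - s), over all units.
    s = sigmoid(inputs @ weights)
    return float(np.mean(s * (1.0 - s)))

# Stand-in for a batch of images with pixel values in [0, 1].
inputs = rng.uniform(0, 1, size=(64, fan_in))

# Standard-normal weights, as produced by tf.random_normal with defaults.
w_normal = rng.normal(0.0, 1.0, size=(fan_in, fan_out))

# Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))

grad_normal = mean_sigmoid_gradient(w_normal, inputs)
grad_glorot = mean_sigmoid_gradient(w_glorot, inputs)
print(grad_normal, grad_glorot)
```

With standard-normal weights the pre-activations are large, so most units sit on the flat part of the sigmoid and the average gradient is several times smaller than with the scaled initialization.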

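Inconsistent dynamic range:

The generator's input has a dynamic range of [-1, 1] and is multiplied by weights in the range [-1, 1], but the output is in the range [0, 1]. That is not a problem in itself, since the biases can learn to map the output range. However, it is better to use an output activation with range [-1, 1], such as tanh, so that the network can learn faster. If the generator uses tanh as its output activation, the real images fed to the discriminator need to be scaled to [-1, 1] as well, to keep training consistent.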
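The rescaling itself is a one-liner. The helper names below are hypothetical, and the sketch assumes the real images arrive as floats in [0, 1], which is how the `input_data` loader used in the question normally delivers them; a random array stands in for a real batch.

```python
import numpy as np

def to_tanh_range(images):
    """Map pixel values from [0, 1] to [-1, 1] to match a tanh generator."""
    return images * 2.0 - 1.0

def to_pixel_range(images):
    """Map tanh outputs from [-1, 1] back to [0, 1], e.g. for plt.imshow."""
    return (images + 1.0) / 2.0

# Stand-in for `real_batch` from mnist.train.next_batch(batch_size).
real_batch = np.random.default_rng(0).uniform(0, 1, size=(128, 784))

scaled = to_tanh_range(real_batch)   # feed this to the discriminator
restored = to_pixel_range(scaled)    # undo the scaling for display

print(scaled.min(), scaled.max())
```

The same `to_pixel_range` step would be applied to generated images before plotting them.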

With the changes above, you can get results like the one shown in the original answer (image not reproduced here).

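The network above is very simple, and the output quality is not high. I deliberately left its complexity unchanged, to see what kind of output a simple network can produce.

You can build larger networks (including CNNs) and try more recent GAN models to get better-quality results.

The code to reproduce the results above is available here.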
