I've been looking through the RNN example documentation and trying to build my own simple sequence-to-sequence RNN using the tiny Shakespeare corpus (with the output shifted by one character). I'm using @sherjilozair's excellent utils.py to load the data (https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py), but my training run looks like this…
loading preprocessed files
('epoch ', 0, 'loss ', 930.27938270568848)
('epoch ', 1, 'loss ', 912.94828796386719)
('epoch ', 2, 'loss ', 902.99976110458374)
('epoch ', 3, 'loss ', 902.90720677375793)
('epoch ', 4, 'loss ', 902.87029957771301)
('epoch ', 5, 'loss ', 902.84992623329163)
('epoch ', 6, 'loss ', 902.83739829063416)
('epoch ', 7, 'loss ', 902.82908940315247)
('epoch ', 8, 'loss ', 902.82331037521362)
('epoch ', 9, 'loss ', 902.81916546821594)
('epoch ', 10, 'loss ', 902.81605243682861)
('epoch ', 11, 'loss ', 902.81366014480591)
I was expecting the loss to drop much faster, but even after 1000 epochs it is still roughly the same. I suspect something is wrong with my code, but I can't see what. I've pasted it below; if anyone could take a quick look and see whether anything stands out as odd, I'd be very grateful. Thank you.
## rays second predictor
## take basic example and convert to rnn
#from tensorflow.examples.tutorials.mnist import input_data

import sys
import argparse
import pdb
import tensorflow as tf

from utils import TextLoader

def main(_):
    # break
    # number of hidden units
    lstm_size = 24

    # embedding of dimensionality 15 should be ok for characters, 300 for words
    embedding_dimension_size = 15

    # load data and get vocab size
    num_steps = FLAGS.seq_length
    data_loader = TextLoader(FLAGS.data_dir, FLAGS.batch_size, FLAGS.seq_length)
    FLAGS.vocab_size = data_loader.vocab_size

    # placeholder for batches of characters
    input_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])
    target_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])

    # create cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=True)

    # initialize with zeros
    initial_state = state = lstm.zero_state(FLAGS.batch_size, tf.float32)

    # use embedding to convert ints to float array
    embedding = tf.get_variable("embedding", [FLAGS.vocab_size, embedding_dimension_size])
    inputs = tf.nn.embedding_lookup(embedding, input_characters)

    # flatten back to 2-d because rnn cells only deal with 2d
    inputs = tf.contrib.layers.flatten(inputs)

    # get output and (final) state
    outputs, final_state = lstm(inputs, state)

    # create softmax layer to classify outputs into characters
    softmax_w = tf.get_variable("softmax_w", [lstm_size, FLAGS.vocab_size])
    softmax_b = tf.get_variable("softmax_b", [FLAGS.vocab_size])
    logits = tf.nn.softmax(tf.matmul(outputs, softmax_w) + softmax_b)
    probs = tf.nn.softmax(logits)

    # expected labels will be 1-hot representation of last character of target_characters
    last_characters = target_characters[:, -1]
    last_one_hot = tf.one_hot(last_characters, FLAGS.vocab_size)

    # calculate loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot, logits=logits)

    # calculate total loss as mean across all batches
    batch_loss = tf.reduce_mean(cross_entropy)

    # train using adagrad optimizer
    train_step = tf.train.AdagradOptimizer(0.3).minimize(batch_loss)

    # start session
    sess = tf.InteractiveSession()

    # initialize variables
    sess.run(tf.global_variables_initializer())

    # train!
    num_epochs = 1000
    # loop through epochs
    for e in range(num_epochs):
        # loop through batches
        numpy_state = sess.run(initial_state)
        total_loss = 0.0
        data_loader.reset_batch_pointer()
        for i in range(data_loader.num_batches):
            this_batch = data_loader.next_batch()
            # initialize the LSTM state from the previous iteration
            numpy_state, current_loss, _ = sess.run([final_state, batch_loss, train_step],
                                                    feed_dict={initial_state: numpy_state,
                                                               input_characters: this_batch[0],
                                                               target_characters: this_batch[1]})
            total_loss += current_loss
        # output total loss
        print("epoch ", e, "loss ", total_loss)

    # break into debug
    pdb.set_trace()

    # calculate accuracy using training set

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare',
                        help='Directory for storing input data')
    parser.add_argument('--batch_size', type=int, default=100,
                        help='minibatch size')
    parser.add_argument('--seq_length', type=int, default=50,
                        help='RNN sequence length')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Update, July 20th.
Thanks for the replies. I've updated my code to use the dynamic RNN call, as follows…
outputs, final_state = tf.nn.dynamic_rnn(initial_state=initial_state, cell=lstm, inputs=inputs, dtype=tf.float32)
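For reference, here is a minimal sketch of the shapes involved (the sizes follow the flag defaults above; the variable names are illustrative, not lifted from my code). The key point is that with dynamic_rnn the embedded inputs must stay 3-D, so the earlier flatten() call is dropped:

# sketch of the dynamic_rnn shapes (TF 1.x); sizes follow the flag defaults above
batch_size, seq_length = 100, 50
vocab_size, embedding_size, lstm_size = data_loader.vocab_size, 15, 24

input_characters = tf.placeholder(tf.int32, [batch_size, seq_length])
embedding = tf.get_variable("embedding", [vocab_size, embedding_size])

# keep the inputs 3-D: [batch_size, seq_length, embedding_size] -- no flatten()
inputs = tf.nn.embedding_lookup(embedding, input_characters)

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=True)
initial_state = lstm.zero_state(batch_size, tf.float32)

# outputs: [batch_size, seq_length, lstm_size], one output per time step;
# final_state: the LSTM state after the last step, to carry between batches
outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=inputs,
                                         initial_state=initial_state,
                                         dtype=tf.float32)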
This raises some interesting questions… The batching appears to pick 50-character chunks from the dataset and then step forward 50 characters to get the next sequence in the batch. If these are used for training, and you compute the loss only from the predicted final character against final character + 1, then within every sequence there are 49 character predictions that the loss never tests. That seems a little odd.
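If that's right, the natural fix is to compute the loss at every time step against the shifted targets rather than only at the last character. A sketch of that, continuing from the dynamic_rnn tensors above (using the sparse variant of the loss is my assumption; it just avoids building the one-hot labels by hand):

# sketch: score all seq_length predictions, not just the last one
flat_outputs = tf.reshape(outputs, [-1, lstm_size])            # [batch*seq, lstm_size]
softmax_w = tf.get_variable("softmax_w", [lstm_size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
flat_logits = tf.matmul(flat_outputs, softmax_w) + softmax_b   # raw scores, no softmax here

target_characters = tf.placeholder(tf.int32, [batch_size, seq_length])
flat_targets = tf.reshape(target_characters, [-1])             # [batch*seq]

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=flat_targets, logits=flat_logits)
batch_loss = tf.reduce_mean(cross_entropy)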
Also, when testing the output, I feed in a single character rather than 50, grab the prediction, and feed that single character back in. Should I be growing that input by one character at every step? So the first seed is 1 character, then I append the predicted character so the next call is a 2-character sequence, and so on, up to the maximum length of my training sequence? Or does that not matter if I'm passing back the updated state? I.e., does the updated state also represent all the preceding characters?
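My understanding is that the carried state does summarize all the preceding characters, so feeding one character at a time together with the updated state should be sufficient. A sketch of that sampling loop (this assumes a second copy of the graph built with batch_size=1 and seq_length=1, the probs tensor from my code, and the vocabulary maps from utils.py; the helper name is mine):

import numpy as np

# hypothetical sampler: feed one character plus the carried state each step
def sample(sess, probs, input_characters, initial_state, final_state,
           char_to_ix, ix_to_char, seed_char, length=200):
    state = sess.run(initial_state)
    ch, out = seed_char, [seed_char]
    for _ in range(length):
        # the updated state stands in for all earlier characters
        p, state = sess.run([probs, final_state],
                            feed_dict={input_characters: [[char_to_ix[ch]]],
                                       initial_state: state})
        ix = np.random.choice(p.size, p=p.ravel())  # sample from the distribution
        ch = ix_to_char[ix]
        out.append(ch)
    return ''.join(out)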
On another note, I found what I believe is the main reason the loss was not decreasing… I was mistakenly applying the softmax twice…
logits = tf.nn.softmax(tf.matmul(final_output, softmax_w) + softmax_b)
probs = tf.nn.softmax(logits)
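As the answer below also points out, softmax_cross_entropy_with_logits applies the softmax internally, so the fix (as I understand it) is to keep the logits as the raw linear output and apply softmax only once, for probs:

logits = tf.matmul(final_output, softmax_w) + softmax_b  # raw scores for the loss
probs = tf.nn.softmax(logits)                            # probabilities for sampling only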
Answer:
Your function lstm() is only a single cell, not a sequence of cells. For a sequence, you need to create a sequence of lstms and then pass the sequence as the input. Concatenating the embedded inputs and passing them through a single cell won't work; instead, use the dynamic_rnn method for a sequence.
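To illustrate, here is a sketch of what that unrolling looks like with the static API (tf.nn.static_rnn); dynamic_rnn performs the same unrolling internally, so either works:

# "a sequence of lstms": the one cell is applied at every time step
inputs = tf.nn.embedding_lookup(embedding, input_characters)  # [batch, seq, emb], stays 3-D
step_inputs = tf.unstack(inputs, axis=1)       # list of seq_length [batch, emb] tensors
outputs, final_state = tf.nn.static_rnn(lstm, step_inputs,
                                        initial_state=initial_state)
# outputs: a list of seq_length tensors, each [batch, lstm_size]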
Also, softmax is applied twice: in the logits and again in cross_entropy. That needs to be fixed.