I developed some code for online multiclass classification on the 20 newsgroups dataset. To remove the effect of the zero padding applied to the texts fed into the LSTM, I added the 'sequence_length' parameter to dynamic_rnn, passing the length of each text being processed.
After adding this argument, the prediction (computed by the code shown below) is the same in every iteration except the first.
predictions = tf.nn.softmax(logit).eval(feed_dict=feed)
Shown below are the predictions I received in the first, second, third and fourth iterations:
First: [[0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05]]
Second: [[0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.0509586 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956]]
Third: [[0.0498649 0.0498649 0.0498649 0.05072384 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.05170782 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 ]]
Fourth: [[0.04974937 0.04974937 0.04974937 0.05137746 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05234195 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05054148 0.04974937 0.04974937]]
After the second iteration the predictions essentially stop changing (the argmax of the prediction is always 10).
Question: what am I doing wrong here? Thanks in advance!
My full code is shown below:
from collections import Counter
import tensorflow as tf
from sklearn.datasets import fetch_20newsgroups
import matplotlib as mplt
mplt.use('agg')  # Must be before importing matplotlib.pyplot or pylab!
import matplotlib.pyplot as plt
from string import punctuation
from sklearn.preprocessing import LabelBinarizer
import numpy as np
from nltk.corpus import stopwords
import nltk
nltk.download('stopwords')


def pre_process():
    newsgroups_data = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

    words = []
    temp_post_text = []
    print(len(newsgroups_data.data))

    for post in newsgroups_data.data:
        all_text = ''.join([text for text in post if text not in punctuation])
        all_text = all_text.split('\n')
        all_text = ''.join(all_text)
        temp_text = all_text.split(" ")

        for word in temp_text:
            if word.isalpha():
                temp_text[temp_text.index(word)] = word.lower()

        # temp_text = [word for word in temp_text if word not in stopwords.words('english')]
        temp_text = list(filter(None, temp_text))
        temp_text = ' '.join([i for i in temp_text if not i.isdigit()])
        words += temp_text.split(" ")
        temp_post_text.append(temp_text)

    # temp_post_text = list(filter(None, temp_post_text))

    dictionary = Counter(words)
    # deleting spaces
    # del dictionary[""]
    sorted_split_words = sorted(dictionary, key=dictionary.get, reverse=True)
    vocab_to_int = {c: i for i, c in enumerate(sorted_split_words, 1)}

    message_ints = []
    for message in temp_post_text:
        temp_message = message.split(" ")
        message_ints.append([vocab_to_int[i] for i in temp_message])

    # maximum message length = 6577
    # message_lens = Counter([len(x) for x in message_ints])

    seq_length = 6577
    num_messages = len(temp_post_text)
    features = np.zeros([num_messages, seq_length], dtype=int)

    for i, row in enumerate(message_ints):
        # print(features[i, -len(row):])
        # features[i, -len(row):] = np.array(row)[:seq_length]
        features[i, :len(row)] = np.array(row)[:seq_length]
        # print(features[i])

    lb = LabelBinarizer()
    lbl = newsgroups_data.target
    labels = np.reshape(lbl, [-1])
    labels = lb.fit_transform(labels)

    sequence_lengths = [len(msg) for msg in message_ints]
    return features, labels, len(sorted_split_words) + 1, sequence_lengths


def get_batches(x, y, sql, batch_size=1):
    for ii in range(0, len(y), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size], sql[ii:ii + batch_size]


def plot(noOfWrongPred, dataPoints):
    font_size = 14
    fig = plt.figure(dpi=100, figsize=(10, 6))
    mplt.rcParams.update({'font.size': font_size})
    plt.title("Distribution of wrong predictions", fontsize=font_size)
    plt.ylabel('Error rate', fontsize=font_size)
    plt.xlabel('Number of data points', fontsize=font_size)

    plt.plot(dataPoints, noOfWrongPred, label='Prediction', color='blue', linewidth=1.8)
    # plt.legend(loc='upper right', fontsize=14)

    plt.savefig('distribution of wrong predictions.png')
    # plt.show()


def train_test():
    features, labels, n_words, sequence_length = pre_process()

    print(features.shape)
    print(labels.shape)

    # Defining Hyperparameters
    lstm_layers = 1
    batch_size = 1
    lstm_size = 200
    learning_rate = 0.01

    # --------------placeholders-------------------------------------

    # Create the graph object
    graph = tf.Graph()
    # Add nodes to the graph
    with graph.as_default():

        tf.set_random_seed(1)

        inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
        # labels_ = tf.placeholder(dtype= tf.int32)
        labels_ = tf.placeholder(tf.float32, [None, None], name="labels")

        sql_in = tf.placeholder(tf.int32, [None], name='sql_in')

        # output_keep_prob is the dropout added to the RNN's outputs; the dropout will have no effect on the calculation of the subsequent states.
        keep_prob = tf.placeholder(tf.float32, name="keep_prob")

        # Size of the embedding vectors (number of units in the embedding layer)
        embed_size = 300

        # generating random values from a uniform distribution (minval included and maxval excluded)
        embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1), trainable=True)
        embed = tf.nn.embedding_lookup(embedding, inputs_)
        print(embedding.shape)
        print(embed.shape)
        print(embed[0])

        # Your basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

        # Stack up multiple LSTM layers, for deep learning
        cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)

        # Getting an initial state of all zeros
        initial_state = cell.zero_state(batch_size, tf.float32)

        outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state, sequence_length=sql_in)

        # hidden layer
        hidden = tf.layers.dense(outputs[:, -1], units=25, activation=tf.nn.relu)

        print(hidden.shape)

        logit = tf.contrib.layers.fully_connected(hidden, num_outputs=20, activation_fn=None)

        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=labels_))

        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

        saver = tf.train.Saver()

    # ----------------------------online training-----------------------------------------

    with tf.Session(graph=graph) as sess:
        tf.set_random_seed(1)
        sess.run(tf.global_variables_initializer())

        iteration = 1
        state = sess.run(initial_state)
        wrongPred = 0
        noOfWrongPreds = []
        dataPoints = []

        for ii, (x, y, sql) in enumerate(get_batches(features, labels, sequence_length, batch_size), 1):

            feed = {inputs_: x,
                    labels_: y,
                    sql_in: sql,
                    keep_prob: 0.5,
                    initial_state: state}

            predictions = tf.nn.softmax(logit).eval(feed_dict=feed)

            print("----------------------------------------------------------")
            print("sez: ", sql)
            print("Iteration: {}".format(iteration))

            isequal = np.equal(np.argmax(predictions[0], 0), np.argmax(y[0], 0))

            print(np.argmax(predictions[0], 0))
            print(np.argmax(y[0], 0))

            if not (isequal):
                wrongPred += 1

            print("nummber of wrong preds: ", wrongPred)

            if iteration % 50 == 0:
                noOfWrongPreds.append(wrongPred / iteration)
                dataPoints.append(iteration)

            loss, states, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)

            print("Train loss: {:.3f}".format(loss))
            iteration += 1

        saver.save(sess, "checkpoints/sentiment.ckpt")
        errorRate = wrongPred / len(labels)
        print("ERRORS: ", wrongPred)
        print("ERROR RATE: ", errorRate)
        plot(noOfWrongPreds, dataPoints)


if __name__ == '__main__':
    train_test()
Answer:
It looks like your model is not learning anything and is just making random guesses. I can offer the following suggestions (although they may not be the exact cause of the random guessing):
- Masking the cost function:
As explained at https://danijar.com/variable-sequence-lengths-in-tensorflow/, it is good practice to compute the loss over the actual sequence lengths only, rather than averaging over the padded sequence length.
The following explanation is taken from the source above:
Note that our output will still be of size batch_size x max_length x out_size, but for sequences shorter than the maximum length the trailing entries will be zero vectors. When you use the output at every time step, as in sequence labelling, we do not want to include those frames in the cost function. We mask out the unused frames and compute the mean error over the sequence by dividing by the actual length. Using tf.reduce_mean() does not work here because it would divide by the maximum sequence length.
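A minimal sketch of that masking idea (not a drop-in fix for your last-output classifier; it assumes per-time-step logits and one-hot targets of shape batch_size x max_length x num_classes plus an int vector of true lengths, and masked_sequence_cost is a name I made up):

import tensorflow as tf

def masked_sequence_cost(logits, targets, lengths):
    # Per-time-step cross entropy, shape [batch_size, max_length].
    step_xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=targets)
    # 1.0 for real frames, 0.0 for padded frames, shape [batch_size, max_length].
    mask = tf.cast(tf.sequence_mask(lengths, maxlen=tf.shape(logits)[1]), tf.float32)
    step_xent *= mask
    # Average over the true length of each sequence, then over the batch.
    per_sequence = tf.reduce_sum(step_xent, axis=1) / tf.cast(lengths, tf.float32)
    return tf.reduce_mean(per_sequence)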
- Stacking multiple cells:
The following snippet stacks copies of the same LSTM cell rather than distinct instances,
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
A more detailed explanation can be found here: Cannot stack LSTM with MultiRNNCell and dynamic_rnn
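One way to get distinct instances is to build a fresh cell per layer, for example (a sketch reusing the lstm_size, keep_prob and lstm_layers names from your graph; build_cell is a helper name I am introducing):

def build_cell(lstm_size, keep_prob):
    # Each call creates a new LSTM cell, so the stacked layers do not share weights.
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

cell = tf.contrib.rnn.MultiRNNCell(
    [build_cell(lstm_size, keep_prob) for _ in range(lstm_layers)])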
- Batch size:
You are using batch_size = 1, which is plain stochastic gradient descent. Try increasing your batch size (mini-batch gradient descent); this reduces the noise in the updates and tends to converge faster (see the sketch after the next point).
- Run a few epochs and see how the loss and accuracy change:
This will help you understand the behaviour of your model better.
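A rough sketch combining the last two points, reusing your get_batches() generator and the graph nodes defined above (the epochs and batch_size values are arbitrary, and since cell.zero_state(batch_size, ...) is baked into your graph, the graph has to be rebuilt with the new batch size):

epochs = 5        # assumed value, tune as needed
batch_size = 64   # mini-batches instead of batch_size = 1 (rebuild the graph with this value)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        epoch_loss, n_batches = 0.0, 0
        for x, y, sql in get_batches(features, labels, sequence_length, batch_size):
            if len(x) < batch_size:
                continue  # skip the final partial batch so it matches cell.zero_state
            feed = {inputs_: x, labels_: y, sql_in: sql, keep_prob: 0.5}
            loss, _ = sess.run([cost, optimizer], feed_dict=feed)
            epoch_loss += loss
            n_batches += 1
        print("Epoch {}: mean train loss {:.3f}".format(epoch + 1, epoch_loss / n_batches))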
I hope these suggestions help.