LSTM predicts the same value at every iteration in online multiclass classification

I have written code for online multiclass classification on the 20 newsgroups dataset. To remove the effect of the 0-padding added to the texts fed into the LSTM, I added the 'sequence_length' parameter to dynamic_rnn and pass it the length of each text being processed.

After adding this attribute, the prediction (computed by the line shown below) gives the same prediction in every iteration except the first.

predictions = tf.nn.softmax(logit).eval(feed_dict=feed)

Shown below are the predictions I get in the first, second, third and fourth iterations:

Iteration 1: [[0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05]]

Iteration 2: [[0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.0509586 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956]]

Iteration 3: [[0.0498649 0.0498649 0.0498649 0.05072384 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.05170782 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 ]]

Iteration 4: [[0.04974937 0.04974937 0.04974937 0.05137746 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05234195 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05054148 0.04974937 0.04974937]]

From the second iteration onward, the predicted class stops changing (the argmax of the prediction is always 10).

Question: What am I doing wrong here? Thanks in advance!

My full code is shown below:

    from collections import Counter
    import tensorflow as tf
    from sklearn.datasets import fetch_20newsgroups
    import matplotlib as mplt
    mplt.use('agg')  # Must be before importing matplotlib.pyplot or pylab!
    import matplotlib.pyplot as plt
    from string import punctuation
    from sklearn.preprocessing import LabelBinarizer
    import numpy as np
    from nltk.corpus import stopwords
    import nltk

    nltk.download('stopwords')


    def pre_process():
        newsgroups_data = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

        words = []
        temp_post_text = []
        print(len(newsgroups_data.data))

        for post in newsgroups_data.data:
            all_text = ''.join([text for text in post if text not in punctuation])
            all_text = all_text.split('\n')
            all_text = ''.join(all_text)
            temp_text = all_text.split(" ")

            for word in temp_text:
                if word.isalpha():
                    temp_text[temp_text.index(word)] = word.lower()

            # temp_text = [word for word in temp_text if word not in stopwords.words('english')]
            temp_text = list(filter(None, temp_text))
            temp_text = ' '.join([i for i in temp_text if not i.isdigit()])
            words += temp_text.split(" ")
            temp_post_text.append(temp_text)

        # temp_post_text = list(filter(None, temp_post_text))

        dictionary = Counter(words)
        # deleting spaces
        # del dictionary[""]
        sorted_split_words = sorted(dictionary, key=dictionary.get, reverse=True)
        vocab_to_int = {c: i for i, c in enumerate(sorted_split_words, 1)}

        message_ints = []
        for message in temp_post_text:
            temp_message = message.split(" ")
            message_ints.append([vocab_to_int[i] for i in temp_message])

        # maximum message length = 6577
        # message_lens = Counter([len(x) for x in message_ints])

        seq_length = 6577
        num_messages = len(temp_post_text)
        features = np.zeros([num_messages, seq_length], dtype=int)
        for i, row in enumerate(message_ints):
            # print(features[i, -len(row):])
            # features[i, -len(row):] = np.array(row)[:seq_length]
            features[i, :len(row)] = np.array(row)[:seq_length]
            # print(features[i])

        lb = LabelBinarizer()
        lbl = newsgroups_data.target
        labels = np.reshape(lbl, [-1])
        labels = lb.fit_transform(labels)

        sequence_lengths = [len(msg) for msg in message_ints]
        return features, labels, len(sorted_split_words) + 1, sequence_lengths


    def get_batches(x, y, sql, batch_size=1):
        for ii in range(0, len(y), batch_size):
            yield x[ii:ii + batch_size], y[ii:ii + batch_size], sql[ii:ii + batch_size]


    def plot(noOfWrongPred, dataPoints):
        font_size = 14
        fig = plt.figure(dpi=100, figsize=(10, 6))
        mplt.rcParams.update({'font.size': font_size})
        plt.title("Distribution of wrong predictions", fontsize=font_size)
        plt.ylabel('Error rate', fontsize=font_size)
        plt.xlabel('Number of data points', fontsize=font_size)

        plt.plot(dataPoints, noOfWrongPred, label='Prediction', color='blue', linewidth=1.8)
        # plt.legend(loc='upper right', fontsize=14)

        plt.savefig('distribution of wrong predictions.png')
        # plt.show()


    def train_test():
        features, labels, n_words, sequence_length = pre_process()

        print(features.shape)
        print(labels.shape)

        # Defining Hyperparameters
        lstm_layers = 1
        batch_size = 1
        lstm_size = 200
        learning_rate = 0.01

        # --------------placeholders-------------------------------------

        # Create the graph object
        graph = tf.Graph()
        # Add nodes to the graph
        with graph.as_default():

            tf.set_random_seed(1)

            inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
            # labels_ = tf.placeholder(dtype= tf.int32)
            labels_ = tf.placeholder(tf.float32, [None, None], name="labels")

            sql_in = tf.placeholder(tf.int32, [None], name='sql_in')

            # output_keep_prob is the dropout added to the RNN's outputs, the dropout will have no effect on the calculation of the subsequent states.
            keep_prob = tf.placeholder(tf.float32, name="keep_prob")

            # Size of the embedding vectors (number of units in the embedding layer)
            embed_size = 300

            # generating random values from a uniform distribution (minval included and maxval excluded)
            embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1), trainable=True)
            embed = tf.nn.embedding_lookup(embedding, inputs_)

            print(embedding.shape)
            print(embed.shape)
            print(embed[0])

            # Your basic LSTM cell
            lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

            # Add dropout to the cell
            drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

            # Stack up multiple LSTM layers, for deep learning
            cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)

            # Getting an initial state of all zeros
            initial_state = cell.zero_state(batch_size, tf.float32)

            outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state, sequence_length=sql_in)

            # hidden layer
            hidden = tf.layers.dense(outputs[:, -1], units=25, activation=tf.nn.relu)

            print(hidden.shape)

            logit = tf.contrib.layers.fully_connected(hidden, num_outputs=20, activation_fn=None)

            cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=labels_))

            optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

            saver = tf.train.Saver()

        # ----------------------------online training-----------------------------------------
        with tf.Session(graph=graph) as sess:
            tf.set_random_seed(1)
            sess.run(tf.global_variables_initializer())
            iteration = 1
            state = sess.run(initial_state)
            wrongPred = 0
            noOfWrongPreds = []
            dataPoints = []

            for ii, (x, y, sql) in enumerate(get_batches(features, labels, sequence_length, batch_size), 1):

                feed = {inputs_: x,
                        labels_: y,
                        sql_in: sql,
                        keep_prob: 0.5,
                        initial_state: state}

                predictions = tf.nn.softmax(logit).eval(feed_dict=feed)

                print("----------------------------------------------------------")
                print("sez: ", sql)
                print("Iteration: {}".format(iteration))

                isequal = np.equal(np.argmax(predictions[0], 0), np.argmax(y[0], 0))

                print(np.argmax(predictions[0], 0))
                print(np.argmax(y[0], 0))

                if not (isequal):
                    wrongPred += 1

                print("nummber of wrong preds: ", wrongPred)

                if iteration % 50 == 0:
                    noOfWrongPreds.append(wrongPred / iteration)
                    dataPoints.append(iteration)

                loss, states, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)

                print("Train loss: {:.3f}".format(loss))
                iteration += 1

            saver.save(sess, "checkpoints/sentiment.ckpt")
            errorRate = wrongPred / len(labels)
            print("ERRORS: ", wrongPred)
            print("ERROR RATE: ", errorRate)
            plot(noOfWrongPreds, dataPoints)


    if __name__ == '__main__':
        train_test()

Answer:

It looks like your model is not learning anything and is just making random guesses. I have a few suggestions below (although they may not be the exact cause of the random guessing):

  1. Mask the cost function:

As explained here: https://danijar.com/variable-sequence-lengths-in-tensorflow/, it is good practice to compute the loss over the actual sequence lengths only, rather than averaging over the padded sequence length; a sketch along these lines follows the quoted explanation below.

The following explanation is taken from the source above:

Note that our output will still be of size batch_size x max_length x out_size, but with the last entries being zero vectors for sequences shorter than the maximum length. When you use the outputs at every time step, as in sequence labelling, we do not want to consider those frames in our cost function. We mask out the unused frames and compute the mean error over the sequence by dividing by the actual length. Using tf.reduce_mean() does not work here because it would divide by the maximum sequence length.
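For a sequence-labelling setup with one target per time step, a minimal sketch of such a masked cost, adapted from the linked article, could look like this (the names masked_cost, outputs and targets are illustrative, not from the question's code; outputs is assumed to already hold softmax probabilities of shape [batch, max_length, num_classes]):

    # Sketch adapted from the linked article: mask padded frames out of the loss
    # and average over each sequence's true length instead of max_length.
    def masked_cost(outputs, targets):
        # Per-frame cross entropy; `outputs` are softmax probabilities,
        # `targets` are one-hot, both [batch, max_length, num_classes].
        cross_entropy = -tf.reduce_sum(targets * tf.log(outputs), 2)
        # Padded frames have all-zero targets, so their mask entry is 0.
        mask = tf.sign(tf.reduce_max(tf.abs(targets), 2))
        cross_entropy *= mask
        # Sum over time and divide by the actual length of each sequence.
        cross_entropy = tf.reduce_sum(cross_entropy, 1)
        cross_entropy /= tf.reduce_sum(mask, 1)
        return tf.reduce_mean(cross_entropy)

In your code there is only one label per post, so the closer analogue of this point is the choice of output to classify: with sequence_length passed to dynamic_rnn, outputs past each post's real length are zero vectors, so outputs[:, -1] is a zero vector for every post shorter than the padded length. Gathering the output at position sequence_length - 1 instead (the "last relevant output" in the same article) uses the real end of each sequence.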

  2. Stacking multiple cells:

The following snippet stacks copies of the same LSTM cell object rather than separate instances (a possible fix is sketched after the link below),

    cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)

A more detailed explanation can be found here: Cannot stack LSTM with MultiRNNCell and dynamic_rnn
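One way to give each layer its own cell (and therefore its own weights), sketched with the variable names from the code above:

    # Sketch: create a fresh wrapped cell per layer instead of reusing the
    # same `drop` object lstm_layers times.
    def build_cell(lstm_size, keep_prob):
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell(
        [build_cell(lstm_size, keep_prob) for _ in range(lstm_layers)])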

  3. Batch size:

You are using a batch size of 1, which is plain stochastic gradient descent. Try increasing the batch size (mini-batch gradient descent); this reduces the noise in the updates and tends to converge faster. A toy illustration follows.
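For illustration, the get_batches generator from your code already accepts a larger batch_size; the toy example below (the data is made up) shows what it yields. Note that initial_state is built with cell.zero_state(batch_size, ...), so a ragged final batch would have to be skipped or padded:

    import numpy as np

    # Toy check of the question's get_batches generator with batch_size > 1.
    # Assumes get_batches from the code above is in scope; data is made up.
    toy_features = np.zeros((10, 6577), dtype=int)
    toy_labels = np.eye(10, dtype=int)      # stand-in one-hot labels
    toy_lengths = list(range(1, 11))

    for x, y, sql in get_batches(toy_features, toy_labels, toy_lengths, batch_size=4):
        print(x.shape, y.shape, len(sql))   # batches of 4, 4, then a final batch of 2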

  4. Run a few epochs and watch how the loss and accuracy change:

This will give you a much better picture of how your model behaves; a rough sketch of such a loop is given below.
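A sketch of what that could look like, assuming the graph, session, placeholders and get_batches generator from the code above are in scope (num_epochs is an illustrative value, not something from the question):

    # Sketch: wrap the existing batch loop in an epoch loop and report
    # mean loss and accuracy once per epoch.
    probs = tf.nn.softmax(logit)            # build the prediction op once
    num_epochs = 5                          # illustrative value
    for epoch in range(num_epochs):
        epoch_loss, correct, total, n_batches = 0.0, 0, 0, 0
        for x, y, sql in get_batches(features, labels, sequence_length, batch_size):
            feed = {inputs_: x, labels_: y, sql_in: sql,
                    keep_prob: 0.5, initial_state: state}
            batch_loss, batch_probs, _ = sess.run([cost, probs, optimizer], feed_dict=feed)
            epoch_loss += batch_loss
            n_batches += 1
            correct += np.sum(np.argmax(batch_probs, 1) == np.argmax(y, 1))
            total += len(y)
        print("Epoch {}: mean loss {:.3f}, accuracy {:.3f}".format(
            epoch + 1, epoch_loss / n_batches, correct / total))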

Hope these suggestions help.
