LSTM predicts the same value on every iteration in online multi-class classification

I wrote code for online multi-class classification on the 20 newsgroups dataset. To remove the effect of the zero padding of the texts fed into the LSTM, I added the 'sequence_length' parameter to dynamic_rnn, passing in the length of each text being processed.

After adding this attribute, the prediction (computed by the line below) returns essentially the same prediction on every iteration except the first.

predictions = tf.nn.softmax(logit).eval(feed_dict=feed)

Shown below are the predictions I got on the first, second, third and fourth iterations:

First iteration: [[0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05]]

Second iteration: [[0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.0509586 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956 0.04994956]]

Third iteration: [[0.0498649 0.0498649 0.0498649 0.05072384 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.05170782 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 0.0498649 ]]

Fourth iteration: [[0.04974937 0.04974937 0.04974937 0.05137746 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05234195 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.04974937 0.05054148 0.04974937 0.04974937]]

From the second iteration onwards the predicted class never changes (the argmax of the prediction is always 10).

Question: What am I doing wrong here? Thanks in advance!

My full code is below:

    from collections import Counter
    import tensorflow as tf
    from sklearn.datasets import fetch_20newsgroups
    import matplotlib as mplt
    mplt.use('agg')  # Must be before importing matplotlib.pyplot or pylab!
    import matplotlib.pyplot as plt
    from string import punctuation
    from sklearn.preprocessing import LabelBinarizer
    import numpy as np
    from nltk.corpus import stopwords
    import nltk
    nltk.download('stopwords')


    def pre_process():

        newsgroups_data = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

        words = []
        temp_post_text = []
        print(len(newsgroups_data.data))

        for post in newsgroups_data.data:

            all_text = ''.join([text for text in post if text not in punctuation])
            all_text = all_text.split('\n')
            all_text = ''.join(all_text)
            temp_text = all_text.split(" ")

            for word in temp_text:
                if word.isalpha():
                    temp_text[temp_text.index(word)] = word.lower()

            # temp_text = [word for word in temp_text if word not in stopwords.words('english')]
            temp_text = list(filter(None, temp_text))
            temp_text = ' '.join([i for i in temp_text if not i.isdigit()])
            words += temp_text.split(" ")
            temp_post_text.append(temp_text)

        # temp_post_text = list(filter(None, temp_post_text))

        dictionary = Counter(words)
        # deleting spaces
        # del dictionary[""]
        sorted_split_words = sorted(dictionary, key=dictionary.get, reverse=True)
        vocab_to_int = {c: i for i, c in enumerate(sorted_split_words, 1)}

        message_ints = []
        for message in temp_post_text:
            temp_message = message.split(" ")
            message_ints.append([vocab_to_int[i] for i in temp_message])

        # maximum message length = 6577
        # message_lens = Counter([len(x) for x in message_ints])

        seq_length = 6577
        num_messages = len(temp_post_text)
        features = np.zeros([num_messages, seq_length], dtype=int)

        for i, row in enumerate(message_ints):
            # print(features[i, -len(row):])
            # features[i, -len(row):] = np.array(row)[:seq_length]
            features[i, :len(row)] = np.array(row)[:seq_length]
            # print(features[i])

        lb = LabelBinarizer()
        lbl = newsgroups_data.target
        labels = np.reshape(lbl, [-1])
        labels = lb.fit_transform(labels)

        sequence_lengths = [len(msg) for msg in message_ints]

        return features, labels, len(sorted_split_words) + 1, sequence_lengths


    def get_batches(x, y, sql, batch_size=1):
        for ii in range(0, len(y), batch_size):
            yield x[ii:ii + batch_size], y[ii:ii + batch_size], sql[ii:ii + batch_size]


    def plot(noOfWrongPred, dataPoints):
        font_size = 14
        fig = plt.figure(dpi=100, figsize=(10, 6))
        mplt.rcParams.update({'font.size': font_size})
        plt.title("Distribution of wrong predictions", fontsize=font_size)
        plt.ylabel('Error rate', fontsize=font_size)
        plt.xlabel('Number of data points', fontsize=font_size)

        plt.plot(dataPoints, noOfWrongPred, label='Prediction', color='blue', linewidth=1.8)
        # plt.legend(loc='upper right', fontsize=14)

        plt.savefig('distribution of wrong predictions.png')
        # plt.show()


    def train_test():
        features, labels, n_words, sequence_length = pre_process()

        print(features.shape)
        print(labels.shape)

        # Defining Hyperparameters
        lstm_layers = 1
        batch_size = 1
        lstm_size = 200
        learning_rate = 0.01

        # --------------placeholders-------------------------------------

        # Create the graph object
        graph = tf.Graph()
        # Add nodes to the graph
        with graph.as_default():

            tf.set_random_seed(1)

            inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
            # labels_ = tf.placeholder(dtype= tf.int32)
            labels_ = tf.placeholder(tf.float32, [None, None], name="labels")

            sql_in = tf.placeholder(tf.int32, [None], name='sql_in')

            # output_keep_prob is the dropout added to the RNN's outputs, the dropout will have no effect on the calculation of the subsequent states.
            keep_prob = tf.placeholder(tf.float32, name="keep_prob")

            # Size of the embedding vectors (number of units in the embedding layer)
            embed_size = 300

            # generating random values from a uniform distribution (minval included and maxval excluded)
            embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1), trainable=True)
            embed = tf.nn.embedding_lookup(embedding, inputs_)

            print(embedding.shape)
            print(embed.shape)
            print(embed[0])

            # Your basic LSTM cell
            lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

            # Add dropout to the cell
            drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

            # Stack up multiple LSTM layers, for deep learning
            cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)

            # Getting an initial state of all zeros
            initial_state = cell.zero_state(batch_size, tf.float32)

            outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state, sequence_length=sql_in)

            # hidden layer
            hidden = tf.layers.dense(outputs[:, -1], units=25, activation=tf.nn.relu)

            print(hidden.shape)

            logit = tf.contrib.layers.fully_connected(hidden, num_outputs=20, activation_fn=None)

            cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=labels_))

            optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

            saver = tf.train.Saver()

        # ----------------------------online training-----------------------------------------

        with tf.Session(graph=graph) as sess:
            tf.set_random_seed(1)
            sess.run(tf.global_variables_initializer())
            iteration = 1
            state = sess.run(initial_state)
            wrongPred = 0
            noOfWrongPreds = []
            dataPoints = []

            for ii, (x, y, sql) in enumerate(get_batches(features, labels, sequence_length, batch_size), 1):

                feed = {inputs_: x,
                        labels_: y,
                        sql_in: sql,
                        keep_prob: 0.5,
                        initial_state: state}

                predictions = tf.nn.softmax(logit).eval(feed_dict=feed)

                print("----------------------------------------------------------")
                print("sez: ", sql)
                print("Iteration: {}".format(iteration))

                isequal = np.equal(np.argmax(predictions[0], 0), np.argmax(y[0], 0))

                print(np.argmax(predictions[0], 0))
                print(np.argmax(y[0], 0))

                if not (isequal):
                    wrongPred += 1

                print("number of wrong preds: ", wrongPred)

                if iteration % 50 == 0:
                    noOfWrongPreds.append(wrongPred / iteration)
                    dataPoints.append(iteration)

                loss, states, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)

                print("Train loss: {:.3f}".format(loss))
                iteration += 1

            saver.save(sess, "checkpoints/sentiment.ckpt")

            errorRate = wrongPred / len(labels)
            print("ERRORS: ", wrongPred)
            print("ERROR RATE: ", errorRate)
            plot(noOfWrongPreds, dataPoints)


    if __name__ == '__main__':
        train_test()

Answer:

It looks like your model is not learning anything and is just making random guesses. I can offer the following suggestions (they may not, however, be the exact cause of the random guessing):

  1. Mask the cost function:

As explained at https://danijar.com/variable-sequence-lengths-in-tensorflow/, it is good practice to compute the loss only over the actual sequence length, rather than averaging over the padded length (a sketch follows the quoted passage below).

The following explanation is taken from the source above:

Note that our output will still be of size batch_size x max_length x out_size, but with zero vectors at the end for sequences shorter than the maximum length. When you use the output at every time step, as in sequence labeling, you don't want those frames to contribute to the cost function. We mask out the unused frames and compute the mean error over the sequence by dividing by the actual length. Using tf.reduce_mean() does not work here, because it would divide by the maximum sequence length.
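A minimal sketch of that masking idea, assuming you keep the per-time-step outputs and have per-time-step targets of shape [batch_size, max_length, num_classes] (the helper name masked_sequence_cost and these shapes are illustrative, not part of your code):

    def masked_sequence_cost(logits, targets, lengths):
        # logits, targets: [batch_size, max_length, num_classes]; lengths: [batch_size]
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits)  # -> [batch_size, max_length]
        # Zero out the time steps that are only padding
        mask = tf.sequence_mask(lengths, maxlen=tf.shape(logits)[1], dtype=tf.float32)
        cross_entropy *= mask
        # Average over each sequence's true length instead of max_length
        per_sequence = tf.reduce_sum(cross_entropy, axis=1) / tf.cast(lengths, tf.float32)
        return tf.reduce_mean(per_sequence)

Related to this: when sequence_length is passed, tf.nn.dynamic_rnn emits zero vectors past each sequence's true length, so outputs[:, -1] (as in your code) is a zero vector for every text shorter than the padded length; the last relevant output has to be gathered at position length - 1 instead.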

  2. Stacking multiple cells:

The following snippet stacks copies of the same LSTM cell rather than distinct instances:

    cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)

A more detailed explanation can be found here: Cannot stack LSTM with MultiRNNCell and dynamic_rnn. A corrected sketch follows.
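One way to build a separate cell per layer, reusing the variable names from your code (a sketch, not the only possible arrangement):

    def build_cell(lstm_size, keep_prob):
        # A fresh BasicLSTMCell (with its own dropout wrapper) for each layer
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell(
        [build_cell(lstm_size, keep_prob) for _ in range(lstm_layers)])

With lstm_layers = 1 the two forms behave the same, so this only starts to matter once you stack more than one layer.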

  3. Batch size:

You are using a batch size of 1, i.e. plain stochastic gradient descent. Try increasing the batch size (mini-batch gradient descent); it reduces the noise in the updates and tends to converge faster (see the sketch below).
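A minimal sketch against your training loop; the value 32 is only an illustrative choice, and the ragged last batch is skipped because initial_state was created for a fixed batch size:

    batch_size = 32  # hypothetical value; common starting points are 16-128

    for x, y, sql in get_batches(features, labels, sequence_length, batch_size):
        if len(y) < batch_size:
            break  # initial_state comes from cell.zero_state(batch_size, ...), so skip the smaller final batch
        feed = {inputs_: x, labels_: y, sql_in: sql, keep_prob: 0.5, initial_state: state}
        loss, _, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)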

  4. Run several epochs and watch how the loss and accuracy evolve:

This will give you a much better picture of how your model behaves (a sketch of an epoch loop follows).
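A sketch of an outer epoch loop wrapped around your existing batch loop (the epoch count of 5 is arbitrary):

    epochs = 5  # arbitrary; watch where the loss curve flattens out
    for epoch in range(epochs):
        epoch_losses = []
        for x, y, sql in get_batches(features, labels, sequence_length, batch_size):
            feed = {inputs_: x, labels_: y, sql_in: sql, keep_prob: 0.5, initial_state: state}
            loss, _, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            epoch_losses.append(loss)
        print("Epoch {}/{} - mean train loss: {:.3f}".format(epoch + 1, epochs, np.mean(epoch_losses)))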

I hope these suggestions help.
