RNN: Getting predictions from text input after the model has been trained

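I'm new to RNNs and have been working on a small binary-label classifier. I've been able to get a stable model with satisfactory results.

However, I'm having trouble using the model to classify new input, and I'd like to ask whether any of you can help. Please see the code below.

Thank you very much.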

from tensorflow.keras import preprocessing
from sklearn.utils import shuffle
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Model
from tensorflow.keras import models
from tensorflow.keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding
from tensorflow.keras.optimizers import RMSprop, Adam
from tensorflow.keras.preprocessing import sequence, text
from tensorflow.keras.callbacks import EarlyStopping
from matplotlib import pyplot


class tensor_rnn():
    def __init__(self, hidden_layers=3):
        self.data_path = 'C:\\\\Users\\cmazz\\PycharmProjects\\InvestmentAnalysis_2.0\\Sentiment\\Finance_Articles\\'
        # self.corp_paths = corpora_paths
        self.h_layers = hidden_layers
        self.num_words = []
        good = pd.read_csv(self.data_path + 'GoodO.csv')
        good['Polarity'] = 'pos'
        for line in good['Head'].tolist():
            counter = len(line.split())
            self.num_words.append(counter)
        bad = pd.read_csv(self.data_path + 'BadO.csv')
        bad['Polarity'] = 'neg'
        for line in bad['Head'].tolist():
            counter = len(line.split())
            self.num_words.append(counter)
        self.features = pd.concat([good, bad]).reset_index(drop=True)
        self.features = shuffle(self.features)
        self.max_len = len(max(self.features['Head'].tolist()))
        # self.train, self.test = train_test_split(features, test_size=0.33, random_state=42)
        X = self.features['Head']
        Y = self.features['Polarity']
        le = LabelEncoder()
        Y = le.fit_transform(Y)
        Y = Y.reshape(-1, 1)
        self.X_train, self.X_test, self.Y_train, self.Y_test = train_test_split(X, Y, test_size=0.30)
        self.tok = preprocessing.text.Tokenizer(num_words=len(self.num_words))
        self.tok.fit_on_texts(self.X_train)
        sequences = self.tok.texts_to_sequences(self.X_train)
        self.sequences_matrix = preprocessing.sequence.pad_sequences(sequences, maxlen=self.max_len)

    def RNN(self):
        inputs = Input(name='inputs', shape=[self.max_len])
        layer = Embedding(len(self.num_words), 30, input_length=self.max_len)(inputs)
        # layer = LSTM(64, return_sequences=True)(layer)
        layer = LSTM(32)(layer)
        layer = Dense(256, name='FC1')(layer)
        layer = Activation('relu')(layer)
        layer = Dropout(0.5)(layer)
        layer = Dense(1, name='out_layer')(layer)
        layer = Activation('sigmoid')(layer)
        model = Model(inputs=inputs, outputs=layer)
        return model

    def model_train(self):
        self.model = self.RNN()
        self.model.summary()
        self.model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])   # RMSprop()

    def model_test(self):
        self.history = self.model.fit(self.sequences_matrix, self.Y_train, batch_size=100, epochs=3,
                                      validation_split=0.30,
                                      callbacks=[EarlyStopping(monitor='val_loss', min_delta=0.0001)])
        test_sequences = self.tok.texts_to_sequences(self.X_test)
        test_sequences_matrix = sequence.pad_sequences(test_sequences, maxlen=self.max_len)
        accr = self.model.evaluate(test_sequences_matrix, self.Y_test)
        print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0], accr[1]))


if __name__ == "__main__":
    a = tensor_rnn()
    a.model_train()
    a.model_test()
    a.model.save('C:\\\\Users\\cmazz\\PycharmProjects\\'
                 'InvestmentAnalysis_2.0\\RNN_Model.h5',
                 include_optimizer=True)
    b = models.load_model('C:\\\\Users\\cmazz\\PycharmProjects\\'
                          'InvestmentAnalysis_2.0\\RNN_Model.h5')
    stringy = ['Fund managers back away from Amazon as they cut FANG exposure']
    prediction = b.predict(np.array(stringy))
    print(prediction)

When I run the code, I get the following error:

ValueError: Error when checking input: expected inputs to have shape (39,) but got array with shape (1,)


Answer:

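Based on the ValueError and the line prediction = b.predict(np.array(stringy)), I think you need to tokenize your input string first. The model expects a padded integer sequence of length 39, which is what model_test builds for the test set with self.tok.texts_to_sequences and pad_sequences, but predict is being handed the raw string array, whose shape is (1,).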
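To make the shape mismatch concrete, here is a minimal pure-Python sketch of what the tokenize-and-pad step does to the input. It is only a simplified stand-in for the Keras Tokenizer and pad_sequences, not their actual implementation: WORD_INDEX is a made-up vocabulary, and MAX_LEN is taken from the 39 in the error message. The two helper functions are hypothetical re-implementations written for illustration.

```python
# Illustrative stand-in for the Keras Tokenizer + pad_sequences steps.
# WORD_INDEX and MAX_LEN are assumptions: in the question they would come
# from self.tok (fit on the training headlines) and self.max_len.

MAX_LEN = 39  # input length the model expects, per the ValueError
WORD_INDEX = {"fund": 1, "managers": 2, "amazon": 3, "exposure": 4}  # made-up vocabulary


def texts_to_sequences(texts, word_index):
    """Map each text to a list of integer word ids, skipping unknown words."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]


def pad_sequences(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen ids (the Keras defaults)."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


stringy = ['Fund managers back away from Amazon as they cut FANG exposure']
raw_shape = (len(stringy),)                   # shape of the raw string array passed to predict()
matrix = pad_sequences(texts_to_sequences(stringy, WORD_INDEX), MAX_LEN)
matrix_shape = (len(matrix), len(matrix[0]))  # shape the model's input layer expects
print(raw_shape, matrix_shape)                # prints: (1,) (1, 39)
```

With the objects from the question, the corresponding calls would presumably be the ones model_test already uses: a.tok.texts_to_sequences(stringy), then sequence.pad_sequences(..., maxlen=a.max_len), and only then b.predict on the resulting matrix. Note that the tokenizer has to be the one fitted on the training data, and it is not stored in the .h5 file, so it would need to be kept in memory or saved separately.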
