How do I run sentiment analysis on a new sentence with a trained model?

I trained a model with Naive Bayes and it reaches high accuracy. Now I want to pass in a single sentence and see its sentiment. Here is my code:

    # Data analysis
    import pandas as pd

    # Data preprocessing and feature engineering
    from textblob import TextBlob
    import re
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Model selection and validation
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
    import joblib
    import warnings
    import mlflow

    warnings.filterwarnings("ignore")

    train_tweets = pd.read_csv('data/train.csv')
    tweets = train_tweets.tweet.values
    labels = train_tweets.label.values

    processed_features = []
    for sentence in range(0, len(tweets)):
        # Remove all special characters
        processed_feature = re.sub(r'\W', ' ', str(tweets[sentence]))
        # Remove all single characters
        processed_feature = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_feature)
        # Remove a single character from the start
        processed_feature = re.sub(r'^[a-zA-Z]\s+', ' ', processed_feature)
        # Replace multiple spaces with a single space
        processed_feature = re.sub(r'\s+', ' ', processed_feature, flags=re.I)
        # Remove the 'b' prefix
        processed_feature = re.sub(r'^b\s+', '', processed_feature)
        # Convert to lowercase
        processed_feature = processed_feature.lower()
        processed_features.append(processed_feature)

    vectorizer = TfidfVectorizer(max_features=2500, min_df=7, max_df=0.8, stop_words=stopwords.words('english'))
    processed_features = vectorizer.fit_transform(processed_features).toarray()

    X_train, X_test, y_train, y_test = train_test_split(processed_features, labels, test_size=0.2, random_state=0)

    text_classifier = MultinomialNB()
    text_classifier.fit(X_train, y_train)

    predictions = text_classifier.predict(X_test)
    print(confusion_matrix(y_test, predictions))
    print(classification_report(y_test, predictions))
    print(accuracy_score(y_test, predictions))

    joblib.dump(text_classifier, 'model.pkl')

As you can see, I saved my model. Now I want to pass in a sentence like this:

    new_sentence = "I am very happy today"
    model.predict(new_sentence)

And I would like to see output like this:

    sentence = "I am very happy today"
    sentiment = Positive

How can I do that?

Answer:

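First, move the preprocessing steps into a function so that training data and new sentences are cleaned in exactly the same way: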

    # Reuses the imports and the `tweets` array from the question's script.
    def preproc(tweets):
        processed_features = []
        for tweet in tweets:
            # Remove all special characters
            processed_feature = re.sub(r'\W', ' ', str(tweet))
            # Remove all single characters
            processed_feature = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_feature)
            # Remove a single character from the start
            processed_feature = re.sub(r'^[a-zA-Z]\s+', ' ', processed_feature)
            # Replace multiple spaces with a single space
            processed_feature = re.sub(r'\s+', ' ', processed_feature, flags=re.I)
            # Remove the 'b' prefix
            processed_feature = re.sub(r'^b\s+', '', processed_feature)
            # Convert to lowercase
            processed_feature = processed_feature.lower()
            processed_features.append(processed_feature)
        return processed_features

    processed_features = preproc(tweets)
    vectorizer = TfidfVectorizer(max_features=2500, min_df=7, max_df=0.8, stop_words=stopwords.words('english'))
    processed_features = vectorizer.fit_transform(processed_features).toarray()

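Then use the same function to preprocess the test strings, and convert them with the already fitted vectorizer's transform (not fit_transform, which would relearn the vocabulary) before passing them to the classifier: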

    # Two single-sentence tweets as input:
    test = preproc(["I hate this book.", "I love this movie."])
    predictions = text_classifier.predict(vectorizer.transform(test).toarray())
    print(predictions)
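The question saves only the classifier with joblib.dump, but a separate prediction script also needs the fitted vectorizer. One possible way to keep the two together is a scikit-learn Pipeline, sketched below. The toy training data, the file name sentiment_pipeline.pkl, and the default vectorizer settings are assumptions made for illustration, and the cleaning function is left out for brevity.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data standing in for data/train.csv (hypothetical; 1 = positive, 0 = negative).
train_texts = [
    "i love this movie",
    "what a great happy day",
    "i hate this book",
    "this is terrible and sad",
]
train_labels = [1, 1, 0, 0]

# The pipeline stores the fitted vectorizer and the classifier as one object.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train_texts, train_labels)

# Save and reload, as a separate prediction script would do.
model_path = os.path.join(tempfile.mkdtemp(), "sentiment_pipeline.pkl")
joblib.dump(pipeline, model_path)
loaded = joblib.load(model_path)

# predict expects an iterable of documents, so a single sentence goes in a list.
new_predictions = loaded.predict(["i love this great day", "i hate this terrible book"])
print(new_predictions)
```

With this layout the loaded object accepts raw strings directly, because the vectorizer's transform runs inside predict.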

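The values in predictions depend on which labels your dataset contains and how train_tweets.label.values is encoded, so you still need to map them to strings yourself. For example, if the labels are encoded as 1 = positive and 0 = negative, you might get [0, 1].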
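To get output in the shape the question asks for, the numeric predictions can be looked up in a small dictionary. This is a minimal sketch that assumes the 1 = positive, 0 = negative encoding from the example above; the names LABEL_NAMES and describe are hypothetical, and the mapping should be adjusted to match your own dataset.

```python
# Assumed encoding (check your own dataset): 1 = positive, 0 = negative.
LABEL_NAMES = {0: "negative", 1: "positive"}


def describe(sentences, predictions, label_names=LABEL_NAMES):
    """Pair each sentence with the readable name of its predicted label."""
    lines = []
    for sentence, label in zip(sentences, predictions):
        # int() also accepts the NumPy integers returned by predict().
        name = label_names.get(int(label), "unknown")
        lines.append(f'sentence = "{sentence}" sentiment = {name}')
    return lines


sentences = ["I hate this book.", "I love this movie."]
predictions = [0, 1]  # stand-in for the classifier output in the example above
for line in describe(sentences, predictions):
    print(line)
```

Labels missing from the dictionary are reported as "unknown" rather than raising an error, which makes an unexpected encoding easy to spot.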
