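I want to find which of the original raw records were classified correctly and which were misclassified after applying a Multinomial Naive Bayes classifier. For example, after applying Multinomial Naive Bayes I get 88% accuracy. I would like to see the 12% of test samples that were misclassified as well as the 88% that were classified correctly. Thanks in advance.
My dataset: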
+----------------------+------------+
| Details | Category |
+----------------------+------------+
| Any raw text1 | cat1 |
+----------------------+------------+
| any raw text2 | cat1 |
+----------------------+------------+
| any raw text5 | cat2 |
+----------------------+------------+
| any raw text7 | cat1 |
+----------------------+------------+
| any raw text8 | cat2 |
+----------------------+------------+
| Any raw text4 | cat4 |
+----------------------+------------+
| any raw text5 | cat4 |
+----------------------+------------+
| any raw text6 | cat3 |
+----------------------+------------+
My code:
import pandas as pd
import numpy as np
import scipy as sp
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
data = pd.read_csv('mydat.xls', delimiter='\t', usecols=['Details', 'Category'], encoding='utf-8')
target_one = data['Category']
target_list = data['Category'].unique()
# split the raw text and the labels into training and test sets
x_train, x_test, y_train, y_test = train_test_split(data.Details, data.Category, random_state=42)
vect = CountVectorizer(ngram_range=(1, 2))
# fit the vocabulary on the training text and convert it into count vectors
X_train = vect.fit_transform(x_train.values.astype('U'))
# convert the test text into count vectors using the same vocabulary
X_test = vect.transform(x_test.values.astype('U'))
# start = time.clock()
mnb = MultinomialNB(alpha=0.13)
mnb.fit(X_train, y_train)
result = mnb.predict(X_test)
# mnb.predict_proba(X_test)[0:10, 1]
accuracy_score(y_test, result)
Answer:
Just iterate over your test data:
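# result is a NumPy array; x_test and y_test are pandas Series that keep the
# shuffled original index, so access them by position with .iloc
for i in range(len(y_test)):
    if result[i] == y_test.iloc[i]:
        print("CORRECT: ", x_test.iloc[i])
    else:
        print("INCORRECT: ", x_test.iloc[i])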
You can append them to two separate lists, print only their IDs, or handle them however you need.
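If you would rather have the two groups as tables than as printed lines, a boolean mask can do the split in one step. The following is only a minimal sketch: the small `x_test`, `y_test` and `result` values are made-up stand-ins for the variables in the question, and the `report`, `correct` and `incorrect` names are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for x_test, y_test and result from the question.
# The non-consecutive index mimics what train_test_split leaves behind.
x_test = pd.Series(
    ["any raw text2", "any raw text5", "any raw text6", "any raw text8"],
    index=[1, 2, 7, 4], name="Details")
y_test = pd.Series(["cat1", "cat2", "cat3", "cat2"],
                   index=[1, 2, 7, 4], name="Category")
result = np.array(["cat1", "cat2", "cat4", "cat1"])  # made-up predictions

# Put the raw text and the true label side by side (aligned on the original
# row index), then attach the predictions, which are in the same row order.
report = pd.DataFrame({"Details": x_test, "Category": y_test})
report["Predicted"] = result

# True where the prediction matches the true label.
mask = report["Category"] == report["Predicted"]
correct = report[mask]
incorrect = report[~mask]

print("Correctly classified:")
print(correct)
print("Misclassified:")
print(incorrect)
```

Because `report` keeps the original index, the row labels of `correct` and `incorrect` can be used to look the records up again in `data`, assuming `data` still has its default index from `read_csv`.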