Finding correctly and incorrectly classified data

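I want to find the original rows that were classified correctly and incorrectly after applying the Multinomial Naive Bayes classifier. For example, after applying Multinomial Naive Bayes I get 88% accuracy. I want to see the 12% of rows that were misclassified, as well as the 88% that were classified correctly. Thanks in advance.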
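The 88% and the 12% are complementary: accuracy is the fraction of test rows whose predicted label equals the true label, so the misclassified rows are exactly the ones where the two differ. A minimal pure-Python sketch, using hypothetical labels and predictions for eight test rows:

```python
# Hypothetical true labels and predictions for 8 test rows
y_true = ["cat1", "cat1", "cat2", "cat1", "cat2", "cat4", "cat4", "cat3"]
y_pred = ["cat1", "cat2", "cat2", "cat1", "cat2", "cat4", "cat3", "cat3"]

# A row is correctly classified when its prediction equals its label
correct = [t == p for t, p in zip(y_true, y_pred)]

# Accuracy is simply the share of True values in that mask
accuracy = sum(correct) / len(correct)

# Positions of the correctly and incorrectly classified rows
correct_idx = [i for i, ok in enumerate(correct) if ok]
wrong_idx = [i for i, ok in enumerate(correct) if not ok]

print(accuracy)   # 0.75
print(wrong_idx)  # [1, 6]
```

Every row lands in exactly one of the two lists, so `len(correct_idx) / len(y_true)` is the accuracy and `len(wrong_idx) / len(y_true)` is the error rate.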

My dataset:

+----------------------+------------+
| Details              | Category   |
+----------------------+------------+
| Any raw text1        | cat1       |
+----------------------+------------+
| any raw text2        | cat1       |
+----------------------+------------+
| any raw text5        | cat2       |
+----------------------+------------+
| any raw text7        | cat1       |
+----------------------+------------+
| any raw text8        | cat2       |
+----------------------+------------+
| Any raw text4        | cat4       |
+----------------------+------------+
| any raw text5        | cat4       |
+----------------------+------------+
| any raw text6        | cat3       |
+----------------------+------------+

My code:

import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 'mydat.xls' is read as tab-separated text; a genuine Excel workbook would need pd.read_excel instead
data = pd.read_csv('mydat.xls', delimiter='\t', usecols=['Details', 'Category'], encoding='utf-8')
target_one = data['Category']
target_list = data['Category'].unique()
x_train, x_test, y_train, y_test = train_test_split(data.Details, data.Category, random_state=42)
vect = CountVectorizer(ngram_range=(1, 2))
# convert the training texts into a sparse matrix of unigram/bigram counts
X_train = vect.fit_transform(x_train.values.astype('U'))
# convert the test texts using the vocabulary learned from the training set
X_test = vect.transform(x_test.values.astype('U'))
mnb = MultinomialNB(alpha=0.13)
mnb.fit(X_train, y_train)
result = mnb.predict(X_test)
# mnb.predict_proba(X_test)[0:10, 1]
accuracy_score(y_test, result)
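Because train_test_split keeps each row's original DataFrame index on x_test and y_test, the predictions can be lined up with the raw text and split into the two groups with a boolean mask. A minimal end-to-end sketch, assuming a small hypothetical DataFrame in place of mydat.xls (the texts and the spam/work labels are invented for illustration):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for mydat.xls
data = pd.DataFrame({
    "Details": ["cheap pills now", "win cash now", "meeting at noon",
                "cash prize win", "lunch at noon", "project meeting today",
                "cheap cash offer", "noon project review"],
    "Category": ["spam", "spam", "work", "spam", "work", "work", "spam", "work"],
})

x_train, x_test, y_train, y_test = train_test_split(
    data.Details, data.Category, random_state=42)

vect = CountVectorizer(ngram_range=(1, 2))
X_train = vect.fit_transform(x_train.values.astype("U"))
X_test = vect.transform(x_test.values.astype("U"))

mnb = MultinomialNB(alpha=0.13)
mnb.fit(X_train, y_train)
result = mnb.predict(X_test)

# Raw text, true label and prediction side by side; the index still
# points back to the matching row of `data`
report = pd.DataFrame({"Details": x_test, "actual": y_test, "predicted": result})
mask = report["actual"] == report["predicted"]
correct_rows = report[mask]   # the correctly classified rows
wrong_rows = report[~mask]    # the misclassified rows

acc = accuracy_score(y_test, result)
print("accuracy:", acc, "fraction correct:", mask.mean())
print(wrong_rows)
```

The mean of the mask equals the value returned by accuracy_score, and `data.loc[wrong_rows.index]` recovers the full original rows (all columns) for the misclassified samples.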

Answer:

Just iterate over your test data and compare each prediction with its true label:

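# x_test and y_test keep the shuffled index of the original DataFrame,
# so use positional .iloc; x_test holds the raw text, X_test only the count vectors
for i in range(len(y_test)):
    if result[i] == y_test.iloc[i]:
        print("CORRECT: ", x_test.iloc[i])
    else:
        print("INCORRECT: ", x_test.iloc[i])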

You can collect them in two separate lists, print only the row IDs, or do whatever else you need with them.
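Instead of a loop, a single boolean mask yields both lists and the row IDs at once. A minimal sketch, assuming hypothetical x_test, y_test and result values shaped like those produced by train_test_split and mnb.predict above (a pandas Series with a shuffled index, plus a NumPy array of predictions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: train_test_split keeps the original
# (shuffled) DataFrame index on x_test and y_test
x_test = pd.Series(["any raw text5", "Any raw text4", "any raw text7"], index=[6, 5, 3])
y_test = pd.Series(["cat4", "cat4", "cat1"], index=[6, 5, 3])
result = np.array(["cat4", "cat2", "cat1"])  # hypothetical predictions

# Compare positionally: to_numpy() drops the index before the comparison
is_correct = result == y_test.to_numpy()

# Two lists of raw texts
correct_texts = x_test[is_correct].tolist()
wrong_texts = x_test[~is_correct].tolist()

# Row IDs in the original DataFrame, usable with data.loc[...]
correct_ids = x_test.index[is_correct].tolist()
wrong_ids = x_test.index[~is_correct].tolist()

print(wrong_ids)    # [5]
print(wrong_texts)  # ['Any raw text4']
```

Working with the index rather than with positions means the IDs stay valid after the shuffle performed by train_test_split.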
