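How can I get a classification report with precision, recall, accuracy and support for a three-class problem, where the classes are "positive", "negative" and "neutral"? Here is the code: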
    vec_clf = Pipeline([('vectorizer', vec), ('pac', svm_clf)])
    print vec_clf.fit(X_train.values.astype('U'),y_train.values.astype('U'))
    y_pred = vec_clf.predict(X_test.values.astype('U'))
    print "SVM Accuracy-",metrics.accuracy_score(y_test, y_pred)
    print "confuson metrics :\n", metrics.confusion_matrix(y_test, y_pred, labels=["positive","negative","neutral"])
    print(metrics.classification_report(y_test, y_pred))
Running the code produces the following output and error:
    SVM Accuracy- 0.850318471338
    confuson metrics :
    [[206   9  67]
     [  4 373 122]
     [  9  21 756]]
    Traceback (most recent call last):

      File "<ipython-input-62-e6ab3066790e>", line 1, in <module>
        runfile('C:/Users/HP/abc16.py', wdir='C:/Users/HP')

      File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
        execfile(filename, namespace)

      File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
        exec(compile(scripttext, filename, 'exec'), glob, loc)

      File "C:/Users/HP/abc16.py", line 133, in <module>
        print(metrics.classification_report(y_test, y_pred))

      File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 1391, in classification_report
        labels = unique_labels(y_true, y_pred)

      File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 104, in unique_labels
        raise ValueError("Mix of label input types (string and number)")

    ValueError: Mix of label input types (string and number)
Please point out where I am going wrong.
Edit 1: This is what y_true and y_pred look like
    print "y_true :" ,y_test
    print "y_pred :",y_pred

    y_true : 5985     neutral
    899     positive
    2403     neutral
    3963     neutral
    3457     neutral
    5345     neutral
    3779     neutral
    299      neutral
    5712     neutral
    5511     neutral
    234      neutral
    1684    negative
    3701    negative
    2886     neutral
    .
    .
    .
    2623    positive
    3549     neutral
    4574     neutral
    4972    positive
    Name: sentiment, Length: 1570, dtype: object
    y_pred : [u'neutral' u'positive' u'neutral' ..., u'neutral' u'neutral' u'negative']
Edit 2: Output of type(y_true) and type(y_pred)
    type(y_true): <class 'pandas.core.series.Series'>
    type(y_pred): <type 'numpy.ndarray'>
Answer:
I cannot reproduce your error:
    import pandas as pd
    import numpy as np
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

    # toy data, similar to yours:
    data = {'id':[5985,899,2403, 1684], 'sentiment':['neutral', 'positive', 'neutral', 'negative']}
    y_true = pd.Series(data['sentiment'], index=data['id'], name='sentiment')
    y_true
    # 5985     neutral
    # 899     positive
    # 2403     neutral
    # 1684    negative
    # Name: sentiment, dtype: object

    type(y_true)
    # pandas.core.series.Series

    y_pred = np.array(['neutral', 'positive', 'negative', 'neutral'])

    # all metrics work fine:
    accuracy_score(y_true, y_pred)
    # 0.5

    confusion_matrix(y_true, y_pred)
    # array([[0, 1, 0],
    #        [1, 1, 0],
    #        [0, 0, 1]], dtype=int64)

    print(classification_report(y_true, y_pred))
    # result:
    #              precision    recall  f1-score   support
    #
    #    negative       0.00      0.00      0.00         1
    #     neutral       0.50      0.50      0.50         2
    #    positive       1.00      1.00      1.00         1
    #
    #       total       0.50      0.50      0.50         4
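Since the toy data works, it may be worth checking the real labels. In the question, y_train is cast with .astype('U') before fitting, so y_pred contains only strings, but y_test is passed to the metrics uncast. Its dtype is object, so it could hold a non-string value somewhere in its 1570 rows. That is only a guess based on the error message. The sketch below assumes such a stray value exists and uses a hypothetical integer 0 to stand in for it; the names type_counts and y_true_str are illustrative, not from the question. It counts the Python types among the true labels, then casts every label to str before building the report.

```python
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical toy labels: the integer 0 stands in for whatever
# non-string value might be hiding in the real y_test (an assumption).
y_true = pd.Series(
    ['neutral', 'positive', 0, 'negative', 'neutral', 'positive'],
    index=[5985, 899, 2403, 1684, 3701, 2623],
    name='sentiment',
    dtype=object,
)
y_pred = np.array(
    ['neutral', 'positive', 'neutral', 'negative', 'positive', 'positive']
)

# Step 1: count the Python types present among the true labels.
# More than one type here would explain a "mix of label input types" error.
type_counts = Counter(type(label).__name__ for label in y_true)
print(type_counts)

# Step 2: cast every true label to str so both arrays hold strings only.
y_true_str = np.array([str(label) for label in y_true])

# Step 3: build the report for the three sentiment classes. The stray
# label (now the string '0') is not listed in `labels`, so it gets no row.
report = classification_report(
    y_true_str, y_pred, labels=['positive', 'negative', 'neutral']
)
print(report)
```

If the type count on the real y_test shows only str, this is not the cause and the problem lies elsewhere. If it does show a second type, casting only hides it in the report, so it is probably better to find out where that value comes from in the data and fix or drop it there.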