How to compute a classification report for sentiment analysis using scikit-learn

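How can I get a classification report, containing precision, recall, accuracy, and support, for a three-class classification problem whose classes are "positive", "negative", and "neutral"? Here is the code: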

    vec_clf = Pipeline([('vectorizer', vec), ('pac', svm_clf)])
    print vec_clf.fit(X_train.values.astype('U'),y_train.values.astype('U'))
    y_pred = vec_clf.predict(X_test.values.astype('U'))
    print "SVM Accuracy-",metrics.accuracy_score(y_test, y_pred)
    print "confuson metrics :\n", metrics.confusion_matrix(y_test, y_pred, labels=["positive","negative","neutral"])
    print(metrics.classification_report(y_test, y_pred))

Running the code produces the following error:

    SVM Accuracy- 0.850318471338
    confuson metrics :
    [[206   9  67]
     [  4 373 122]
     [  9  21 756]]
    Traceback (most recent call last):
      File "<ipython-input-62-e6ab3066790e>", line 1, in <module>
        runfile('C:/Users/HP/abc16.py', wdir='C:/Users/HP')
      File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
        execfile(filename, namespace)
      File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
        exec(compile(scripttext, filename, 'exec'), glob, loc)
      File "C:/Users/HP/abc16.py", line 133, in <module>
        print(metrics.classification_report(y_test, y_pred))
      File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 1391, in classification_report
        labels = unique_labels(y_true, y_pred)
      File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 104, in unique_labels
        raise ValueError("Mix of label input types (string and number)")
    ValueError: Mix of label input types (string and number)

Please point out where I am going wrong.

Edit 1: This is what y_true and y_pred look like

    print "y_true :" ,y_test
    print "y_pred :",y_pred

    y_true : 5985     neutral
    899     positive
    2403     neutral
    3963     neutral
    3457     neutral
    5345     neutral
    3779     neutral
    299      neutral
    5712     neutral
    5511     neutral
    234      neutral
    1684    negative
    3701    negative
    2886     neutral
    .
    .
    .
    2623    positive
    3549     neutral
    4574     neutral
    4972    positive
    Name: sentiment, Length: 1570, dtype: object
    y_pred : [u'neutral' u'positive' u'neutral' ..., u'neutral' u'neutral' u'negative']

Edit 2: Output of type(y_true) and type(y_pred)

    type(y_true):  <class 'pandas.core.series.Series'>
    type(y_pred):  <type 'numpy.ndarray'>

Answer:

I cannot reproduce your error:

    import pandas as pd
    import numpy as np
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

    # toy data, similar to yours:
    data = {'id':[5985,899,2403, 1684], 'sentiment':['neutral', 'positive', 'neutral', 'negative']}
    y_true = pd.Series(data['sentiment'], index=data['id'], name='sentiment')
    y_true
    # 5985     neutral
    # 899     positive
    # 2403     neutral
    # 1684    negative
    # Name: sentiment, dtype: object

    type(y_true)
    # pandas.core.series.Series

    y_pred = np.array(['neutral', 'positive', 'negative', 'neutral'])

    # all metrics work fine:
    accuracy_score(y_true, y_pred)
    # 0.5

    confusion_matrix(y_true, y_pred)
    # array([[0, 1, 0],
    #        [1, 1, 0],
    #        [0, 0, 1]], dtype=int64)

    classification_report(y_true, y_pred)
    # result:
    #              precision    recall  f1-score   support
    #
    #    negative       0.00      0.00      0.00         1
    #    neutral        0.50      0.50      0.50         2
    #    positive       1.00      1.00      1.00         1
    #
    #       total       0.50      0.50      0.50         4
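Since the toy data does not trigger the error, it may help to check whether the real `y_test` contains anything other than strings. The error message says that string and numeric labels are mixed, and one possible cause (an assumption, not something the question confirms) is a missing value such as a float NaN among the 1570 labels. Note that the question casts `y_train` with `.values.astype('U')` but passes `y_test` to the metrics uncast. The sketch below is written for Python 3 and uses only the standard library; the helper name `label_type_counts` and the sample data are hypothetical.

```python
from collections import Counter


def label_type_counts(labels):
    """Count how many labels of each Python type appear in an iterable."""
    return Counter(type(label).__name__ for label in labels)


# Hypothetical data: a float NaN hiding among string labels (an assumed
# cause of the error, not taken from the question's actual data).
y_true = ["neutral", "positive", float("nan"), "negative"]
y_pred = ["neutral", "positive", "neutral", "negative"]

# More than one type name in the result means the labels are mixed.
print(label_type_counts(y_true))
print(label_type_counts(y_pred))

# Casting every label to text leaves a single label type.
y_true_clean = [str(label) for label in y_true]
print(label_type_counts(y_true_clean))
```

The same helper accepts a pandas Series or a NumPy array, because it only iterates over the labels. If it reports more than one type for `y_test`, casting it the same way as `y_train`, for example `y_test.values.astype('U')`, would give the metrics a single label type. Keep in mind that such a cast turns a missing value into the literal label "nan", so dropping or filling those rows beforehand may be the better fix.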
