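I am trying to get the tpr (true positive rate) and fpr (false positive rate) from roc_curve(), and then the auc score, so that I can plot a graph and see how my model performs on imbalanced multi-label data (500 labels), but I get an error.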
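I am computing the predicted probability for each label so that I can tune the threshold to get better precision, recall, and accuracy, and to recover as many of the target labels as possible at prediction time.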
Code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain

rfc = RandomForestClassifier(n_jobs=-1, random_state=0, class_weight='balanced')
clf2 = ClassifierChain(rfc)
clf2.fit(X_train, y_train)
y_pred = clf2.predict_proba(X_test)

y_pred.shape
>> (8125,500)

y_pred[0]
>> array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.01, 0. , 0.01,
          0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0. , 0.01, 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.03,
          0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.5 , 0.01,
          0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.05, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.02,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.03, 0.04,
          0. , 0. , 0. , 0.01, 0.01, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.02, 0. , 0. , 0.01, 0. , 0.01, 0. , 0.28,
          0. , 0. , 0. , 0. , 0.01, 0. , 0.01, 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0.02, 0.07, 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.02,
          0. , 0. , 0.01, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.02,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.02, 0.01,
          0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0. , 0.01, 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0.03,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.15,
          0. , 0. , 0.02, 0. , 0.01, 0. , 0.11, 0. , 0.01,
          0. , 0. , 0. , 0. , 0.02,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.02,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0.1 , 0.02,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.02, 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01,
          0. , 0. , 0.01, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])

from sklearn.metrics import roc_auc_score,roc_curve,precision_recall_curve
fpr, tpr, thresholds = roc_curve(y_test,y_pred)
The last line of the code raises the error.
Traceback:
ValueError                                Traceback (most recent call last)
<ipython-input-72-ea45ece64953> in <module>()
      1 from sklearn.metrics import roc_auc_score,roc_curve,precision_recall_curve
----> 2 fpr, tpr, thresholds = roc_curve(y_test,y_pred)

1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    534     if not (y_type == "binary" or
    535             (y_type == "multiclass" and pos_label is not None)):
--> 536         raise ValueError("{0} format is not supported".format(y_type))
    537
    538     check_consistent_length(y_true, y_score, sample_weight)

ValueError: multilabel-indicator format is not supported
Answer:
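The traceback shows that roc_curve() only accepts a single binary target, so passing the whole (n_samples, n_labels) indicator matrix fails. One common workaround, sketched below under some assumptions, is to call roc_curve() once per label column and, if a single summary curve is wanted, to pool all (sample, label) pairs into a micro-average. The helper name per_label_roc and the synthetic arrays y_true_demo / y_score_demo are hypothetical stand-ins, not the data or code from the question. Labels with only one class in the evaluation set are skipped, since ROC is undefined for them; with 500 rare labels that case is likely to come up.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc


def per_label_roc(y_true, y_score):
    """Compute one ROC curve and AUC per label column.

    y_true:  (n_samples, n_labels) binary indicator matrix
    y_score: (n_samples, n_labels) predicted probabilities
    Returns a dict: label index -> (fpr, tpr, thresholds, auc).
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    results = {}
    for j in range(y_true.shape[1]):
        # ROC is undefined when a label has only positives or only
        # negatives in the evaluation set, so skip such columns.
        if np.unique(y_true[:, j]).size < 2:
            continue
        fpr, tpr, thresholds = roc_curve(y_true[:, j], y_score[:, j])
        results[j] = (fpr, tpr, thresholds, auc(fpr, tpr))
    return results


# Hypothetical stand-in data (NOT the data from the question).
rng = np.random.default_rng(0)
n_samples, n_labels = 200, 5
y_true_demo = (rng.random((n_samples, n_labels)) < 0.2).astype(int)
y_true_demo[:, 4] = 0  # a label with no positives, to exercise the skip

# Scores loosely correlated with the truth, clipped to [0, 1].
noise = rng.random((n_samples, n_labels))
y_score_demo = np.clip(0.5 * y_true_demo + 0.6 * noise, 0.0, 1.0)

curves = per_label_roc(y_true_demo, y_score_demo)
for j, (fpr, tpr, thr, label_auc) in curves.items():
    print(f"label {j}: AUC = {label_auc:.3f}")

# Micro-average: treat every (sample, label) pair as one binary decision.
micro_fpr, micro_tpr, _ = roc_curve(y_true_demo.ravel(), y_score_demo.ravel())
micro_auc = auc(micro_fpr, micro_tpr)
print(f"micro-average AUC = {micro_auc:.3f}")
```

With the arrays from the question, the equivalent call would presumably be per_label_roc(y_test, y_pred), given that y_pred has shape (8125, 500) and assuming y_test has the same layout. Each returned thresholds array can then be inspected to pick a per-label cutoff. If only the score is needed and not the curve, roc_auc_score() accepts multilabel indicator input directly through its average parameter.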