Area under the precision-recall curve for a DecisionTreeClassifier is a square

I am using scikit-learn's DecisionTreeClassifier to classify some data. I am also running other algorithms, and I compare them using the area under the precision-recall curve (AUPRC). The problem is that the AUPRC for the DecisionTreeClassifier comes out as a square, rather than the shape this metric usually has.

Here is how I compute the AUPRC for the DecisionTreeClassifier. Since DecisionTreeClassifier does not have a decision_function() the way LogisticRegression and other classifiers do, I ran into some trouble here.

These are my AUPRC results for the SVM, logistic regression, and the DecisionTreeClassifier.

Here is how I compute the AUPRC for the DecisionTreeClassifier:

def execute(X_train, y_train, X_test, y_test):
    tree = DecisionTreeClassifier(class_weight='balanced')
    tree_y_score = tree.fit(X_train, y_train).predict(X_test)
    tree_ap_score = average_precision_score(y_test, tree_y_score)
    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    values = {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I compute the AUPRC for the SVM:

def execute(X_train, y_train, X_test, y_test):
    svm = SVC(class_weight='balanced')
    svm.fit(X_train, y_train.values.ravel())
    svm_y_score = svm.decision_function(X_test)
    svm_ap_score = average_precision_score(y_test, svm_y_score)
    precision, recall, _ = precision_recall_curve(y_test, svm_y_score)
    values = {'ap_score': svm_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I compute the AUPRC for logistic regression:

def execute(X_train, y_train, X_test, y_test):
    lr = LogisticRegression(class_weight='balanced')
    lr.fit(X_train, y_train.values.ravel())
    lr_y_score = lr.decision_function(X_test)
    lr_ap_score = average_precision_score(y_test, lr_y_score)
    precision, recall, _ = precision_recall_curve(y_test, lr_y_score)
    values = {'ap_score': lr_ap_score, 'precision': precision, 'recall': recall}
    return values

I then call these methods and plot the results like this:

import LogReg_AP_Harness as lrApTest
import SVM_AP_Harness as svmApTest
import DecTree_AP_Harness as dtApTest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt


def do_work(df):
    X = df.ix[:, df.columns != 'Class']
    y = df.ix[:, df.columns == 'Class']
    y_binarized = label_binarize(y, classes=[0, 1])
    n_classes = y_binarized.shape[1]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)
    _, _, y_train_binarized, y_test_binarized = train_test_split(X, y_binarized, test_size=.3, random_state=0)

    print('Executing Logistic Regression')
    lr_values = lrApTest.execute(X_train, y_train, X_test, y_test)

    print('Executing Decision Tree')
    dt_values = dtApTest.execute(X_train, y_train_binarized, X_test, y_test_binarized)

    print('Executing SVM')
    svm_values = svmApTest.execute(X_train, y_train, X_test, y_test)

    plot_aupr_curves(lr_values, svm_values, dt_values)


def plot_aupr_curves(lr_values, svm_values, dt_values):
    lr_ap_score = lr_values['ap_score']
    lr_precision = lr_values['precision']
    lr_recall = lr_values['recall']

    svm_ap_score = svm_values['ap_score']
    svm_precision = svm_values['precision']
    svm_recall = svm_values['recall']

    dt_ap_score = dt_values['ap_score']
    dt_precision = dt_values['precision']
    dt_recall = dt_values['recall']

    plt.step(svm_recall, svm_precision, color='g', alpha=0.2, where='post')
    plt.fill_between(svm_recall, svm_precision, step='post', alpha=0.2, color='g')

    plt.step(lr_recall, lr_precision, color='b', alpha=0.2, where='post')
    plt.fill_between(lr_recall, lr_precision, step='post', alpha=0.2, color='b')

    plt.step(dt_recall, dt_precision, color='r', alpha=0.2, where='post')
    plt.fill_between(dt_recall, dt_precision, step='post', alpha=0.2, color='r')

    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 1.05])
    plt.xlim([0.0, 1.0])
    plt.title('SVM (Green): Precision-Recall curve: AP={0:0.2f}'.format(svm_ap_score) + '\n' +
              'Logistic Regression (Blue): Precision-Recall curve: AP={0:0.2f}'.format(lr_ap_score) + '\n' +
              'Decision Tree (Red): Precision-Recall curve: AP={0:0.2f}'.format(dt_ap_score))
    plt.show()

In the do_work() method I had to binarize y because DecisionTreeClassifier has no decision_function(). My approach came from here.

This is the resulting plot:

AUPRC Plot

I suspect it comes down to me computing the AUPRC for the DecisionTreeClassifier incorrectly.


Answer:

For the DecisionTreeClassifier, replace predict with predict_proba; the latter plays the same role as decision_function. predict returns hard 0/1 class labels, so precision_recall_curve only has a single threshold to work with and the "curve" collapses into a rectangle, whereas predict_proba (like decision_function) returns continuous scores that can be swept across many thresholds.
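As a minimal sketch, the decision-tree harness from the question could be adapted like this (same average_precision_score / precision_recall_curve setup as in the question; the [:, 1] slice, which selects the predicted probability of the positive class, is my addition):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve

def execute(X_train, y_train, X_test, y_test):
    tree = DecisionTreeClassifier(class_weight='balanced')
    tree.fit(X_train, y_train)

    # predict_proba returns one column per class; take the probability of
    # the positive class so precision_recall_curve can sweep thresholds.
    tree_y_score = tree.predict_proba(X_test)[:, 1]

    tree_ap_score = average_precision_score(y_test, tree_y_score)
    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    return {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}

With per-class probabilities instead of hard labels, the step plot and average_precision_score should behave the same way as the decision_function-based SVM and logistic regression harnesses.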
