How do I interpret this triangular ROC AUC curve?

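I have more than 10 features and tens of thousands of cases for training a logistic regression model that classifies people's ethnicity. The first example is French vs. non-French, and the second is English vs. non-English. The results are as follows: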

    //////////////////////////////////////////////////////
    1= fr
    0= non-fr
    Class count:
    0    69109
    1    30891
    dtype: int64
    Accuracy: 0.95126
    Classification report:
                 precision    recall  f1-score   support
              0       0.97      0.96      0.96     34547
              1       0.92      0.93      0.92     15453
    avg / total       0.95      0.95      0.95     50000
    Confusion matrix:
    [[33229  1318]
     [ 1119 14334]]
    AUC= 0.944717975754
    //////////////////////////////////////////////////////
    1= en
    0= non-en
    Class count:
    0    76125
    1    23875
    dtype: int64
    Accuracy: 0.7675
    Classification report:
                 precision    recall  f1-score   support
              0       0.91      0.78      0.84     38245
              1       0.50      0.74      0.60     11755
    avg / total       0.81      0.77      0.78     50000
    Confusion matrix:
    [[29677  8568]
     [ 3057  8698]]
    AUC= 0.757955582999
    //////////////////////////////////////////////////////

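However, the ROC curves I get look very strange: they are triangular rather than the usual jagged, rounded curves. Why do they have this shape? Is there a mistake somewhere?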

Code:

    all_dict = []
    for i in range(0, len(my_dict)):
        temp_dict = dict(my_dict[i].items() + my_dict2[i].items() + my_dict3[i].items() + my_dict4[i].items()
            + my_dict5[i].items() + my_dict6[i].items() + my_dict7[i].items() + my_dict8[i].items()
            + my_dict9[i].items() + my_dict10[i].items() + my_dict11[i].items() + my_dict12[i].items()
            + my_dict13[i].items() + my_dict14[i].items() + my_dict15[i].items() + my_dict16[i].items()
            )
        all_dict.append(temp_dict)
    newX = dv.fit_transform(all_dict)

    # Separate the training and testing data sets
    half_cut = int(len(df)/2.0)*-1
    X_train = newX[:half_cut]
    X_test = newX[half_cut:]
    y_train = y[:half_cut]
    y_test = y[half_cut:]

    # Fitting X and y into model, using training data
    #$$
    lr.fit(X_train, y_train)

    # Making predictions using trained data
    #$$
    y_train_predictions = lr.predict(X_train)
    #$$
    y_test_predictions = lr.predict(X_test)

    #print (y_train_predictions == y_train).sum().astype(float)/(y_train.shape[0])
    print 'Accuracy:',(y_test_predictions == y_test).sum().astype(float)/(y_test.shape[0])
    print 'Classification report:'
    print classification_report(y_test, y_test_predictions)
    #print sk_confusion_matrix(y_train, y_train_predictions)
    print 'Confusion matrix:'
    print sk_confusion_matrix(y_test, y_test_predictions)
    #print y_test[1:20]
    #print y_test_predictions[1:20]
    #print y_test[1:10]
    #print np.bincount(y_test)
    #print np.bincount(y_test_predictions)

    # Find and plot AUC
    false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_test_predictions)
    roc_auc = auc(false_positive_rate, true_positive_rate)
    print 'AUC=',roc_auc
    plt.title('Receiver Operating Characteristic')
    plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f'% roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0,1],[0,1],'r--')
    plt.xlim([-0.1,1.2])
    plt.ylim([-0.1,1.2])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()

Answer:

You are calling roc_curve incorrectly. According to the documentation:

y_score : array, shape = [n_samples]    Target scores, can either be probability estimates of the positive class or confidence values.

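So in this line:

    roc_curve(y_test, y_test_predictions)

you should pass roc_curve the output of decision_function (or the positive-class column of predict_proba) rather than the hard class predictions. With only 0/1 labels as scores there is a single usable threshold, so the curve consists of just three points, (0, 0), (FPR, TPR), and (1, 1), which is why it looks like a triangle.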
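A minimal sketch of the difference, written in Python 3 syntax (the question's code is Python 2). The synthetic dataset from `make_classification`, the 50/50 split, and the names `fpr_hard`, `tpr_hard`, and `scores` are illustrative assumptions, not the asker's actual data or variables:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

# Synthetic stand-in for the question's data (an assumption, not the real features).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
half = len(y) // 2
X_train, X_test = X[:half], X[half:]
y_train, y_test = y[:half], y[half:]

lr = LogisticRegression()
lr.fit(X_train, y_train)

# Hard 0/1 labels: only one usable threshold, so the curve is a triangle.
fpr_hard, tpr_hard, _ = roc_curve(y_test, lr.predict(X_test))

# Continuous scores: many candidate thresholds, so the curve has many points.
scores = lr.predict_proba(X_test)[:, 1]  # column 1 = probability of class 1
fpr, tpr, _ = roc_curve(y_test, scores)

print('points with hard labels:', len(fpr_hard))
print('points with scores:', len(fpr))
print('AUC from scores:', auc(fpr, tpr))
```

Passing `lr.decision_function(X_test)` in place of `scores` should give essentially the same curve, because the predicted probability is a monotonic function of the decision value.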

See these examples: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#example-model-selection-plot-roc-py

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#example-model-selection-plot-roc-crossval-py
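As a sanity check, the AUC values printed in the question can be recovered from the confusion matrices alone. When hard 0/1 predictions are used as scores, the ROC curve passes through (0, 0), (FPR, TPR), and (1, 1), and the area under that polyline is (TPR + 1 - FPR) / 2. The sketch below is Python 3 with no dependencies; the helper name `triangle_auc` is hypothetical, and the matrices are assumed to follow scikit-learn's layout `[[TN, FP], [FN, TP]]`:

```python
# Confusion matrices copied from the question, laid out as [[TN, FP], [FN, TP]].
matrices = {
    'fr': [[33229, 1318], [1119, 14334]],
    'en': [[29677, 8568], [3057, 8698]],
}

def triangle_auc(cm):
    """Area under the three-point ROC curve implied by one confusion matrix."""
    (tn, fp), (fn, tp) = cm
    fpr = fp / (fp + tn)  # false positive rate
    tpr = tp / (tp + fn)  # true positive rate
    # Trapezoid rule over (0, 0) -> (fpr, tpr) -> (1, 1).
    area = 0.5 * fpr * tpr + 0.5 * (1 - fpr) * (tpr + 1)
    return fpr, tpr, area

results = {name: triangle_auc(cm) for name, cm in matrices.items()}
for name, (fpr, tpr, area) in results.items():
    print(name, 'FPR=%.4f' % fpr, 'TPR=%.4f' % tpr, 'AUC=%.6f' % area)
```

The areas agree with the AUC values reported in the question to about six decimal places. That is consistent with those values describing a single operating point of the classifier rather than its full ranking quality.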
