How to plot a PR curve over 10-fold cross-validation in Scikit-Learn

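I am running a supervised experiment on a binary prediction problem. I use 10-fold cross-validation to evaluate performance, and the evaluation metric is mean average precision (the sum of the per-fold average precision scores divided by the number of cross-validation folds, which is 10 in my case). I would like to plot a PR curve of the mean average precision results across these 10 folds, but I am not sure what the best way to do this is.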
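A minimal sketch of the metric described above. It assumes synthetic data from make_classification and a LogisticRegression model, which are stand-ins and not the asker's setup. scikit-learn's cross_val_score with scoring="average_precision" returns one AP per fold, and the mean of those scores is the mean AP:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One average-precision score per fold, computed on each held-out fold.
fold_ap = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="average_precision"
)

# Mean AP: the sum of the per-fold scores divided by the number of folds.
mean_ap = fold_ap.sum() / len(fold_ap)
print(f"per-fold AP: {np.round(fold_ap, 3)}")
print(f"mean AP over {len(fold_ap)} folds: {mean_ap:.3f}")
```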

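A previous question on the Cross Validated Stack Exchange site asked the same thing. One comment suggested following the example on the Scikit-Learn website that shows how to plot ROC curves across cross-validation folds, and adapting it to average precision. Here is the relevant section of code I modified to try this idea: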

```python
from scipy import interp
# Other packages/functions are imported but are not critical to the question

max_ent = LogisticRegression()

mean_precision = 0.0
mean_recall = np.linspace(0, 1, 100)
mean_average_precision = []

for i in set(folds):
    y_scores = max_ent.fit(X_train, y_train).decision_function(X_test)
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    average_precision = average_precision_score(y_test, y_scores)
    mean_average_precision.append(average_precision)
    mean_precision += interp(mean_recall, recall, precision)

# After this line, inspecting the mean_precision array shows that
# most elements are equal to 1. This is the part I am confused about,
# and it is what makes the plot incorrect.
mean_precision /= len(set(folds))

# This is what the actual MAP score should be
mean_average_precision = sum(mean_average_precision) / len(mean_average_precision)

# Plot the mean precision curve across folds
plt.plot(mean_recall, mean_precision)
plt.title('Mean AP Over 10 folds (area=%0.2f)' % (mean_average_precision))
plt.show()
```

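The code runs, but in my case the mean precision curve is incorrect. For some reason, the array I allocated to accumulate the mean_precision scores (the mean_tpr variable in the ROC example) has a first element close to zero and all remaining elements equal to 1 after it is divided by the number of folds. Below is a plot of the mean_precision scores against the mean_recall scores. As you can see, the curve jumps to 1, which is inaccurate.

[Figure: mean_precision plotted against mean_recall]

So my guess is that something goes wrong when mean_precision is updated in each cross-validation fold (mean_precision += interp(mean_recall, recall, precision)), but it is not clear to me how to fix it. Any guidance or help would be greatly appreciated.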
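One likely cause of the jump to 1 is an ordering mismatch. precision_recall_curve returns recall in decreasing order, but numpy.interp (which scipy's top-level interp re-exported) expects its x-coordinates xp to be increasing. NumPy's documentation notes that it does not check this and gives nonsense results otherwise.

Below is a minimal sketch of the same loop with both arrays reversed before interpolating. It makes three substitutions, all assumptions: synthetic data from make_classification, StratifiedKFold in place of the asker's folds variable, and np.interp called directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption); replace with your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

mean_recall = np.linspace(0, 1, 100)
mean_precision = np.zeros_like(mean_recall)
fold_ap = []

for train_idx, test_idx in cv.split(X, y):
    # Fit on this fold's training part and score its held-out part.
    clf.fit(X[train_idx], y[train_idx])
    y_scores = clf.decision_function(X[test_idx])
    precision, recall, _ = precision_recall_curve(y[test_idx], y_scores)
    fold_ap.append(average_precision_score(y[test_idx], y_scores))
    # recall is returned in decreasing order, but np.interp expects increasing
    # x-coordinates, so reverse both arrays before interpolating.
    mean_precision += np.interp(mean_recall, recall[::-1], precision[::-1])

# Average the interpolated curves and the per-fold AP scores.
mean_precision /= cv.get_n_splits()
mean_ap = np.mean(fold_ap)
print(f"mean AP over {cv.get_n_splits()} folds: {mean_ap:.3f}")
```

The resulting mean_precision can then be plotted against mean_recall with plt.plot, as in the question. This averages interpolated per-fold curves, which is a different summary from the pooled curve used in the answer.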

Answer:

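I ran into the same problem. Here is my solution: instead of averaging across folds, I collect the predictions from all folds and compute a single precision_recall_curve on them after the loop. According to the discussion at https://stats.stackexchange.com/questions/34611/meanscores-vs-scoreconcatenation-in-cross-validation, this is generally the preferable approach.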

```python
import matplotlib.pyplot as plt
import numpy
from sklearn.datasets import make_blobs
from sklearn.metrics import precision_recall_curve, auc
from sklearn.model_selection import KFold
from sklearn.svm import SVC

FOLDS = 5

# Two overlapping Gaussian blobs as a toy binary problem
X, y = make_blobs(n_samples=1000, n_features=2, centers=2, cluster_std=10.0,
    random_state=12345)

f, axes = plt.subplots(1, 2, figsize=(10, 5))

# Left panel: the data
axes[0].scatter(X[y==0,0], X[y==0,1], color='blue', s=2, label='y=0')
axes[0].scatter(X[y!=0,0], X[y!=0,1], color='red', s=2, label='y=1')
axes[0].set_xlabel('X[:,0]')
axes[0].set_ylabel('X[:,1]')
axes[0].legend(loc='lower left', fontsize='small')

k_fold = KFold(n_splits=FOLDS, shuffle=True, random_state=12345)
predictor = SVC(kernel='linear', C=1.0, probability=True, random_state=12345)

# Right panel: one PR curve per fold, while collecting labels and scores
y_real = []
y_proba = []
for i, (train_index, test_index) in enumerate(k_fold.split(X)):
    Xtrain, Xtest = X[train_index], X[test_index]
    ytrain, ytest = y[train_index], y[test_index]
    predictor.fit(Xtrain, ytrain)
    pred_proba = predictor.predict_proba(Xtest)
    precision, recall, _ = precision_recall_curve(ytest, pred_proba[:,1])
    lab = 'Fold %d AUC=%.4f' % (i+1, auc(recall, precision))
    axes[1].step(recall, precision, label=lab)
    y_real.append(ytest)
    y_proba.append(pred_proba[:,1])

# Pool all folds and compute one overall PR curve
y_real = numpy.concatenate(y_real)
y_proba = numpy.concatenate(y_proba)
precision, recall, _ = precision_recall_curve(y_real, y_proba)
lab = 'Overall AUC=%.4f' % (auc(recall, precision))
axes[1].step(recall, precision, label=lab, lw=2, color='black')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].legend(loc='lower left', fontsize='small')

f.tight_layout()
f.savefig('result.png')
```
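The pooled curve from the answer can also be obtained with scikit-learn's cross_val_predict. That function returns an out-of-fold prediction for every sample, in the original sample order.

The sketch below reuses the answer's data, folds and SVC, so it should yield the same pooled curve up to ordering. It also makes one alternative choice: it summarizes the pooled curve with average_precision_score instead of the trapezoidal auc. The scikit-learn documentation notes that applying the trapezoidal rule to a PR curve can be too optimistic.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

FOLDS = 5
X, y = make_blobs(n_samples=1000, n_features=2, centers=2, cluster_std=10.0,
                  random_state=12345)
k_fold = KFold(n_splits=FOLDS, shuffle=True, random_state=12345)
predictor = SVC(kernel='linear', C=1.0, probability=True, random_state=12345)

# Each sample's probability comes from a model trained on the other folds.
y_proba = cross_val_predict(predictor, X, y, cv=k_fold,
                            method='predict_proba')[:, 1]

# One PR curve and one AP score over all pooled out-of-fold predictions.
precision, recall, _ = precision_recall_curve(y, y_proba)
pooled_ap = average_precision_score(y, y_proba)
print(f"Pooled AP = {pooled_ap:.4f}")
```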
