How to plot an ROC curve from a DataFrame loaded from a CSV file

I am trying to plot an ROC curve by following the sklearn documentation. My data is in a CSV file and looks like this. It has two classes: 'Good' and 'Bad'.

[Screenshot of my CSV file]

My code is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# Import some data to play with
df = pd.read_csv("E:\\autodesk\\TTI ROC curve.csv")
X = df[['TTI', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', ' Min Humidity']].values
y = df['TTI_Category'].values  # .as_matrix() was removed in pandas 1.0

# Binarize the output
y = label_binarize(y, classes=['Good', 'Bad'])
n_classes = y.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()

If I run this code, I get an error saying random_state is not defined, so I changed it to random_state=True. Then I get:

plt.plot(fpr[2], tpr[2], color='darkorange',
KeyError: 2
<matplotlib.figure.Figure at 0xd8bff60>

If I print n_classes, it is "1", but if I print n_classes in the documentation example, it is 3. I'm not sure whether that is the problem. Does anyone know what causes this error?

Answer:

It looks like you just don't understand how your data is structured and how your code is supposed to work.

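LabelBinarizer (and the label_binarize function used in your code) returns a one-vs-all encoding, which means that with two classes you get a single column, for example ['good', 'bad', 'good'] -> [[1], [0], [1]] (which class becomes 1 depends on the class order), so n_classes = 1.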
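To see why n_classes ends up as 1, here is a quick check on a few made-up labels (toy values for illustration, not taken from the CSV). With label_binarize, the last entry of `classes` is the one encoded as 1 in the two-class case:

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical toy labels standing in for the TTI_Category column
y = np.array(['Good', 'Bad', 'Good', 'Bad'])

# Two classes -> ONE output column; the last class in `classes` ('Bad') becomes 1
y_bin = label_binarize(y, classes=['Good', 'Bad'])
print(y_bin.shape)    # (4, 1)
print(y_bin.ravel())  # [0 1 0 1]

# n_classes computed the same way as in the question
n_classes = y_bin.shape[1]
print(n_classes)      # 1

# With three classes you get one column per class, as in the sklearn example
y3 = label_binarize(['a', 'b', 'c'], classes=['a', 'b', 'c'])
print(y3.shape)       # (3, 3)
```

So the loop over range(n_classes) only ever fills key 0 (plus "micro"), never key 2.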

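If you have two classes, why would you expect it to be 3? With n_classes = 1, fpr, tpr and roc_auc only contain the keys 0 and "micro", hence KeyError: 2. Just change

plt.plot(fpr[2], tpr[2], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])

to

plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0])

and that should solve the problem.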
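A minimal alternative sketch, assuming a plain binary problem: since there is only one score column, roc_curve can be called directly on it, without OneVsRestClassifier or the per-class dictionaries. make_classification stands in for the CSV here (an assumption, since the file is not available), and random_state is defined as a plain integer, which also avoids the NameError from the question.

```python
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Assumption: synthetic data replaces the CSV. With the real file, something like
#   X = df[[...feature columns...]].values
#   y = (df['TTI_Category'] == 'Bad').astype(int).values  # 'Bad' -> 1, 'Good' -> 0
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Define random_state explicitly (an int is enough)
random_state = 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state)

# Binary problem: a single classifier gives a single score per sample
classifier = svm.SVC(kernel='linear')
y_score = classifier.fit(X_train, y_train).decision_function(X_test)  # shape (n_test,)

# One ROC curve for the positive class (label 1)
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.figure()
lw = 2
plt.plot(fpr, tpr, color='darkorange', lw=lw,
         label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
```

Which label counts as positive is a choice: swap the comparison in the commented-out encoding line if 'Good' should be the positive class.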
