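I am trying to plot a ROC curve by following the sklearn documentation. My data is in a CSV file that looks like this. It has two classes: 'Good' and 'Bad'.
[Screenshot of my CSV file]
My code is as follows: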
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn import svm
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier

    # Import some data to play with
    df = pd.read_csv("E:\\autodesk\\TTI ROC curve.csv")
    X = df[['TTI', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', ' Min Humidity']].values
    y = df['TTI_Category'].values

    # Binarize the output
    y = label_binarize(y, classes=['Good', 'Bad'])
    n_classes = y.shape[1]

    # shuffle and split training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

    # Learn to predict each class against the other
    classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                             random_state=random_state))
    y_score = classifier.fit(X_train, y_train).decision_function(X_test)

    # Compute ROC curve and ROC area for each class
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Compute micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

    plt.figure()
    lw = 2
    plt.plot(fpr[2], tpr[2], color='darkorange',
             lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    plt.show()
If I run this code, I am told that random_state is not defined, so I changed it to random_state=True. After that I get:
        plt.plot(fpr[2], tpr[2], color='darkorange',
    KeyError: 2
    <matplotlib.figure.Figure at 0xd8bff60>
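If I print n_classes, I get 1, whereas printing n_classes in the documentation example gives 3. I am not sure whether that is where the problem lies. Does anyone know what causes this traceback?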
Answer:
It looks like you simply do not understand how your data is structured and how your code is supposed to work.
LabelBinarizer returns a one-vs-all encoding, which means that for two classes you get a single column, with a mapping such as ['good', 'bad', 'good'] -> [[1], [0], [1]], and therefore n_classes = 1.
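The shape behaviour described above can be checked directly. The following is a minimal sketch with made-up labels (the three-element arrays are illustrative, not taken from the CSV). It also shows that when `classes` is passed explicitly, as in the question, the second listed class is the one encoded as 1, so with `classes=['Good', 'Bad']` the label 'Bad' becomes 1.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Illustrative labels only; the real ones come from df['TTI_Category'].
labels = np.array(["Good", "Bad", "Good"])

# Two classes: a single output column, so y.shape[1] == 1.
# The second entry of `classes` ('Bad') is encoded as 1.
y_two = label_binarize(labels, classes=["Good", "Bad"])
print(y_two.shape)    # (3, 1)
print(y_two.ravel())  # [0 1 0]

# Three classes (as in the documentation example): one column per class.
y_three = label_binarize(["a", "b", "c"], classes=["a", "b", "c"])
print(y_three.shape)  # (3, 3)
```

With a single column, the loop over `range(n_classes)` only ever fills index 0 of `fpr`, `tpr` and `roc_auc`.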
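If you have two classes, why would you expect it to be 3? The documentation example has three classes, which is why index 2 exists there; in your case fpr only has the keys 0 and "micro". Just change

    plt.plot(fpr[2], tpr[2], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])

to

    plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0])

and that should solve the problem.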
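For a purely binary problem, one alternative is to skip `label_binarize` and `OneVsRestClassifier` altogether and pass the labels and scores straight to `roc_curve`. This is a different route from the fix above, sketched here under assumptions: the data is a synthetic stand-in generated with `make_classification` (the original CSV is not available), and `plot_roc` is a hypothetical helper name introduced only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the CSV (assumption): 200 rows, 5 features, 2 classes.
X, y_num = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y_num == 1, "Bad", "Good")  # string labels like the question's

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Pass an int seed rather than the undefined name `random_state`.
clf = SVC(kernel="linear", random_state=0)
y_score = clf.fit(X_train, y_train).decision_function(X_test)

# For a binary SVC, positive decision values point towards clf.classes_[1],
# so that class is named as the positive label.
fpr, tpr, _ = roc_curve(y_test, y_score, pos_label=clf.classes_[1])
roc_auc = auc(fpr, tpr)


def plot_roc(fpr, tpr, roc_auc):
    """Draw one ROC curve (hypothetical helper; imports matplotlib lazily)."""
    import matplotlib.pyplot as plt

    plt.figure()
    plt.plot(fpr, tpr, color="darkorange", lw=2,
             label="ROC curve (area = %0.2f)" % roc_auc)
    plt.plot([0, 1], [0, 1], color="navy", lw=2, linestyle="--")
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend(loc="lower right")
    plt.show()
```

Calling `plot_roc(fpr, tpr, roc_auc)` draws the curve. Because there is only one curve, no per-class dictionaries or integer keys are involved, which removes the source of the KeyError.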