I am trying to add a confusion matrix to this KNN algorithm. Since the algorithm itself is already fairly involved, using nested cross-validation and a grid search for the best parameters, I don't know where to add the confusion-matrix part.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

print(__doc__)

# Number of random trials
NUM_TRIALS = 30

# Load the dataset
X_iris = X.values
y_iris = y

# Set up possible values of parameters to optimize over
p_grid = {"n_neighbors": [1, 5, 10, 15]}

# We will use a K-Nearest Neighbors classifier
knn = KNeighborsClassifier()

# Arrays to store scores
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):
    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()

score_difference = non_nested_scores - nested_scores

print("Average difference of {:6f} with std. dev. of {:6f}."
      .format(score_difference.mean(), score_difference.std()))

# Plot scores on each trial for nested and non-nested CV
plt.figure()
plt.subplot(211)
non_nested_scores_line, = plt.plot(non_nested_scores, color='r')
nested_line, = plt.plot(nested_scores, color='b')
plt.ylabel("score", fontsize="14")
plt.legend([non_nested_scores_line, nested_line],
           ["Non-Nested CV", "Nested CV"],
           bbox_to_anchor=(0, .4, .5, 0))
plt.title("Non-Nested and Nested Cross Validation on Touch Classification Data set KNN",
          x=.5, y=1.1, fontsize="15")

# Plot bar chart of the difference.
plt.subplot(212)
difference_plot = plt.bar(range(NUM_TRIALS), score_difference)
plt.xlabel("Individual Trial #")
plt.legend([difference_plot],
           ["Non-Nested CV - Nested CV Score"],
           bbox_to_anchor=(0, 1, .8, 0))
plt.ylabel("score difference", fontsize="14")
I tried adding the following section to give my algorithm a confusion matrix:
cm = confusion_matrix(y_test, preds)
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
cm = [[tp, fp], [fn, tn]]

ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt="d", cmap="Spectral")  # annot=True to annotate cells

# labels, title and ticks
ax.set_xlabel('ACTUAL LABELS')
ax.set_ylabel('PREDICTED LABELS')
ax.set_title('KNN Confusion Matrix')
ax.xaxis.set_ticklabels(['11', '12', '13', '21', '22', '23', '31', '32', '33'])
ax.yaxis.set_ticklabels(['Soft', 'Tough'])
My understanding of the confusion matrix itself is still shaky, so I'm not sure how to implement it correctly within my KNN algorithm. In my dataset I have
y = ['11', '12','13','21','22','23','31','32','33'] #my labels
   Duration  Grand Mean  Max Mean Activation
0        64  136.772461           178.593750
1        67  193.445196           258.515625
2        67  112.382929           145.765625
3        88  156.530717           238.734375
# head of my feature matrix
Answer:
You first need to make predictions with the best estimator found by GridSearchCV.
preds = clf.best_estimator_.predict(X_test)
Then print the confusion matrix using the confusion_matrix function from sklearn.metrics:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, preds))
Once you have the confusion matrix, you can plot it. Edit: since you don't have separate test data, you will be testing on X_iris, but it is still better to split your data. A similar question was asked at Sci-kit: What's the easiest way to get the confusion matrix of an estimator when using GridSearchCV?
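Putting those steps together, here is a minimal sketch of how the held-out evaluation and the plot could look. It keeps your nested-CV loop untouched and only adds a separate train/test split for the confusion matrix; the test_size=0.25, random_state=0 and the Spectral colormap are arbitrary choices for illustration, not taken from your code.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

labels = ['11', '12', '13', '21', '22', '23', '31', '32', '33']

# Hold out part of the data purely for the confusion matrix
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.25, random_state=0)

# Same inner grid search as in your loop, fitted on the training portion only
p_grid = {"n_neighbors": [1, 5, 10, 15]}
clf = GridSearchCV(KNeighborsClassifier(), param_grid=p_grid,
                   cv=KFold(n_splits=4, shuffle=True, random_state=0))
clf.fit(X_train, y_train)

# Predict with the best estimator and build the 9x9 confusion matrix
preds = clf.best_estimator_.predict(X_test)
cm = confusion_matrix(y_test, preds, labels=labels)

# Plot: with sklearn's convention, rows are true labels, columns are predictions
ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt="d", cmap="Spectral",
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel('PREDICTED LABELS')
ax.set_ylabel('ACTUAL LABELS')
ax.set_title('KNN Confusion Matrix')
plt.show()

One note on your original attempt: with nine classes, confusion_matrix(...).ravel() returns 81 values, so the tn, fp, fn, tp unpacking only works for a binary problem such as 'Soft'/'Tough'. For the nine-class labels, pass the full matrix to the heatmap as shown above.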