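How do I add a confusion matrix to a KNN algorithm in Python?

I am trying to add a confusion matrix to this KNN algorithm. The algorithm is already fairly complex, since it uses nested cross-validation and a grid search for the best parameters, so I don't know where the confusion matrix part should go.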

print(__doc__)

import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Number of random trials
NUM_TRIALS = 30

# Load the dataset (X is my feature DataFrame, y my labels; see below)
X_iris = X.values
y_iris = y

# Set up possible values of parameters to optimize over
p_grid = {"n_neighbors": [1, 5, 10, 15]}

# We will use a k-nearest neighbors classifier
knn = KNeighborsClassifier()

# Arrays to store scores
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()

score_difference = non_nested_scores - nested_scores

print("Average difference of {:6f} with std. dev. of {:6f}."
      .format(score_difference.mean(), score_difference.std()))

# Plot scores on each trial for nested and non-nested CV
plt.figure()
plt.subplot(211)
non_nested_scores_line, = plt.plot(non_nested_scores, color='r')
nested_line, = plt.plot(nested_scores, color='b')
plt.ylabel("score", fontsize="14")
plt.legend([non_nested_scores_line, nested_line],
           ["Non-Nested CV", "Nested CV"],
           bbox_to_anchor=(0, .4, .5, 0))
plt.title("Non-Nested and Nested Cross Validation on Touch Classification Data set KNN",
          x=.5, y=1.1, fontsize="15")

# Plot bar chart of the difference.
plt.subplot(212)
difference_plot = plt.bar(range(NUM_TRIALS), score_difference)
plt.xlabel("Individual Trial #")
plt.legend([difference_plot],
           ["Non-Nested CV - Nested CV Score"],
           bbox_to_anchor=(0, 1, .8, 0))
plt.ylabel("score difference", fontsize="14")

I tried adding the following part to get a confusion matrix for my algorithm:

cm = confusion_matrix(y_test, preds)
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
cm = [[tp,fp],[fn,tn]]
ax= plt.subplot()
sns.heatmap(cm, annot=True, fmt = "d", cmap="Spectral"); #annot=True to annotate cells

# labels, title and ticks
ax.set_xlabel('ACTUAL LABELS');
ax.set_ylabel('PREDICTED LABELS');
ax.set_title('KNN Confusion Matrix');
ax.xaxis.set_ticklabels(['11', '12','13','21','22','23','31','32','33']);
ax.yaxis.set_ticklabels(['Soft', 'Tough']);

I don't yet understand the confusion matrix well enough, and I'm not sure how to integrate it correctly into my KNN algorithm. In my dataset I have:

y = ['11', '12','13','21','22','23','31','32','33'] #my labels

   Duration  Grand Mean  Max Mean Activation
0        64  136.772461           178.593750
1        67  193.445196           258.515625
2        67  112.382929           145.765625
3        88  156.530717           238.734375
#head of my feature matrix

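Answer:

First, you need to make predictions with the best estimator found by GridSearchCV.

preds = clf.best_estimator_.predict(X_test)

Then print the confusion matrix using the confusion_matrix function from sklearn.metrics:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, preds))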
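The two steps above can be put together roughly as follows. This is only a sketch under stated assumptions: the iris data from `load_iris` stands in for the asker's feature matrix and labels, and the 25% hold-out split and `random_state` values are arbitrary choices, not something from the question. Note that `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, and that unpacking `.ravel()` into `tn, fp, fn, tp` only works for a two-class problem, so with nine labels the full matrix should be kept as is.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): replace with your own X.values and y.
X, y = load_iris(return_X_y=True)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Same parameter grid and inner CV scheme as in the question.
p_grid = {"n_neighbors": [1, 5, 10, 15]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=p_grid, cv=inner_cv)
clf.fit(X_train, y_train)

# Predict on the held-out data with the best estimator.
preds = clf.best_estimator_.predict(X_test)

# Fix the label order explicitly so rows and columns are unambiguous.
labels = clf.best_estimator_.classes_
cm = confusion_matrix(y_test, preds, labels=labels)  # rows: true, columns: predicted
print(cm)
```

The resulting `cm` is a square array with one row and one column per class, which can be passed directly to a heatmap, using the same class labels for the ticks on both axes.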

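Once you have the confusion matrix, you can plot it.

Edit: Since you don't have separate test data, you would be testing on X_iris, which is the same data the model was fitted on, so the matrix will look better than it should. It is better to split the data. There is a similar question at Sci-kit: What's the easiest way to get the confusion matrix of an estimator when using GridSearchCV?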
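If splitting off a separate test set is not attractive, one possible alternative that fits the nested setup in the question is to collect out-of-fold predictions with `cross_val_predict` and build the matrix from those. This is a swapped-in technique, not what the answer above proposes, and it is only a sketch: iris again stands in for the real data, and the fold counts simply mirror the question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): replace with your own X.values and y.
X, y = load_iris(return_X_y=True)

p_grid = {"n_neighbors": [1, 5, 10, 15]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# The grid search is the estimator that gets refitted on every outer training fold.
clf = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=p_grid, cv=inner_cv)

# Each sample is predicted by a model that was tuned and fitted without it.
oof_preds = cross_val_predict(clf, X, y, cv=outer_cv)

# One matrix over all samples; rows are true labels, columns are predictions.
cm_nested = confusion_matrix(y, oof_preds)
print(cm_nested)
```

Because every sample is predicted exactly once, the entries of `cm_nested` sum to the number of samples. Inside the `NUM_TRIALS` loop this would give one matrix per trial, which could be summed or averaged if a single figure is wanted.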
