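I want to plot figures for different values of k with a k-NN classifier. My problem is that every figure seems to use the same value of k. So far, I have tried changing the value of k on each pass through the loop:
clf = KNeighborsClassifier(n_neighbors=counter+1)
But all the figures look as if k=1.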
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
import numpy as np
from sklearn.model_selection import train_test_split
c = np.array([1 if y > np.median(data['target']) else 0 for y in data['target']])
X_train, X_test, c_train, c_test = train_test_split(data['data'], c, random_state=0)
from sklearn.neighbors import KNeighborsClassifier
import mglearn
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))
for counter in range(3):
    clf = KNeighborsClassifier(n_neighbors=counter+1)
    clf.fit(X_test, c_test)
    plt.tight_layout() # this will help create proper spacing between the plots.
    mglearn.discrete_scatter(X_test[:,0], X_test[:,1], c_test, ax=ax[counter])
    plt.legend(["Class 0", "Class 1"], loc=4)
    plt.xlabel("First feature")
    plt.ylabel("Second feature")
    #plt.figure()
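Answer:
All the figures look the same because each time you are plotting the test set itself (its true labels, c_test), not the model's predictions on the test set. The true labels do not depend on k, so changing n_neighbors has no visible effect. You probably want to do the following for each value of k:
- Fit the model on the training set: replace clf.fit(X_test, c_test) with clf.fit(X_train, c_train).
- Generate the model's predictions on the test set: add c_pred = clf.predict(X_test).
- Plot the model's predictions on the test set: replace c_test with c_pred in the scatter plot, i.e. use mglearn.discrete_scatter(X_test[:, 0], X_test[:, 1], c_pred, ax=ax[counter]) instead of mglearn.discrete_scatter(X_test[:, 0], X_test[:, 1], c_test, ax=ax[counter]).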
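Before plotting anything, it can help to confirm numerically that the predictions change with k. The following is only a minimal sketch: it assumes synthetic data from make_classification as a stand-in for the California housing data (so nothing has to be downloaded), and the helper name predictions_by_k is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def predictions_by_k(X_train, c_train, X_test, k_values):
    """Fit one k-NN classifier per k on the training set and
    return a dict mapping k to its predictions on the test set."""
    preds = {}
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train, c_train)        # fit on the training set
        preds[k] = clf.predict(X_test)   # predict on the test set
    return preds


# Synthetic stand-in data (assumption): two classes with some label noise,
# so that the choice of k has a visible effect on the predictions.
X, c = make_classification(n_samples=600, n_features=4, n_informative=2,
                           n_redundant=0, flip_y=0.2, random_state=0)
X_train, X_test, c_train, c_test = train_test_split(X, c, random_state=0)

preds = predictions_by_k(X_train, c_train, X_test, k_values=[1, 2, 3])
for k, c_pred in preds.items():
    accuracy = np.mean(c_pred == c_test)      # share of correct predictions
    changed = np.sum(c_pred != preds[1])      # how many differ from k=1
    print(f"k={k}: test accuracy={accuracy:.3f}, "
          f"predictions differing from k=1: {changed}")
```

c_test is the same array in every iteration, which is why plotting it produces identical figures; only c_pred depends on n_neighbors.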
Updated code:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import mglearn
import matplotlib.pyplot as plt

data = fetch_california_housing()
# binary target: 1 if the value is above the median, otherwise 0
c = np.array([1 if y > np.median(data['target']) else 0 for y in data['target']])
X_train, X_test, c_train, c_test = train_test_split(data['data'], c, random_state=0)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))
for counter in range(3):
    clf = KNeighborsClassifier(n_neighbors=counter+1)
    # fit the model to the training set
    clf.fit(X_train, c_train)
    # extract the model predictions on the test set
    c_pred = clf.predict(X_test)
    # plot the model predictions
    mglearn.discrete_scatter(X_test[:,0], X_test[:,1], c_pred, ax=ax[counter])
    # label this subplot through its own axes; plt.legend, plt.xlabel and
    # plt.ylabel act only on the current axes (normally the last subplot created)
    ax[counter].legend(["Class 0", "Class 1"], loc=4)
    ax[counter].set_xlabel("First feature")
    ax[counter].set_ylabel("Second feature")
# adjust the spacing between the plots once, after all of them are drawn
plt.tight_layout()
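If mglearn is not installed, a similar figure can be approximated with plain matplotlib. This is a sketch under stated assumptions rather than the answer's own method: it uses synthetic data from make_classification in place of the housing data, the function name plot_knn_predictions is hypothetical, and each panel is titled with its k so the panels can be told apart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def plot_knn_predictions(X_train, c_train, X_test, k_values):
    """Draw one panel per k showing the predicted class of each test point,
    using the first two features as the plot coordinates."""
    fig, axes = plt.subplots(nrows=1, ncols=len(k_values),
                             figsize=(20, 6), squeeze=False)
    for ax, k in zip(axes[0], k_values):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, c_train)
        c_pred = clf.predict(X_test)
        # one scatter call per predicted class so each gets a legend entry
        for label, marker in [(0, "o"), (1, "^")]:
            mask = c_pred == label
            ax.scatter(X_test[mask, 0], X_test[mask, 1],
                       marker=marker, label=f"Class {label}")
        ax.set_title(f"k = {k}")
        ax.set_xlabel("First feature")
        ax.set_ylabel("Second feature")
        ax.legend(loc=4)
    fig.tight_layout()
    return fig, axes[0]


# Synthetic stand-in data (assumption), split the same way as in the answer.
X, c = make_classification(n_samples=600, n_features=4, n_informative=2,
                           n_redundant=0, flip_y=0.2, random_state=0)
X_train, X_test, c_train, c_test = train_test_split(X, c, random_state=0)

fig, axes = plot_knn_predictions(X_train, c_train, X_test, k_values=[1, 2, 3])
# fig.savefig("knn_predictions.png")  # hypothetical output file name
```

Working through each Axes object (ax.set_title, ax.legend) rather than the plt.* functions keeps every label attached to the panel it describes.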