Number of data points classified and plotted does not match the number of points in the dataset

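I am working in Python with a dataset of 54 data points, classifying them with a k-NN classifier with the number of neighbors set to 20. My code runs the classification and plots the result, but I only see 22 data points in the scatter plot instead of the 54 that were classified.

In machine learning, is there a reason why not all data points would be classified and plotted?

Does the chosen number of neighbors affect how many data points are classified and plotted? Thanks.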

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap
    from sklearn import neighbors, datasets
    import pandas as pd
    from sklearn import preprocessing

    # Preprocessing of dataset done here.
    n_neighbors = 20
    dataset = pd.read_csv('cereal.csv')
    X = dataset.iloc[:, [3,5]].values
    y = dataset.iloc[:, 1].values
    y_set = preprocessing.LabelEncoder()
    y_fit = y_set.fit(y)
    y_trans = y_set.transform(y)

    # sorting dataset done here.
    # Total number of data points: 77 but 54 will be selected to use
    j = 0
    for i in range (0,77):
        if y[i] == 'K' or y[i] == 'G' or y[i] == 'P':
            j = j+1
    new_data = np.zeros((j,2))
    new_let = [0] * j
    j = 0
    for i in range (0,77):
        if y[i] == 'K' or y[i] == 'G' or y[i] == 'P':
            new_data[j] = X[i]
            new_let[j] = y[i]
            j = j+1

    # Plotting and setting up mesh grid done here
    h = .02
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

    for weights in ['uniform', 'distance']:
        # we create an instance of Neighbours Classifier and fit the data.
        clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
        clf.fit(X, y_trans)

        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        plt.figure()
        plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
        plt.scatter(X[:, 0], X[:, 1], c=y_trans, cmap=cmap_bold,
                    edgecolor='k', s=20)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.title("3-Class classification (k = %i, weights = '%s')"
                  % (n_neighbors, weights))

    plt.show()

Answer:

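First of all, you are using all 77 points of the dataset in both the classifier and the plot. The variables you created that hold the 54 selected points (new_data and new_let) are never used, neither to fit the classifier nor to produce the final plot. The number of neighbors only changes how each prediction is made; it has no effect on how many points are fitted or plotted.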
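One way the 54-point subset could actually be used is sketched below. This is only an illustration under assumptions: the synthetic arrays stand in for `cereal.csv`, which is not reproduced here, and the names `mask`, `X_sel`, `y_sel`, and `encoder` are hypothetical. A boolean mask replaces the two counting loops, and the classifier is fitted on the filtered arrays rather than on `X` and `y_trans`.

```python
import numpy as np
from sklearn import neighbors, preprocessing

# Synthetic stand-in for the question's data (assumption, not the real file):
# 77 rows, two numeric features, and a manufacturer code per row.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(77, 2)).astype(float)
y = np.array(['K', 'G', 'P', 'N', 'Q', 'R', 'A'] * 11)

# Keep only the rows whose label is K, G, or P.
mask = np.isin(y, ['K', 'G', 'P'])
X_sel = X[mask]
y_sel = y[mask]

# Encode the labels of the selected rows only.
encoder = preprocessing.LabelEncoder()
y_sel_enc = encoder.fit_transform(y_sel)

# Fit on the filtered data; k cannot exceed the number of training samples.
n_neighbors = min(20, len(X_sel))
clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors, weights='uniform')
clf.fit(X_sel, y_sel_enc)

print("all rows:", X.shape, "selected rows:", X_sel.shape)
```

The same `X_sel` and `y_sel_enc` would then be passed to `plt.scatter` and used for the mesh limits, so that the plot shows only the selected points.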

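After running the script, check the variable explorer in Anaconda (in the Spyder IDE) to see the sizes of the different variables you are using.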
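If no variable explorer is at hand, the same check can be done by printing array shapes. The snippet below is a minimal sketch; the two arrays are hypothetical placeholders with the sizes described in the question, not the real data.

```python
import numpy as np

# Placeholder arrays with the sizes from the question (assumption).
X = np.zeros((77, 2))         # all rows read from the file
new_data = np.zeros((54, 2))  # rows kept by the selection loop

# Print the shape of each array to see which one has 77 rows and which has 54.
shapes = {name: arr.shape for name, arr in {"X": X, "new_data": new_data}.items()}
for name, shape in shapes.items():
    print(f"{name}: shape={shape}")
```

Comparing these shapes with the arguments passed to `clf.fit` and `plt.scatter` shows at a glance which array is really being used.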

As for the plot you generated, looking at how the data is distributed makes it clear why you only see 22 points:

[Figure: Cereal K-NN]

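If you look at the original dataset, you will find that several points have identical values in these two columns (fat and calories). On the plot these points sit exactly on top of each other, so although you plotted 77 points, you can only "see" 22 of them. If you want all points to appear clearly separated, you may need to choose other attributes.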
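The overlap can be confirmed numerically by counting distinct coordinate pairs. The sketch below uses synthetic data as a stand-in for the two columns (an assumption, since the real file is not included here); `unique_points`, `counts`, and `sizes` are illustrative names.

```python
import numpy as np

# Synthetic stand-in: 77 rows whose columns take only a few distinct values,
# which is what produces overlapping markers (assumption, not the real data).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.choice([70.0, 90.0, 100.0, 110.0, 120.0], size=77),
    rng.choice([0.0, 1.0, 2.0, 3.0], size=77),
])

# Distinct (x, y) positions and how many rows share each one.
unique_points, counts = np.unique(X, axis=0, return_counts=True)
print(f"{len(X)} rows, {len(unique_points)} distinct positions on the plot")

# Optional: marker sizes proportional to the number of stacked rows.
sizes = 20 * counts
```

Passing `unique_points[:, 0]`, `unique_points[:, 1]`, and `s=sizes` to `plt.scatter` is one way to make the stacked points visible without switching to other attributes.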
