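I want to build a handwritten digit recognition model on the MNIST dataset using sklearn, and I want to randomly shuffle the features (x) and labels (y) of the training set. However, I get a KeyError. Please tell me the correct way to do this.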
from sklearn.datasets import fetch_openml
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

mnist = fetch_openml('mnist_784')
x, y = mnist['data'], mnist['target']
x.shape
y.shape

# Display one sample digit as a 28x28 image
digit = np.array(x.iloc[45])
digit_img = digit.reshape(28, 28)
plt.imshow(digit_img, cmap=matplotlib.cm.binary, interpolation="nearest")
plt.axis("off")
y.iloc[45]

# Split into training and test sets
x_train, x_test = x[:60000], x[60000:]
y_train, y_test = y[:60000], y[60000:]

shuffled = np.random.permutation(60000)
x_train = x_train[shuffled]  # <-- this line throws the error
y_train = y_train[shuffled]  # <-- this line throws the error
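Answer:
Check whether type(x_train) is numpy.ndarray or DataFrame. Since Scikit-Learn 0.24, fetch_openml() returns a Pandas DataFrame by default. If it is a DataFrame, you cannot use x_train[shuffled], which is meant for arrays: on a DataFrame, square brackets with a list of integers look up column labels, which is why you get a KeyError. Use x_train.iloc[shuffled] instead, and likewise y_train.iloc[shuffled] for the labels.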
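As a minimal sketch of the positional shuffle described above: the small DataFrame and Series below are hypothetical stand-ins for the MNIST training split (so the snippet runs without downloading anything), and the seeded generator is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MNIST training split:
# 6 samples with 4 feature columns, plus one label per sample.
x_train = pd.DataFrame(np.arange(24).reshape(6, 4))
y_train = pd.Series(list("abcdef"))

# One permutation of row positions, reused for both x and y
# so that every sample stays paired with its label.
rng = np.random.default_rng(0)
shuffled = rng.permutation(len(x_train))

# .iloc selects rows by position, which works for DataFrame and Series alike.
x_shuffled = x_train.iloc[shuffled]
y_shuffled = y_train.iloc[shuffled]
```

Alternatively, calling fetch_openml('mnist_784', as_frame=False) should give you NumPy arrays, in which case x_train[shuffled] works as you originally wrote it.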