Python: Error when loading MNIST data

I get an error when I load the MNIST data with the code below. (I have Anaconda installed and am writing the code in an online Jupyter notebook.)

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

I get a timeout error and I don't know what went wrong. I have already turned off my VPN proxy, but that did not fix it. Please help!

TimeoutError                              Traceback (most recent call last)
<ipython-input-1-3ba7b9c02a3b> in <module>()
      1 from sklearn.datasets import fetch_mldata
----> 2 mnist = fetch_mldata('MNIST original')

~\Anaconda3\lib\site-packages\sklearn\datasets\mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    152         urlname = MLDATA_BASE_URL % quote(dataname)
    153         try:
--> 154             mldata_url = urlopen(urlname)
    155         except HTTPError as e:
    156             if e.code == 404:

~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224
    225 def install_opener(opener):

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    524             req = meth(req)
    525
--> 526         response = self._open(req, data)
    527
    528         # post-process response

~\Anaconda3\lib\urllib\request.py in _open(self, req, data)
    542         protocol = req.type
    543         result = self._call_chain(self.handle_open, protocol, protocol +
--> 544                                   '_open', req)
    545         if result:
    546             return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result

~\Anaconda3\lib\urllib\request.py in http_open(self, req)
   1344
   1345     def http_open(self, req):
-> 1346         return self.do_open(http.client.HTTPConnection, req)
   1347
   1348     http_request = AbstractHTTPHandler.do_request_

~\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1319             except OSError as err: # timeout error
   1320                 raise URLError(err)
-> 1321             r = h.getresponse()
   1322         except:
   1323             h.close()

~\Anaconda3\lib\http\client.py in getresponse(self)
   1329         try:
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:
   1333                 self.close()

~\Anaconda3\lib\http\client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

~\Anaconda3\lib\http\client.py in _read_status(self)
    256
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")

~\Anaconda3\lib\socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

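I downloaded the MNIST dataset and tried to load the data myself. I copied some code for loading MNIST, but it failed again. I think I need to modify parts of the code rather than copy it verbatim from the web, but I don't know where to make the changes. (I'm a Python beginner.) Below is the code I used to load the downloaded MNIST data. Is the problem that I put the data in the wrong place?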

from struct import unpack

import numpy as np


def loadmnist(imagefile, labelfile):

    # Open the images with gzip in read binary mode
    images = open(imagefile, 'rb')
    labels = open(labelfile, 'rb')

    # Get metadata for images
    images.read(4)  # skip the magic_number
    number_of_images = images.read(4)
    number_of_images = unpack('>I', number_of_images)[0]
    rows = images.read(4)
    rows = unpack('>I', rows)[0]
    cols = images.read(4)
    cols = unpack('>I', cols)[0]

    # Get metadata for labels
    labels.read(4)
    N = labels.read(4)
    N = unpack('>I', N)[0]

    # Get data
    x = np.zeros((N, rows*cols), dtype=np.uint8)  # Initialize numpy array
    y = np.zeros(N, dtype=np.uint8)  # Initialize numpy array
    for i in range(N):
        for j in range(rows*cols):
            tmp_pixel = images.read(1)  # Just a single byte
            tmp_pixel = unpack('>B', tmp_pixel)[0]
            x[i][j] = tmp_pixel
        tmp_label = labels.read(1)
        y[i] = unpack('>B', tmp_label)[0]

    images.close()
    labels.close()
    return (x, y)

The part above runs without problems.

train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte'
                                 , 'data/train-labels-idx1-ubyte')
test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte'
                               , 'data/t10k-labels-idx1-ubyte')

The error message is as follows.

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-b23a5078b5bb> in <module>()
      1 train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte'
----> 2                                  , 'data/train-labels-idx1-ubyte')
      3 test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte'
      4                                , 'data/t10k-labels-idx1-ubyte')

<ipython-input-4-967098b85f28> in loadmnist(imagefile, labelfile)
      2
      3     # Open the images with gzip in read binary mode
----> 4     images = open(imagefile, 'rb')
      5     labels = open(labelfile, 'rb')
      6

FileNotFoundError: [Errno 2] No such file or directory: 'data/train-images-idx3-ubyte'

I put the downloaded data in a folder that I had just created.


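Answer:

If you would rather load the dataset directly from a library than download it first and load it yourself, you can load it from Keras.

Here is how: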

from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
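For comparison with the loadmnist function in the question: mnist.load_data() returns the images as arrays of shape (N, 28, 28), while loadmnist builds one flattened row of rows*cols pixels per image. A minimal sketch of converting the first layout to the second, assuming NumPy is available. flatten_images is a hypothetical helper name, not part of Keras, and the array below is a stand-in so the example runs without Keras installed:

```python
import numpy as np


def flatten_images(images):
    """Reshape an (N, rows, cols) image array to (N, rows*cols).

    Hypothetical helper for illustration; not part of Keras.
    """
    images = np.asarray(images)
    return images.reshape(images.shape[0], -1)


# Stand-in for the Keras output: 3 fake images of 28x28 uint8 pixels.
fake_images = np.zeros((3, 28, 28), dtype=np.uint8)
flat = flatten_images(fake_images)
print(flat.shape)  # (3, 784)
```

With Keras installed, the same call would be flatten_images(X_train).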

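If you are a beginner interested in machine learning and Python and want to learn more, I recommend this excellent blog post.

Also, when you pass the files to the function, you need to include the file extension. That is, call the function like this: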

train_img, train_lbl = loadmnist('mnist//train-images-idx3-ubyte.gz'
                                 , 'mnist//train-labels-idx1-ubyte.gz')
test_img, test_lbl = loadmnist('mnist//t10k-images-idx3-ubyte.gz'
                               , 'mnist//t10k-labels-idx1-ubyte.gz')
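One caveat with .gz files: loadmnist as written in the question uses the built-in open, which returns the compressed bytes unchanged, so the header fields would not parse correctly. Below is a minimal sketch of a loader that handles both compressed and uncompressed files, assuming NumPy is installed and the files follow the standard MNIST IDX layout. The names open_maybe_gzip, load_idx_images and load_idx_labels are hypothetical and used only for illustration:

```python
import gzip
import struct

import numpy as np


def open_maybe_gzip(path):
    """Open path for binary reading, decompressing if it ends in .gz."""
    if str(path).endswith('.gz'):
        return gzip.open(path, 'rb')
    return open(path, 'rb')


def load_idx_images(path):
    """Read an IDX3 image file into a uint8 array of shape (N, rows*cols)."""
    with open_maybe_gzip(path) as f:
        # Header: magic number, image count, rows, cols (big-endian uint32).
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        if magic != 2051:
            raise ValueError('not an IDX3 image file: %r' % (path,))
        data = np.frombuffer(f.read(n * rows * cols), dtype=np.uint8)
    return data.reshape(n, rows * cols)


def load_idx_labels(path):
    """Read an IDX1 label file into a uint8 array of shape (N,)."""
    with open_maybe_gzip(path) as f:
        # Header: magic number, label count (big-endian uint32).
        magic, n = struct.unpack('>II', f.read(8))
        if magic != 2049:
            raise ValueError('not an IDX1 label file: %r' % (path,))
        return np.frombuffer(f.read(n), dtype=np.uint8)
```

Usage would mirror the call above, for example load_idx_images('mnist//train-images-idx3-ubyte.gz'). Reading each file in one pass with np.frombuffer also avoids the per-byte loop, which is slow for 60,000 images.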

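In the code you used to load the data from local disk, the error occurs because the files are not at the specified location. Make sure that the mnist folder exists in the folder that contains your notebook.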
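A quick way to check this from inside the notebook is to print the working directory and test each path before calling loadmnist. The sketch below uses only the standard library. missing_files is a hypothetical helper name, and the paths are the ones from the question:

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


paths = [
    'data/train-images-idx3-ubyte',
    'data/train-labels-idx1-ubyte',
    'data/t10k-images-idx3-ubyte',
    'data/t10k-labels-idx1-ubyte',
]

# Relative paths are resolved against this directory.
print('Working directory:', os.getcwd())
for p in missing_files(paths):
    print('Missing:', os.path.abspath(p))
```

Any path printed as missing shows the absolute location Python actually looked in, which can then be compared with where the downloaded files were saved.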
