I get an error when I load the MNIST data with the code below. (Anaconda is installed and I am writing the code in a Jupyter notebook.)
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original')
I got a timeout error and I don't know what went wrong. I have already turned off my VPN proxy, but that didn't solve it. Please help!
TimeoutError                              Traceback (most recent call last)
<ipython-input-1-3ba7b9c02a3b> in <module>()
      1 from sklearn.datasets import fetch_mldata
----> 2 mnist = fetch_mldata('MNIST original')

~\Anaconda3\lib\site-packages\sklearn\datasets\mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    152     urlname = MLDATA_BASE_URL % quote(dataname)
    153     try:
--> 154         mldata_url = urlopen(urlname)
    155     except HTTPError as e:
    156         if e.code == 404:

~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224
    225 def install_opener(opener):

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    524         req = meth(req)
    525
--> 526         response = self._open(req, data)
    527
    528         # post-process response

~\Anaconda3\lib\urllib\request.py in _open(self, req, data)
    542         protocol = req.type
    543         result = self._call_chain(self.handle_open, protocol, protocol +
--> 544                                   '_open', req)
    545         if result:
    546             return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result

~\Anaconda3\lib\urllib\request.py in http_open(self, req)
   1344
   1345     def http_open(self, req):
-> 1346         return self.do_open(http.client.HTTPConnection, req)
   1347
   1348     http_request = AbstractHTTPHandler.do_request_

~\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1319             except OSError as err: # timeout error
   1320                 raise URLError(err)
-> 1321             r = h.getresponse()
   1322         except:
   1323             h.close()

~\Anaconda3\lib\http\client.py in getresponse(self)
   1329             try:
   1330                 try:
-> 1331                     response.begin()
   1332                 except ConnectionError:
   1333                     self.close()

~\Anaconda3\lib\http\client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

~\Anaconda3\lib\http\client.py in _read_status(self)
    256
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")

~\Anaconda3\lib\socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
I then downloaded the MNIST dataset and tried to load the data myself. I copied code for loading MNIST, but it failed again. I think I need to adapt parts of the code rather than copy it from the web verbatim, but I don't know where the changes should go. (I am a Python beginner.) Below is the code I use to load the downloaded MNIST data. Is the problem that I put the data files in the wrong place?
from struct import unpack

import numpy as np

def loadmnist(imagefile, labelfile):

    # Open the image and label files in read-binary mode
    images = open(imagefile, 'rb')
    labels = open(labelfile, 'rb')

    # Get metadata for images
    images.read(4)  # skip the magic_number
    number_of_images = images.read(4)
    number_of_images = unpack('>I', number_of_images)[0]
    rows = images.read(4)
    rows = unpack('>I', rows)[0]
    cols = images.read(4)
    cols = unpack('>I', cols)[0]

    # Get metadata for labels
    labels.read(4)
    N = labels.read(4)
    N = unpack('>I', N)[0]

    # Get data
    x = np.zeros((N, rows*cols), dtype=np.uint8)  # Initialize numpy array
    y = np.zeros(N, dtype=np.uint8)               # Initialize numpy array
    for i in range(N):
        for j in range(rows*cols):
            tmp_pixel = images.read(1)  # Just a single byte
            tmp_pixel = unpack('>B', tmp_pixel)[0]
            x[i][j] = tmp_pixel
        tmp_label = labels.read(1)
        y[i] = unpack('>B', tmp_label)[0]

    images.close()
    labels.close()
    return (x, y)
The part above runs without problems.
train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte',
                                 'data/train-labels-idx1-ubyte')
test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte',
                               'data/t10k-labels-idx1-ubyte')
The error message is shown below.
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-b23a5078b5bb> in <module>()
      1 train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte'
----> 2                                  , 'data/train-labels-idx1-ubyte')
      3 test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte'
      4                                , 'data/t10k-labels-idx1-ubyte')

<ipython-input-4-967098b85f28> in loadmnist(imagefile, labelfile)
      2
      3     # Open the images with gzip in read binary mode
----> 4     images = open(imagefile, 'rb')
      5     labels = open(labelfile, 'rb')
      6

FileNotFoundError: [Errno 2] No such file or directory: 'data/train-images-idx3-ubyte'
The data I downloaded is in a folder I just created (see the screenshot).
Answer:
If you want to load the dataset directly from a library instead of downloading it first and then loading it, you can load it from Keras.
Here is how:
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
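If you want a quick sanity check that the download worked, you can inspect the shapes of the returned arrays. This is a small sketch that reuses the variable names from the snippet above:

# mnist.load_data() downloads the data on first use and caches it locally,
# then returns NumPy arrays of images and labels.
print(X_train.shape, y_train.shape)   # expected: (60000, 28, 28) (60000,)
print(X_test.shape, y_test.shape)     # expected: (10000, 28, 28) (10000,)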
If you are a beginner interested in machine learning and Python and want to learn more, I recommend taking a look at this excellent blog post.
Also, when you pass the files to the function, you need to include the file extension. That is, you need to call the function like this:
train_img, train_lbl = loadmnist('mnist//train-images-idx3-ubyte.gz',
                                 'mnist//train-labels-idx1-ubyte.gz')
test_img, test_lbl = loadmnist('mnist//t10k-images-idx3-ubyte.gz',
                               'mnist//t10k-labels-idx1-ubyte.gz')
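One caveat: the loadmnist function posted in the question opens the files with the plain built-in open(), so it expects the decompressed IDX files. If you keep the downloaded .gz files, as in the call above, you would need to open them with gzip.open instead. Below is a minimal sketch of such a variant; the name loadmnist_gz and the frombuffer-based parsing are only illustrative, not part of the original code:

from struct import unpack
import gzip

import numpy as np

def loadmnist_gz(imagefile, labelfile):
    # Same IDX parsing as loadmnist, but the files are opened with gzip
    # so the compressed .gz downloads can be read directly.
    with gzip.open(imagefile, 'rb') as images, gzip.open(labelfile, 'rb') as labels:
        images.read(4)                                   # skip the magic number
        number_of_images = unpack('>I', images.read(4))[0]
        rows = unpack('>I', images.read(4))[0]
        cols = unpack('>I', images.read(4))[0]

        labels.read(4)                                   # skip the magic number
        N = unpack('>I', labels.read(4))[0]

        # Read all pixel and label bytes at once instead of one byte at a time
        x = np.frombuffer(images.read(N * rows * cols), dtype=np.uint8).reshape(N, rows * cols)
        y = np.frombuffer(labels.read(N), dtype=np.uint8)
    return x, y

Calling it looks the same as the calls above, just with loadmnist_gz in place of loadmnist.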
In the code you use to load the data from your local disk, the error occurs because the files are not at the location you specified. Make sure the mnist folder sits in the same folder as your notebook.
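If you are not sure where the notebook is looking for the files, you can check from a cell before calling the loader. A small sketch; the folder and file names here are just the ones used above, so adjust them to match what you actually created:

import os

print(os.getcwd())       # folder that relative paths like 'mnist/...' are resolved against
print(os.listdir('.'))   # should list the 'mnist' (or 'data') folder you created
print(os.path.exists('mnist/train-images-idx3-ubyte.gz'))  # True once the path is correct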