Loading a dataset in Jupyter Notebook

I ran into a problem while trying to download and load a dataset in Jupyter Notebook. Here is my code:

import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()
housing.head()

After running the code above, I get the following error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-6a9011700846> in <module>
----> 1 housing = load_housing_data()
      2 housing.head()

<ipython-input-4-4d0bff7b3608> in load_housing_data(housing_path)
      2 def load_housing_data(housing_path=HOUSING_PATH):
      3     csv_path = os.path.join(housing_path, "housing.csv")
----> 4     return pd.read_csv(csv_path)

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677
--> 678         return _read(filepath_or_buffer, kwds)
    679
    680     parser_f.__name__ = name

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    438
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441
    442     if chunksize or iterator:

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786
--> 787         self._make_engine(self.engine)
    788
    789     def close(self):

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'datasets/housing/housing.csv' does not exist

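I tried downloading the data manually and placing the .csv file in the same folder as the notebook. With that in place, the following code works: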

import pandas as pd
import numpy as np

pd.read_csv('housing.csv', delimiter=',')

My question is: what is wrong with the first code snippet? I would really appreciate an explanation. By the way, I am on macOS 10.14.

Note: this code is an example from the book "Hands-On Machine Learning with Scikit-Learn and TensorFlow".


Answer:

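The fetch_housing_data() function is defined but never called, so the datasets/housing directory is never created and nothing is downloaded. When pd.read_csv then looks for datasets/housing/housing.csv, the file does not exist, which is exactly what the FileNotFoundError reports. You need to call fetch_housing_data() inside the body of load_housing_data.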

Like this:

def load_housing_data(housing_path=HOUSING_PATH):
    # This call was missing: download and extract the data first
    fetch_housing_data()
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
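
One side effect of calling fetch_housing_data() unconditionally is that the archive is downloaded again every time the data is loaded. A possible variation, sketched below, downloads only when housing.csv is missing. The helper name ensure_housing_csv and its fetch parameter are hypothetical and not part of the book's code, and the standard-library urllib.request is used here in place of six.moves.urllib, which assumes Python 3.

```python
import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download housing.tgz into housing_path and extract it there."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)


def ensure_housing_csv(housing_path=HOUSING_PATH, fetch=fetch_housing_data):
    """Hypothetical helper: return the path to housing.csv,
    calling `fetch` only if the file is not already on disk."""
    csv_path = os.path.join(housing_path, "housing.csv")
    if not os.path.isfile(csv_path):
        fetch(housing_path=housing_path)
    return csv_path
```

In the notebook, the returned path could then be passed straight to pandas, for example housing = pd.read_csv(ensure_housing_csv()). Because the paths are relative, the file ends up under the notebook's current working directory, which os.getcwd() will show.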
