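While working through a machine learning tutorial on Kaggle, I followed it line by line but still got a ValueError. I'm trying to practice data validation and splitting. Here is my code: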
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

main_file_path = '../input/train.csv'
data = pd.read_csv(main_file_path)
y = data.SalePrice
data_predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
x = data[data_predictors]
train_x, val_x, train_y, val_x = train_test_split(x, y,random_state = 0)
data_model = DecisionTreeRegressor()
data_model.fit(train_x,train_y)
data_prediction = data_model.predict(val_x)
print(mean_absolute_error(val_y, data_prediction))
The error points to this line:
data_prediction = data_model.predict(val_x)
I'm a beginner in machine learning, so I compared my code with the author's and found the implementation to be identical.
Full stack trace:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-48f37072f996> in <module>()
     17 data_model.fit(train_x,train_y)
     18 
---> 19 data_prediction = data_model.predict(val_x)
     20 print(mean_absolute_error(val_y, data_prediction))

/opt/conda/lib/python3.6/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
    410         """
    411         check_is_fitted(self, 'tree_')
--> 412         X = self._validate_X_predict(X, check_input)
    413         proba = self.tree_.predict(X)
    414         n_samples = X.shape[0]

/opt/conda/lib/python3.6/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
    371         """Validate X whenever one tries to predict, apply, predict_proba"""
    372         if check_input:
--> 373             X = check_array(X, dtype=DTYPE, accept_sparse="csr")
    374             if issparse(X) and (X.indices.dtype != np.intc or
    375                                 X.indptr.dtype != np.intc):

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    439                     "Reshape your data either using array.reshape(-1, 1) if "
    440                     "your data has a single feature or array.reshape(1, -1) "
--> 441                     "if it contains a single sample.".format(array))
    442             array = np.atleast_2d(array)
    443             # To ensure that array flags are maintained

ValueError: Expected 2D array, got 1D array instead:
Answer:
The error is raised on the line you pointed to, but the actual problem is on this line:
train_x, val_x, train_y, val_x = train_test_split(x, y,random_state = 0)
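Note that val_x appears twice on this line. The second val_x should be val_y. As written, val_x (which should be a 2D array of inputs) is overwritten with the 1D array of target values that was meant to go into val_y, so predict raises a ValueError saying it expected a 2D array but got a 1D array.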
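To make the difference concrete, here is a minimal sketch that runs both the buggy and the corrected unpacking. It is not the tutorial's code: since '../input/train.csv' is not available here, it assumes a small synthetic DataFrame with made-up values and only two of the original predictor columns. It also imports mean_absolute_error from sklearn.metrics, which the snippet in the question calls but never imports, so a NameError would likely be the next error after the unpacking is fixed.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for train.csv (assumption: values are made up for illustration).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "LotArea": rng.integers(5000, 15000, size=40),
    "YearBuilt": rng.integers(1950, 2010, size=40),
})
data["SalePrice"] = 10 * data["LotArea"] + 500 * (data["YearBuilt"] - 1950)

y = data.SalePrice
x = data[["LotArea", "YearBuilt"]]

# Buggy unpacking: the second val_x overwrites the first with the 1D target Series.
train_x, val_x, train_y, val_x = train_test_split(x, y, random_state=0)
buggy_ndim = val_x.ndim  # 1D, which is what predict() rejects

# Corrected unpacking: the fourth name is val_y.
train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)

data_model = DecisionTreeRegressor(random_state=0)
data_model.fit(train_x, train_y)
data_prediction = data_model.predict(val_x)  # val_x is a 2D DataFrame again
mae = mean_absolute_error(val_y, data_prediction)

print("buggy val_x.ndim:", buggy_ndim)
print("fixed val_x.shape:", val_x.shape)
print("MAE:", mae)
```

Python does not complain when the same name appears twice in an unpacking target; the later assignment simply wins. Checking val_x.shape right after the split is a quick way to catch this kind of slip.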