The data comes from 'copper-new.txt': https://storage.googleapis.com/aipi_datasets/copper-new.txt
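I am working with a dataset that shows how the coefficient of thermal expansion of copper varies with temperature. I want to model the relationship between the coefficient and temperature so that I can predict the coefficient at any new temperature. We will use a linear regression model.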
    # The data has to be split into columns because pandas did not split it for us automatically
    copperdata['X'] = copperdata.apply(lambda x: x.str.split()[0][1], axis=1)
    copperdata['y'] = copperdata.apply(lambda x: x.str.split()[0][0], axis=1)
    copperdata = copperdata[['X','y']].astype(float)
    copperdata.head()
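As a possible alternative to splitting the strings by hand, the sketch below assumes the file is whitespace-delimited with two numeric columns and no header row, with the coefficient first and the temperature second (which is what the indexing in the split code above implies). The sample values and the name `sample_text` are made up for illustration; with the real file you would pass the path or URL to `pd.read_csv` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Made-up rows standing in for copper-new.txt (assumed layout: y first, then X)
sample_text = "0.591 24.41\n1.547 34.82\n2.902 44.09\n"

copperdata = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",        # split on any run of whitespace
    header=None,       # assumption: the file has no header row
    names=["y", "X"],  # first column is the target, second is the feature
)

# Both columns are parsed as floats, so no astype(float) step is needed
print(copperdata.dtypes)
```

If the real file has a different layout, the `names` list and `header` argument would need to change accordingly.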
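Now I need to complete the function lin_model() below, which takes the dataframe (copperdata) as input and does the following. Create NumPy arrays for X (the input feature, temperature) and y (the target, the coefficient of thermal expansion). Reshape the X array into a 2D array whose second dimension is 1 so that it can be used as input to a scikit-learn model. Split the data into training, test, and validation sets: use 10% of the total data as the test set and, of the remaining 90%, use 80% for training and 20% for validation. Make sure to set random_state=0 when splitting. Train a linear regression model, then compute the MAE on the validation set. The function should return 1) the trained model and 2) the validation MAE.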
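The two-stage split described above can be sketched as follows. The data here is synthetic (100 evenly spaced points, not the copper dataset), chosen only so the resulting set sizes are easy to check: 10% held out for testing, then the remaining 90% divided 80/20, which works out to roughly 72% training, 18% validation, and 10% test of the total.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples with a single feature
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Stage 1: hold out 10% of all data as the test set
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Stage 2: split the remaining 90% into 80% training and 20% validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.20, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 72 18 10
```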
    def lin_model(df):
        # YOUR CODE HERE
        X = copperdata['X']
        np.asarray('X')
        y = copperdata['y']
        np.asarray('y')
        X.values.reshape(-1, 1)
        y.values.reshape(-1, 1)
        X_train_full,X_test,_y_train_full,y_test = train_test_split(X,y,random_state=0,test_size = .10)
        X_train,X_val,y_train,y_val = train_test_split(X_train_full,y_train_full,random_state = 0, test_size = .20)
        model = LinearRegression()
        model.fit(X_train,y_train)
        val_preds = model.predict(X_val)
        mae = metrics.mean_absolute_error(y_test,y_pred)
        return model, mae
        raise NotImplementedError()
When I try to pass this test cell:
model,mae = lin_model(copperdata)
I get the following error:
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-112-8931ac554101> in <module>
          1 # Test cell
    ----> 2 model,mae = lin_model(copperdata)
          3
          4 # Print model coefficients and intercept
          5 m = model.coef_

    <ipython-input-111-ba2035c8e718> in lin_model(df)
          6     model.fit(X_train,y_train)
          7     #val_preds = model.predict(X_val)
    ----> 8     y_preds = model.predict(X_test)
          9     mae = metrics.mean_absolute_error(y_test,y_pred)
         10     return model, mae

    /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in predict(self, X)
        234             Returns predicted values.
        235         """
    --> 236         return self._decision_function(X)
        237
        238     _preprocess_data = staticmethod(_preprocess_data)

    /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in _decision_function(self, X)
        216         check_is_fitted(self)
        217
    --> 218         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
        219         return safe_sparse_dot(X, self.coef_.T,
        220                                dense_output=True) + self.intercept_

    /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
         70                           FutureWarning)
         71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
    ---> 72         return f(**kwargs)
         73     return inner_f
         74

    /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        617             # If input is 1D raise error
        618             if array.ndim == 1:
    --> 619                 raise ValueError(
        620                     "Expected 2D array, got 1D array instead:\narray={}.\n"
        621                     "Reshape your data either using array.reshape(-1, 1) if "

    ValueError: Expected 2D array, got 1D array instead:
    array=[656.2 544.47 524.7 60.41 447.41 89.57].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Answer:
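You only need to reshape the explanatory variable X, and you have to assign the result back to the variable (X = ...). reshape returns a new array rather than modifying the original in place, so without the assignment the reshaped array is discarded and X stays 1D.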
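A minimal sketch of the point about assigning the result back, using a small made-up array rather than the copper data: calling `reshape` without assignment leaves the array 1D, while assigning it back gives the 2D shape that scikit-learn estimators expect for their feature input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2x, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

x.reshape(-1, 1)        # result is discarded; x itself is unchanged
shape_before = x.shape  # still (4,)

x = x.reshape(-1, 1)    # assign the reshaped array back to x
shape_after = x.shape   # now (4, 1)

# With a 2D feature array, fit and predict both work
model = LinearRegression().fit(x, y)
pred = model.predict(np.array([[5.0]]))

print(shape_before, shape_after)
```

The target y can stay 1D; only the feature array needs the extra dimension.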
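    from sklearn import metrics
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    def lin_model(df):
        # Use the function argument df rather than the global copperdata
        X = df['X']
        y = df['y']
        # Reshape X to 2D and assign the result back so the change is kept
        X = X.values.reshape(-1, 1)
        # Hold out 10% of the data as the test set
        X_train_full,X_test,y_train_full,y_test = train_test_split(X,y,random_state=0,test_size = .10)
        # Split the remaining 90% into 80% training and 20% validation
        X_train,X_val,y_train,y_val = train_test_split(X_train_full,y_train_full,random_state = 0, test_size = .20)
        model = LinearRegression()
        model.fit(X_train,y_train)
        # Compute the MAE on the validation set, as the task requires
        val_preds = model.predict(X_val)
        mae = metrics.mean_absolute_error(y_val,val_preds)
        return model, mae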