I am learning linear regression from this GitHub link: "https://github.com/Anubhav1107/Machine_Learning_A-Z/blob/master/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/multiple_linear_regression.py"
But when I try to run it, I get the following error:
    ValueError                                Traceback (most recent call last)
    <ipython-input-26-860be404cdc9> in <module>()
          1 sc_y = StandardScaler()
    ----> 2 y_train = sc_y.fit_transform(y_train)

    4 frames
    /usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
        536
        537     """
    --> 538     return array(a, dtype, copy=False, order=order)
        539
        540

    ValueError: could not convert string to float: 'Florida'
I am running this code on Google Colab, and I have already converted the categorical feature, so I do not understand where the problem is.
    # Encoding categorical data
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    labelencoder = LabelEncoder()
    X[:, 3] = labelencoder.fit_transform(X[:, 3])
    onehotencoder = OneHotEncoder(categorical_features = [3])
    X = onehotencoder.fit_transform(X).toarray()

    # Splitting the dataset into the training set and test set
    from sklearn.cross_validation import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

    # Feature scaling
    from sklearn.preprocessing import StandardScaler
    sc_X = StandardScaler()
    X_train = sc_X.fit_transform(X_train)
    X_test = sc_X.transform(X_test)
    sc_y = StandardScaler()
    y_train = sc_y.fit_transform(y_train)
Answer:
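This is exactly why, in How to create a Minimal, Reproducible Example, we ask that you "ensure all information necessary to reproduce the problem is included in the question itself", and not in some external file whose relevant parts you may or may not have executed correctly.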
I say this because I cannot reproduce your error; executing the relevant parts of the linked code runs fine here.
At this stage, we have:
    y_train
    # result:
    array([ 96778.92,  96479.51, 105733.54,  96712.8 , 124266.9 ,
           155752.6 , 132602.65,  64926.08,  35673.41, 101004.64,
           129917.04,  99937.59,  97427.84, 126992.93,  71498.49,
           118474.03,  69758.98, 152211.77, 134307.35, 107404.34,
           156991.12, 125370.37,  78239.91,  14681.4 , 191792.06,
           141585.52,  89949.14, 108552.04, 156122.51, 108733.99,
            90708.19, 111313.02, 122776.86, 149759.96,  81005.76,
            49490.75, 182901.99, 192261.83,  42559.73,  65200.33])
I would bet that this is not the case with your full code (which you have not shown).
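The error message itself hints at what to check: it complains about the string 'Florida' while scaling y_train, so the target array most likely contains values from the State column instead of numbers. A minimal sketch of such a check follows; the toy rows, the column layout, and the helper name is_numeric are assumptions made for illustration and are not taken from the linked script or its dataset.

```python
import numpy as np

# Toy rows with made-up values, laid out the way the error suggests:
# three numeric columns, a categorical State column, then the target.
data = np.array([
    [100.0, 50.0, 200.0, "New York", 1000.0],
    [120.0, 55.0, 210.0, "Florida", 1100.0],
    [90.0, 45.0, 190.0, "California", 950.0],
], dtype=object)


def is_numeric(column):
    """Return True if every entry of `column` can be converted to float."""
    try:
        np.asarray(column, dtype=float)
        return True
    except ValueError:
        # Same failure mode as in the traceback: a string that is not a number.
        return False


y_wrong = data[:, 3]  # State column selected by mistake
y_right = data[:, 4]  # numeric target column

print(is_numeric(y_wrong))  # False
print(is_numeric(y_right))  # True
```

Running the same kind of check on your own y (or simply printing y[:5]) right after you build it should show whether the wrong column was selected.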
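After slightly modifying the last line below to use y_train.reshape(-1,1) (again, this is unrelated to your issue; without it we get a different error asking for exactly this change), we have: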
    # Feature scaling
    from sklearn.preprocessing import StandardScaler
    sc_X = StandardScaler()
    X_train = sc_X.fit_transform(X_train)
    X_test = sc_X.transform(X_test)
    sc_y = StandardScaler()
    y_train = sc_y.fit_transform(y_train.reshape(-1,1))  # reshape here
This runs fine, giving:
    y_train
    # result
    array([[-0.31304376],
           [-0.32044287],
           [-0.09175449],
           [-0.31467774],
           [ 0.3662475 ],
           [ 1.14433163],
           [ 0.57224308],
           [-1.10020076],
           [-1.82310158],
           [-0.20861649],
           [ 0.50587547],
           [-0.23498575],
           [-0.29700745],
           [ 0.43361398],
           [-0.93778138],
           [ 0.22309235],
           [-0.98076868],
           [ 1.05682957],
           [ 0.61437014],
           [-0.05046517],
           [ 1.17493831],
           [ 0.39351679],
           [-0.77118537],
           [-2.34186247],
           [ 2.03494965],
           [ 0.79423047],
           [-0.48182335],
           [-0.02210286],
           [ 1.15347296],
           [-0.01760646],
           [-0.46306547],
           [ 0.04612731],
           [ 0.32942519],
           [ 0.9962397 ],
           [-0.70283485],
           [-1.4816433 ],
           [ 1.81525556],
           [ 2.04655875],
           [-1.65292476],
           [-1.09342341]])
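To see what the reshape does, here is a rough NumPy-only sketch. It uses just the first four values of the unscaled y_train shown earlier, so the scaled numbers will not match the full result above. The manual formula assumes StandardScaler's default behaviour, which is to subtract the column mean and divide by the population standard deviation.

```python
import numpy as np

# First four target values copied from the unscaled y_train shown earlier.
y = np.array([96778.92, 96479.51, 105733.54, 96712.8])
print(y.shape)  # (4,)

# reshape(-1, 1) turns the 1-D vector into a single-column 2-D array;
# -1 lets NumPy infer the number of rows.
y_col = y.reshape(-1, 1)
print(y_col.shape)  # (4, 1)

# Standardize column-wise: subtract the mean, then divide by the
# population standard deviation (ddof=0, NumPy's default for std).
y_scaled = (y_col - y_col.mean(axis=0)) / y_col.std(axis=0)

# Flatten back to 1-D if a later step expects a plain vector.
y_flat = y_scaled.ravel()
print(y_flat.shape)  # (4,)
```

The scaler works on arrays of shape (n_samples, n_features), which is why a single target column has to be passed as an (n, 1) array rather than as a flat vector.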