Here is the code.
When I run it, I get:
        Section  Longitude  Latitude  ...  Alkalinity  pHSWS25    TCO2
0  06GA19960613      64.87     81.38  ...      2236.3  7.79776  2056.6
1  06GA19960613      64.87     81.38  ...      2234.4  7.78997  2068.4
2  06GA19960613      64.87     81.38  ...      2247.1  7.74140  2104.1
3  06GA19960613      64.87     81.38  ...      2254.1  7.71428  2120.5
4  06GA19960613      64.87     81.38  ...      2270.4  7.69494  2131.7

[5 rows x 18 columns]
('\nShape of training data :', (87099, 18))
('\nShape of testing data :', (171921, 18))
////////////////////////
        Section  Longitude  Latitude  ...  Phosphate  Alkalinity    TCO2
0  06GA19960613      64.87     81.38  ...   0.214634      2236.3  2056.6
1  06GA19960613      64.87     81.38  ...   0.253659      2234.4  2068.4
2  06GA19960613      64.87     81.38  ...   0.390244      2247.1  2104.1
3  06GA19960613      64.87     81.38  ...   0.536585      2254.1  2120.5
4  06GA19960613      64.87     81.38  ...   0.595122      2270.4  2131.7

[5 rows x 17 columns]
0    7.79776
1    7.78997
2    7.74140
3    7.71428
4    7.69494
Name: pHSWS25, dtype: float64

Error:

Traceback (most recent call last):
  File "ocean_data.py", line 60, in <module>
    LinearRegression().fit(train_x,train_y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 458, in fit
    y_numeric=True, multi_output=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 756, in check_X_y
    estimator=estimator)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 567, in check_array
    array = array.astype(np.float64)
ValueError: invalid literal for float(): 06GA19960613
Can anyone help me fix this?
Answer:
LinearRegression only accepts numeric features. If you run:

train_data.dtypes

you will probably get something like:
Section      object
Longitude    float64
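The Section column holds strings such as 06GA19960613, which is exactly the value the traceback fails to convert to float. You have to convert that column to numbers, or use a different type of regression.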
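To find the offending columns before fitting, and to confirm that they are what breaks the fit, here is a minimal sketch on a hypothetical toy frame named df (a few values copied from the output in the question). It assumes a reasonably recent pandas and scikit-learn; the exact wording of the error message differs between versions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy frame mimicking a few of the question's columns
df = pd.DataFrame({
    "Section": ["06GA19960613"] * 3,
    "Longitude": [64.87, 64.87, 64.87],
    "Alkalinity": [2236.3, 2234.4, 2247.1],
    "pHSWS25": [7.79776, 7.78997, 7.74140],
})

# Section shows up as object (or str in newer pandas), the rest as float64
print(df.dtypes)

# Columns that LinearRegression cannot use as-is
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

X = df.drop(columns="pHSWS25")
y = df["pHSWS25"]

fit_failed = False
try:
    LinearRegression().fit(X, y)
except ValueError as err:
    # Same kind of failure as in the question: the string cannot be cast to float
    fit_failed = True
    print("fit failed:", err)
```

Dropping Section from X (or encoding it, see below) is what makes the fit go through.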
One way to convert the column is called encoding, for example one-hot encoding with scikit-learn's OneHotEncoder:
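from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder needs 2-D input (note the double brackets); string categories need scikit-learn >= 0.20
enc = OneHotEncoder(handle_unknown='ignore')  # ignore Section codes that only occur in the test set
enc.fit(train_data[['Section']])
# The result has one column per Section code, so keep it as a separate array
# instead of assigning it back to the single 'Section' column
train_section = enc.transform(train_data[['Section']]).toarray()
test_section = enc.transform(test_data[['Section']]).toarray()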
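To see what the encoder actually produces, here is a small sketch on hypothetical data: the second and third Section codes (XXAA00000000, YYBB11111111) are made up for illustration, and scikit-learn 0.20 or newer is assumed so that string categories are accepted.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; only 06GA19960613 comes from the question
train_data = pd.DataFrame({"Section": ["06GA19960613", "06GA19960613", "XXAA00000000"]})
test_data = pd.DataFrame({"Section": ["XXAA00000000", "YYBB11111111"]})

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train_data[["Section"]])

print(enc.categories_)  # the learned Section codes, sorted

train_section = enc.transform(train_data[["Section"]]).toarray()
test_section = enc.transform(test_data[["Section"]]).toarray()
print(train_section)
# [[1. 0.]
#  [1. 0.]
#  [0. 1.]]
print(test_section)
# [[0. 1.]
#  [0. 0.]]
# The unseen code becomes an all-zero row because of handle_unknown="ignore";
# newer scikit-learn versions may also emit a warning about it.
```

Each Section code turns into its own 0/1 column, which is why the encoded result no longer fits into a single column.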
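This is only a starting point. If you now get shape errors, you will need to adjust the data format a bit, for example by combining the encoded Section columns with the remaining numeric features before fitting.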
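A likely source of such shape errors is trying to store several one-hot columns in the single Section column. One way around it is to let a ColumnTransformer expand Section while passing the numeric columns through, and then fit LinearRegression on the combined result. The sketch below assumes scikit-learn 0.20 or newer (where ColumnTransformer exists). The toy frames are hypothetical and use only a subset of the question's columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frames; XXAA00000000 is a made-up Section code
train_data = pd.DataFrame({
    "Section": ["06GA19960613", "06GA19960613", "XXAA00000000", "XXAA00000000"],
    "Alkalinity": [2236.3, 2234.4, 2247.1, 2254.1],
    "TCO2": [2056.6, 2068.4, 2104.1, 2120.5],
    "pHSWS25": [7.79776, 7.78997, 7.74140, 7.71428],
})
test_data = train_data.iloc[:2].copy()

train_x = train_data.drop(columns="pHSWS25")
train_y = train_data["pHSWS25"]

# One-hot encode Section and pass the numeric columns through unchanged
preprocess = ColumnTransformer(
    [("section", OneHotEncoder(handle_unknown="ignore"), ["Section"])],
    remainder="passthrough",
)
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(train_x, train_y)

pred = model.predict(test_data.drop(columns="pHSWS25"))
print(pred.shape)  # (2,)
```

Keeping the encoding inside the pipeline means the test set is transformed with the categories learned from the training set, so both end up with the same number of columns.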