I'm working on a classification problem using random forests and gradient boosting.
Here is the random forest code, which runs fine:
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import mean_squared_error

    model = RandomForestClassifier(n_estimators=100, min_samples_leaf=10, random_state=1)
    model.fit(x_train, y_train)

    # Predictions on the test set
    y_pred = model.predict(x_test)

    # Mean squared error (scikit-learn's argument order is y_true, y_pred)
    mean_squared_error(y_test, y_pred)

    # Accuracy on the test set
    model.score(x_test, y_test)
    Out[423]: 0.80038542832276516
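A likely reason the forest runs without complaint: `RandomForestClassifier` supports multi-output targets, so a one-hot `y` with three columns is silently treated as three separate binary problems. The sketch below uses hypothetical toy data (the `labels`/`flag` arrays are made up to mimic the shapes in the question) to show that `fit` accepts a 2-D `y` and `predict` returns one column per output.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data mimicking the question's shapes:
# one binary feature one-hot encoded as "No"/"Yes", and a
# three-class label one-hot encoded into three columns.
rng = np.random.RandomState(0)
labels = rng.choice(["Fatal", "Incident", "Non-Fatal"], size=200)
flag = rng.randint(0, 2, size=200)
x = pd.get_dummies(pd.Series(np.where(flag == 1, "Yes", "No")), dtype=float)
y = pd.get_dummies(pd.Series(labels), dtype=float)

model = RandomForestClassifier(n_estimators=10, random_state=1)
model.fit(x, y)  # accepted: the 2-D y is treated as 3 binary outputs
print(model.predict(x).shape)  # (200, 3): one prediction column per output
```

Because of this, `model.score` on a multi-output forest reports subset accuracy (all three columns must match), not ordinary multiclass accuracy.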
Now the second classifier, gradient boosting, raises an error:

    from sklearn.ensemble import GradientBoostingClassifier  # for classification

    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
    clf.fit(x_train, y_train)

This produces the following error:
    clf.fit(x_train, y_train)
    Traceback (most recent call last):
      File "<ipython-input-425-9249b506d83f>", line 1, in <module>
        clf.fit(x_train, y_train)
      File "C:\Anaconda3\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 973, in fit
        X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
      File "C:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 526, in check_X_y
        y = column_or_1d(y, warn=True)
      File "C:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 562, in column_or_1d
        raise ValueError("bad input shape {0}".format(shape))
    ValueError: bad input shape (37533, 3)
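The traceback shows `column_or_1d` rejecting `y`: gradient boosting only accepts a 1-D array of class labels. A minimal reproduction, assuming a hypothetical random feature matrix and a 3-column one-hot target like the question's `y_train`, is sketched below; the exact message differs between scikit-learn versions (older ones say "bad input shape", newer ones say `y` should be a 1d array), but the exception type is `ValueError` in both.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data: 30 samples, 2 features, and a one-hot target
# with 3 columns (one per class), like y_train in the question.
rng = np.random.RandomState(0)
X = rng.rand(30, 2)
y_onehot = np.eye(3)[rng.randint(0, 3, size=30)]

clf = GradientBoostingClassifier(n_estimators=10)
try:
    clf.fit(X, y_onehot)
    error = None
except ValueError as exc:
    # GradientBoostingClassifier needs a 1-D y; the wording of the
    # message depends on the installed scikit-learn version.
    error = exc

print(type(error).__name__, error)
```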
The data looks like this:
    print(x_train)
            No  Yes
    32912  1.0  0.0
    35665  1.0  0.0
    32436  1.0  0.0
    25885  1.0  0.0
    24896  1.0  0.0
    51734  1.0  0.0
    4235   1.0  0.0
    51171  1.0  0.0
    33221  0.0  1.0

    print(y_train)
           Fatal  Incident  Non-Fatal
    32912    0.0       0.0        1.0
    35665    0.0       0.0        1.0
    32436    0.0       0.0        1.0
Can you tell me why the gradient boosting `fit()` call fails with `ValueError: bad input shape (37533, 3)`?

Answer:
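Try not binarizing the labels: `GradientBoostingClassifier` expects `y` to be a single column of class labels, whereas your `y_train` has been one-hot encoded into three columns (`Fatal`, `Incident`, `Non-Fatal`)…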
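If the original label column is no longer at hand, one way to undo the binarization is to collapse each one-hot row back to the name of the column holding the 1.0, using pandas' `DataFrame.idxmax(axis=1)`. The sketch below builds hypothetical data shaped like the question's (the `labels` array and random `No`/`Yes` feature are made up) and then fits the same `GradientBoostingClassifier` configuration on the resulting 1-D target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data shaped like the question's: a "No"/"Yes" feature
# pair and a target one-hot encoded into three class columns.
rng = np.random.RandomState(0)
labels = rng.choice(["Fatal", "Incident", "Non-Fatal"], size=200)
x_train = pd.DataFrame({"No": rng.randint(0, 2, size=200).astype(float)})
x_train["Yes"] = 1.0 - x_train["No"]
y_onehot = pd.get_dummies(pd.Series(labels), dtype=float)

# Collapse each one-hot row to a single class label: the column that
# holds the 1.0 wins. The result is a 1-D Series of strings.
y_train = y_onehot.idxmax(axis=1)

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
clf.fit(x_train, y_train)  # works: y_train is now 1-D
print(clf.classes_)
```

If the un-encoded label column still exists, passing it directly is simpler than encoding and then decoding. The same 1-D target also works for `RandomForestClassifier`, where it turns the model from three binary outputs into one multiclass output.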