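I am trying to use GradientBoostingClassifier in scikit-learn, and it works fine with the default parameters. However, when I replace the default initial estimator (the init parameter) with a different classifier, it fails with the following error:
return y - np.nan_to_num(np.exp(pred[:, k] -
IndexError: too many indices
Is there a way to solve this problem?
The error can be reproduced with the following snippet: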
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

mnist = datasets.fetch_mldata('MNIST original')
X, y = shuffle(mnist.data, mnist.target, random_state=13)
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.01)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

### works fine when init is None
clf_init = None
print 'Train with clf_init = None'
clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                 n_estimators=5, subsample=0.3,
                                 min_samples_split=2, min_samples_leaf=1,
                                 max_depth=3, init=clf_init,
                                 random_state=None, max_features=None,
                                 verbose=2, learn_rate=None)
clf.fit(X_train, y_train)
print 'Train with clf_init = None is done :-)'

print 'Train LogisticRegression()'
clf_init = LogisticRegression()
clf_init.fit(X_train, y_train)
print 'Train LogisticRegression() is done'

print 'Train with clf_init = LogisticRegression()'
clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                 n_estimators=5, subsample=0.3,
                                 min_samples_split=2, min_samples_leaf=1,
                                 max_depth=3, init=clf_init,
                                 random_state=None, max_features=None,
                                 verbose=2, learn_rate=None)
clf.fit(X_train, y_train)  # <------ ERROR!!!!
print 'Train with clf_init = LogisticRegression() is done'
Here is the full traceback of the error:
Traceback (most recent call last):
  File "/home/mohsena/Dropbox/programing/gbm/gb_with_init.py", line 56, in <module>
    clf.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 862, in fit
    return super(GradientBoostingClassifier, self).fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 614, in fit
    random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 475, in _fit_stage
    residual = loss.negative_gradient(y, y_pred, k=k)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 404, in negative_gradient
    return y - np.nan_to_num(np.exp(pred[:, k] -
IndexError: too many indices
Answer:
As suggested by the scikit-learn developers, the problem can be solved by wrapping the classifier in an adaptor like the following:
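# The class line is missing from the original snippet; the name "Adaptor" is illustrative.
class Adaptor:
    def __init__(self, est):
        # est is the classifier to wrap, e.g. LogisticRegression()
        self.est = est

    def predict(self, X):
        # Return the predicted probability of class 1 for each sample.
        return self.est.predict_proba(X)[:, 1]

    def fit(self, X, y):
        self.est.fit(X, y)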
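
The traceback ends in `pred[:, k]`, which only works on a two-dimensional array, so the shape returned by the initial estimator's `predict` matters. The following is a minimal NumPy-only sketch of that point, not scikit-learn's own code: `StubClassifier`, `ColumnAdaptor`, and `index_like_loss` are hypothetical names, and the assumption that a column of shape `(n_samples, 1)` is what the boosting code expects in the binary case should be checked against the installed scikit-learn version.

```python
import numpy as np


class StubClassifier:
    """Hypothetical stand-in for a fitted binary classifier (not a scikit-learn class)."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        # Fixed probabilities for classes 0 and 1, one row per sample.
        return np.tile([0.25, 0.75], (len(X), 1))


class ColumnAdaptor:
    """Hypothetical adaptor whose predict() returns a 2-D column."""

    def __init__(self, est):
        self.est = est

    def fit(self, X, y):
        self.est.fit(X, y)
        return self

    def predict(self, X):
        # Keep the class-1 probability and add a trailing axis: shape (n_samples, 1).
        return self.est.predict_proba(X)[:, 1][:, np.newaxis]


def index_like_loss(pred, k=0):
    """Mimic the `pred[:, k]` indexing seen in the last frame of the traceback."""
    return pred[:, k]


X = np.zeros((4, 3))
y = np.array([0, 1, 0, 1])

# 1-D output: one probability per sample, no column axis.
flat = StubClassifier().fit(X, y).predict_proba(X)[:, 1]
# 2-D output: the same probabilities as a single column.
column = ColumnAdaptor(StubClassifier()).fit(X, y).predict(X)

try:
    index_like_loss(flat)
    flat_ok = True
except IndexError:
    # Indexing a 1-D array with two indices raises IndexError.
    flat_ok = False

print(flat.shape, flat_ok)
print(column.shape, index_like_loss(column).shape)
```

If an adaptor still triggers the same IndexError, adding the trailing axis as in `ColumnAdaptor.predict` may be worth trying. For a multiclass problem such as MNIST, the expected shape is likely one column per class, which is again an assumption to verify.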