Using GradientBoostingClassifier with a BaseEstimator in scikit-learn?

I tried to use GradientBoostingClassifier in scikit-learn, and it works fine with its default parameters. However, when I tried to replace the BaseEstimator with a different classifier, it did not work and raised the following error:

return y - np.nan_to_num(np.exp(pred[:, k] -
IndexError: too many indices

Do you have a solution for this problem?

The error can be reproduced with the following code snippet:

import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

mnist = datasets.fetch_mldata('MNIST original')
X, y = shuffle(mnist.data, mnist.target, random_state=13)
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.01)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

### works fine when init is None
clf_init = None
print 'Train with clf_init = None'
clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                 n_estimators=5, subsample=0.3,
                                 min_samples_split=2,
                                 min_samples_leaf=1,
                                 max_depth=3,
                                 init=clf_init,
                                 random_state=None,
                                 max_features=None,
                                 verbose=2,
                                 learn_rate=None)
clf.fit(X_train, y_train)
print 'Train with clf_init = None is done :-)'

print 'Train LogisticRegression()'
clf_init = LogisticRegression()
clf_init.fit(X_train, y_train)
print 'Train LogisticRegression() is done'

print 'Train with clf_init = LogisticRegression()'
clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                 n_estimators=5, subsample=0.3,
                                 min_samples_split=2,
                                 min_samples_leaf=1,
                                 max_depth=3,
                                 init=clf_init,
                                 random_state=None,
                                 max_features=None,
                                 verbose=2,
                                 learn_rate=None)
clf.fit(X_train, y_train)  # <------ ERROR!!!!
print 'Train with clf_init = LogisticRegression() is done'

Here is the full traceback of the error:

Traceback (most recent call last):
  File "/home/mohsena/Dropbox/programing/gbm/gb_with_init.py", line 56, in <module>
    clf.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 862, in fit
    return super(GradientBoostingClassifier, self).fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 614, in fit
    random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 475, in _fit_stage
    residual = loss.negative_gradient(y, y_pred, k=k)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 404, in negative_gradient
    return y - np.nan_to_num(np.exp(pred[:, k] -
IndexError: too many indices
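The traceback shows the root cause: the loss function indexes the init estimator's predictions as a 2-D array (`pred[:, k]`, one column per class), but a plain classifier's `predict()` returns a 1-D array. A minimal sketch of that shape mismatch (the arrays here are illustrative, not from the original code):

```python
import numpy as np

# Shape gradient boosting works with internally: (n_samples, n_classes)
pred_2d = np.zeros((5, 3))
col = pred_2d[:, 0]          # fine: selects the column for one class

# Shape a plain classifier's predict() returns: (n_samples,)
pred_1d = np.zeros(5)
try:
    pred_1d[:, 0]            # IndexError: too many indices
except IndexError as e:
    msg = str(e)
```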

Answer:

As suggested by the scikit-learn developers, the problem can be solved by using an adapter like the following:

# Wrap the estimator in a class whose predict() returns the 1-D
# column of positive-class probabilities that the loss expects.
class Init:
    def __init__(self, est):
        self.est = est

    def predict(self, X):
        return self.est.predict_proba(X)[:, 1]

    def fit(self, X, y):
        self.est.fit(X, y)
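A minimal sketch of the adapter in use, assuming a binary problem (it keeps only the class-1 probability column); the class name `InitAdapter` and the synthetic dataset are illustrative, not from the original answer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

class InitAdapter:
    """Wraps a classifier so predict() returns the 1-D column of
    positive-class probabilities instead of hard class labels."""
    def __init__(self, est):
        self.est = est

    def fit(self, X, y):
        self.est.fit(X, y)
        return self

    def predict(self, X):
        # Binary case: keep only the probability of class 1.
        return self.est.predict_proba(X)[:, 1]

# Illustrative usage on a small synthetic binary problem
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
adapter = InitAdapter(LogisticRegression()).fit(X, y)
pred = adapter.predict(X)   # shape (100,), values in [0, 1]
```

The wrapped estimator can then be passed as `init=` to GradientBoostingClassifier in the scikit-learn version the question targets; newer releases changed the requirements on `init` estimators, so check your version's documentation.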
