Using GradientBoostingClassifier with a BaseEstimator in scikit-learn?

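I tried to use GradientBoostingClassifier in scikit-learn, and it works fine with its default parameters. However, when I tried to replace the BaseEstimator (passed through the init parameter) with a different classifier, it did not work and gave me the following error: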

    return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices

Do you have any suggestion for solving this problem?

The error can be reproduced with the following snippet:

    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import shuffle

    mnist = datasets.fetch_mldata('MNIST original')
    X, y = shuffle(mnist.data, mnist.target, random_state=13)
    X = X.astype(np.float32)
    offset = int(X.shape[0] * 0.01)
    X_train, y_train = X[:offset], y[:offset]
    X_test, y_test = X[offset:], y[offset:]

    ### works fine when init is None
    clf_init = None
    print 'Train with clf_init = None'
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2,
                                     min_samples_leaf=1,
                                     max_depth=3,
                                     init=clf_init,
                                     random_state=None,
                                     max_features=None,
                                     verbose=2,
                                     learn_rate=None)
    clf.fit(X_train, y_train)
    print 'Train with clf_init = None is done :-)'

    print 'Train LogisticRegression()'
    clf_init = LogisticRegression()
    clf_init.fit(X_train, y_train)
    print 'Train LogisticRegression() is done'

    print 'Train with clf_init = LogisticRegression()'
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2,
                                     min_samples_leaf=1,
                                     max_depth=3,
                                     init=clf_init,
                                     random_state=None,
                                     max_features=None,
                                     verbose=2,
                                     learn_rate=None)
    clf.fit(X_train, y_train)  # <------ ERROR!!!!
    print 'Train with clf_init = LogisticRegression() is done'

Here is the complete traceback of the error:

    Traceback (most recent call last):
      File "/home/mohsena/Dropbox/programing/gbm/gb_with_init.py", line 56, in <module>
        clf.fit(X_train, y_train)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 862, in fit
        return super(GradientBoostingClassifier, self).fit(X, y)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 614, in fit
        random_state)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 475, in _fit_stage
        residual = loss.negative_gradient(y, y_pred, k=k)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 404, in negative_gradient
        return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices

Answer:

As suggested by the scikit-learn developers, the problem can be solved by using an adapter like this:

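    # The class name is illustrative; the original snippet shows only the methods.
    class InitAdapter(object):

        def __init__(self, est):
            # Wrap the classifier that should provide the initial prediction.
            self.est = est

        def predict(self, X):
            # Return the probability of class 1 instead of hard labels.
            return self.est.predict_proba(X)[:, 1]

        def fit(self, X, y):
            self.est.fit(X, y)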
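The traceback fails on `pred[:, k]`, which only works when the initial prediction is a two-dimensional array. The NumPy-only sketch below illustrates that shape requirement. The probability values and the helper name `column` are made up for illustration and are not part of scikit-learn. Whether the adapter's `predict` needs the extra axis depends on the scikit-learn version and the number of classes, which the thread does not state.

```python
import numpy as np

# Made-up class probabilities standing in for predict_proba output:
# 4 samples, 3 classes (illustrative values only).
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])


def column(pred, k):
    """Index the k-th column the same way the failing traceback line does."""
    return pred[:, k]


one_d = proba[:, 1]                  # shape (4,): a flat vector
two_d = proba[:, 1][:, np.newaxis]   # shape (4, 1): one explicit column

try:
    column(one_d, 0)
    failed = False
except IndexError:
    failed = True  # a 1-D array cannot take two indices

first_column = column(two_d, 0)      # works, same values as one_d
all_columns = [column(proba, k) for k in range(proba.shape[1])]
```

If the same error persists with an adapter, checking the shape of what its `predict` returns (for example with `np.shape`) is a cheap first diagnostic.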
