I am trying to run XGBoost with a calibrated classifier. Here is the code snippet that raises the error:
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier
import numpy as np

x_train = np.array([1, 2, 2, 3, 4, 5, 6, 3, 4, 10]).reshape(-1, 1)
y_train = np.array([1, 1, 1, 1, 1, 3, 3, 3, 3, 3])

x_cfl = XGBClassifier(n_estimators=1)
x_cfl.fit(x_train, y_train)

sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)
Error:
TypeError: predict_proba() got an unexpected keyword argument 'X'
Full traceback:
TypeError                                 Traceback (most recent call last)
<ipython-input-48-08dd0b4ae8aa> in <module>
----> 1 sig_clf.fit(x_train, y_train)

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in fit(self, X, y, sample_weight)
    309 parallel = Parallel(n_jobs=self.n_jobs)
    310
--> 311 self.calibrated_classifiers_ = parallel(
    312 delayed(_fit_classifier_calibrator_pair)(
    313 clone(base_estimator), X, y, train=train, test=test,

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
   1039 # remaining jobs.
   1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
   1042 self._iterating = self._original_iterator is not None
   1043

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    857 return False
    858 else:
--> 859 self._dispatch(tasks)
    860 return True
    861

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
    775 with self._lock:
    776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
    778 # A job can complete so quickly than its callback is
    779 # called before we get here, causing self._jobs to

~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    206 def apply_async(self, func, callback=None):
    207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
    209 if callback:
    210 callback(result)

~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    570 # Don't delay the application, to avoid keeping the input
    571 # arguments in memory
--> 572 self.results = batch()
    573
    574 def get(self):

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
    260 # change the default number of processes to -1
    261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
    263 for func, args, kwargs in self.items]
    264

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
    260 # change the default number of processes to -1
    261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
    263 for func, args, kwargs in self.items]
    264

~/anaconda3/lib/python3.8/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
    220 def __call__(self, *args, **kwargs):
    221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _fit_classifier_calibrator_pair(estimator, X, y, train, test, supports_sw, method, classes, sample_weight)
    443 n_classes = len(classes)
    444 pred_method = _get_prediction_method(estimator)
--> 445 predictions = _compute_predictions(pred_method, X[test], n_classes)
    446
    447 sw = None if sample_weight is None else sample_weight[test]

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _compute_predictions(pred_method, X, n_classes)
    499 (X.shape[0], 1).
    500 """
--> 501 predictions = pred_method(X=X)
    502 if hasattr(pred_method, '__name__'):
    503 method_name = pred_method.__name__

TypeError: predict_proba() got an unexpected keyword argument 'X'
This surprises me, because the same code worked for me until yesterday, and it still works when I use a different classifier:
from sklearn.calibration import CalibratedClassifierCV
from lightgbm import LGBMClassifier
import numpy as np

x_train = np.array([1, 2, 2, 3, 4, 5, 6, 3, 4, 10]).reshape(-1, 1)
y_train = np.array([1, 1, 1, 1, 1, 3, 3, 3, 3, 3])

x_cfl = LGBMClassifier(n_estimators=1)
x_cfl.fit(x_train, y_train)

sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)
Output:
CalibratedClassifierCV(base_estimator=LGBMClassifier(n_estimators=1))
Is something wrong with my XGBoost installation? I installed it with conda, and I remember uninstalling and reinstalling xgboost yesterday.
My xgboost version:
1.3.0
Answer:
I think the problem is on the XGBoost side. It is explained here: https://github.com/dmlc/xgboost/pull/6555
XGBoost defines:
predict_proba(self, data, ...
rather than:
predict_proba(self, X, ...
Because sklearn 0.24 calls clf.predict_proba(X=X), passing X as a keyword argument, the call fails with the TypeError above.
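The mismatch can be reproduced without either library. The sketch below uses a hypothetical stand-in class, `LegacyClassifier`, whose `predict_proba` names its first parameter `data`, as the XGBoost signature quoted above does. It is not XGBoost's actual code and returns dummy probabilities; it only illustrates why a positional call succeeds while a keyword call with `X=` does not.

```python
import inspect


class LegacyClassifier:
    """Hypothetical stand-in: the first parameter is named `data`, not `X`."""

    def predict_proba(self, data):
        # Return a dummy two-class probability row for each sample.
        return [[0.5, 0.5] for _ in data]


clf = LegacyClassifier()
samples = [[1], [2], [3]]

# A positional call works regardless of what the parameter is called.
positional_result = clf.predict_proba(samples)

# A keyword call with X= fails, mirroring the pred_method(X=X) call in the traceback.
try:
    clf.predict_proba(X=samples)
    keyword_error = None
except TypeError as exc:
    keyword_error = str(exc)

param_names = list(inspect.signature(clf.predict_proba).parameters)
print(param_names)
print(keyword_error)
```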
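Here is a workaround that does not require changing your package versions: create a class that inherits from XGBClassifier, override predict_proba so that it uses the expected parameter name (X), and have it call super().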
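A minimal sketch of that subclassing pattern follows. To keep it runnable without xgboost installed, it again uses a hypothetical stand-in base class, `LegacyClassifier`, in place of XGBClassifier; `PatchedClassifier` is likewise an illustrative name. The override accepts `X` and forwards it positionally, so it does not depend on what the parent calls its first parameter.

```python
class LegacyClassifier:
    """Hypothetical stand-in for an estimator whose parameter is named `data`."""

    def predict_proba(self, data):
        # Return a dummy two-class probability row for each sample.
        return [[0.5, 0.5] for _ in data]


class PatchedClassifier(LegacyClassifier):
    """Subclass exposing the parameter name that callers expect (`X`)."""

    def predict_proba(self, X, **kwargs):
        # Forward X positionally so the parent's parameter name is irrelevant.
        return super().predict_proba(X, **kwargs)


patched = PatchedClassifier()

# The keyword call that failed on the parent class now succeeds.
proba = patched.predict_proba(X=[[1], [2], [3]])
print(len(proba))
```

For the code in the question, the base class would be XGBClassifier, and an instance of the subclass would be passed to CalibratedClassifierCV in place of the plain XGBClassifier. That combination is not run here, so treat it as an untested outline.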