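I need to extract the probabilities of each model trained in a sklearn.ensemble.BaggingClassifier. The reason for doing this is to estimate the uncertainty of an XGBoostClassifier model.
To do so, I created an extension class that inherits from sklearn.ensemble.BaggingClassifier and added a new method to obtain those probabilities. Note that this question is different from ModuleNotFoundError: No module named 'sklearn.utils._joblib'.
The code snippets I have implemented so far are shown below: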
Required modules
    from sklearn.ensemble import BaggingClassifier
    from sklearn.ensemble.base import _partition_estimators
    from sklearn.utils import check_array
    from sklearn.utils.validation import check_is_fitted
    import sklearn.utils as su
Subclass inheriting from BaggingClassifier
    class EBaggingClassifier(BaggingClassifier):
        """
        Extends the class BaggingClassifier from sklearn
        """

        def __init__(self,
                     base_estimator=None,
                     n_estimators=10,
                     max_samples=1.0,
                     max_features=1.0,
                     bootstrap=True,
                     bootstrap_features=False,
                     oob_score=False,
                     warm_start=False,
                     n_jobs=1,
                     random_state=None,
                     verbose=0):
            super().__init__(
                base_estimator,
                n_estimators,
                max_samples,
                max_features,
                bootstrap,
                bootstrap_features,
                oob_score,
                warm_start,
                n_jobs,
                random_state,
                verbose)
The new method, which computes the probabilities of each individual estimator, is defined below.
        def predict_proball(self, X):
            """
            Computes the probability of each individual estimator

            Parameters
            ----------
            X : {array-like, sparse matrix} of shape = [n_samples, n_features]
                The training input samples. Sparse matrices are accepted only if
                they are supported by the base estimator.

            Returns
            -------
            p : array of shape = [n_samples, n_classes]
                The class probabilities of the input samples. The order of the
                classes corresponds to that in the attribute `classes_`.
            """
            check_is_fitted(self, "classes_")
            # Check data
            X = check_array(
                X, accept_sparse=['csr', 'csc'], dtype=None,
                force_all_finite=False
            )

            if self.n_features_ != X.shape[1]:
                raise ValueError("Number of features of the model must "
                                 "match the input. Model n_features is {0} and "
                                 "input n_features is {1}."
                                 "".format(self.n_features_, X.shape[1]))

            # Parallel loop
            n_jobs, n_estimators, starts = _partition_estimators(self.n_estimators,
                                                                 self.n_jobs)

            all_proba = su._joblib.Parallel(n_jobs=n_jobs, verbose=self.verbose,
                                            **self._parallel_args())(
                su._joblib.delayed(BaggingClassifier._parallel_predict_proba)(
                    self.estimators_[starts[i]:starts[i + 1]],
                    self.estimators_features_[starts[i]:starts[i + 1]],
                    X,
                    self.n_classes_)
                for i in range(n_jobs))

            return all_proba
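For comparison, here is a minimal sketch of the same idea that relies only on the public fitted attributes `estimators_` and `estimators_features_`, so it does not touch joblib or any private helper. The function name `predict_proba_per_estimator` is hypothetical, the data is synthetic, and the default decision-tree base estimator stands in for XGBoostClassifier purely to keep the example self-contained. It also assumes each fitted estimator's `classes_` holds integer indices into the ensemble's classes, which is how BaggingClassifier encodes the labels before fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier


def predict_proba_per_estimator(bagging, X):
    """Return an array of shape (n_estimators, n_samples, n_classes).

    Hypothetical helper: loops over the fitted estimators sequentially
    instead of dispatching the work through joblib.
    """
    X = np.asarray(X)
    n_classes = len(bagging.classes_)
    all_proba = []
    for est, features in zip(bagging.estimators_, bagging.estimators_features_):
        proba = np.zeros((X.shape[0], n_classes))
        # Assumption: est.classes_ are integer indices into bagging.classes_,
        # so an estimator that never saw a class leaves that column at zero.
        cols = np.asarray(est.classes_, dtype=int)
        proba[:, cols] = est.predict_proba(X[:, features])
        all_proba.append(proba)
    return np.stack(all_proba)


# Synthetic data; the default base estimator is a decision tree.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
bagging = BaggingClassifier(n_estimators=5, max_samples=0.8, random_state=0)
bagging.fit(X, y)

per_model = predict_proba_per_estimator(bagging, X)

# Spread across estimators as a rough per-sample uncertainty measure.
uncertainty = per_model.std(axis=0)
```

Averaging `per_model` over its first axis should reproduce `bagging.predict_proba(X)`, which is a quick sanity check that the per-estimator values are the ones the ensemble actually combines.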
I instantiate this class using XGBoostClassifier as the base estimator:
    base_estimator = XGBoostClassifier(**params)
    estimator = EBaggingClassifier(base_estimator=base_estimator, max_samples=0.8, n_estimators=10)
I then fit the estimator with estimator.fit(X, y), where X and y are pandas.DataFrame objects. When I try to run estimator.predict_proball(X), I get
    >>> estimator.predict_proball(X)
    AttributeError: module 'sklearn.utils' has no attribute '_joblib'
Does anyone know why this happens? Judging from the source script of BaggingClassifier, 'sklearn.utils._joblib' should be available.
FYI:
    >>> sklearn.__version__
    '0.19.2'
Answer:
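One likely explanation, offered as an assumption rather than a confirmed diagnosis: the private module sklearn.utils._joblib was introduced in scikit-learn 0.20, so it is absent from the installed 0.19.2 even though it appears in newer versions of the BaggingClassifier source. In 0.19.x joblib is vendored as sklearn.externals.joblib. A minimal sketch of a version-tolerant import, which prefers the standalone joblib package and falls back to the vendored copy, could look like this:

```python
# Prefer the standalone joblib package; fall back to the copy vendored
# in older scikit-learn releases (assumption: 0.19.x exposes it as
# sklearn.externals.joblib).
try:
    from joblib import Parallel, delayed
except ImportError:
    from sklearn.externals.joblib import Parallel, delayed


def square(value):
    """Toy task used only to show the Parallel/delayed call pattern."""
    return value * value


# Same call shape as in predict_proball, with a trivial function.
squares = Parallel(n_jobs=1)(delayed(square)(i) for i in range(3))
```

With these names imported at module level, the calls to su._joblib.Parallel and su._joblib.delayed in predict_proball could be replaced by Parallel and delayed directly. Upgrading scikit-learn is the other obvious route, though private helpers such as _partition_estimators may move between versions.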