Using StandardScaler as a preprocessor in an MLens pipeline produces a classification warning

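I am trying to scale my data within the cross-validation folds of an MLens SuperLearner pipeline. When I use StandardScaler in the pipeline (as shown below), I get the following warning: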

/miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets")
  (name, inst_name, exc), MetricWarning)

Notably, when I omit StandardScaler() the warning disappears, but then the data is not scaled.

```python
from sklearn.base import BaseEstimator
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

from mlens.ensemble.super_learner import SuperLearner
from mlens.metrics import make_scorer

N_FOLDS = 5
RF_ESTIMATORS = 1000
N_ESTIMATORS = 1000
RANDOM_STATE = 42

breast_cancer_data = load_breast_cancer()
X = breast_cancer_data['data']
y = breast_cancer_data['target']

X, X_val, y, y_val = train_test_split(X, y, test_size=.3, random_state=0)


class RFBasedFeatureSelector(BaseEstimator):
    """Select features by random forest importance."""

    def __init__(self, n_estimators):
        self.n_estimators = n_estimators
        self.selector = None

    def fit(self, X, y):
        clf = RandomForestClassifier(n_estimators=self.n_estimators,
                                     random_state=RANDOM_STATE,
                                     class_weight='balanced')
        clf = clf.fit(X, y)
        self.selector = SelectFromModel(clf, prefit=True, threshold=0.001)
        return self

    def transform(self, X):
        if self.selector is None:
            raise AttributeError('The selector attribute has not been assigned. '
                                 'You cannot call transform before first calling '
                                 'fit or fit_transform.')
        return self.selector.transform(X)

    def fit_transform(self, X, y):
        self.fit(X, y)
        return self.transform(X)


accuracy_scorer = make_scorer(balanced_accuracy_score, average='micro',
                              greater_is_better=True)

ensemble = SuperLearner(folds=N_FOLDS, shuffle=True, random_state=RANDOM_STATE,
                        n_jobs=10, scorer=balanced_accuracy_score,
                        backend="multiprocessing")

preprocessing1 = {'pipeline-1': [StandardScaler()]}
preprocessing2 = {'pipeline-1': [RFBasedFeatureSelector(N_ESTIMATORS)]}

estimators = {'pipeline-1': [RandomForestClassifier(RF_ESTIMATORS,
                                                    random_state=RANDOM_STATE,
                                                    class_weight='balanced'),
                             MLPClassifier(hidden_layer_sizes=(10, 10, 10),
                                           activation='relu', solver='sgd',
                                           max_iter=5000)]}

# The two preprocessing dictionaries are passed as separate positional arguments.
ensemble.add(estimators, preprocessing2, preprocessing1)
ensemble.add_meta(LogisticRegression(solver='liblinear', class_weight='balanced'))

ensemble.fit(X, y)
yhat = ensemble.predict(X_val)
balanced_accuracy_score(y_val, yhat)
```

Answer:

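You are currently passing the preprocessing steps to the add method as two separate arguments. The steps for a pipeline belong in a single list under one dictionary key, so you can combine them as follows: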

preprocessing = {'pipeline-1': [RFBasedFeatureSelector(N_ESTIMATORS),StandardScaler()]}
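To illustrate what a two-step preprocessing list is meant to do (select features, then scale what remains), here is a minimal sketch that uses scikit-learn's own `make_pipeline` as a stand-in. It is not MLens code, and the smaller forest size (`n_estimators=50`) is an assumption chosen only to keep the example quick.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Stand-in for the merged preprocessing list: the steps run in order,
# so features are selected first and the surviving columns are scaled.
preprocess = make_pipeline(
    SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=42),
        threshold=0.001,
    ),
    StandardScaler(),
)
X_prepared = preprocess.fit_transform(X, y)

# Every remaining column should now have (approximately) zero mean
# and unit standard deviation.
print(X_prepared.shape)
print(abs(X_prepared.mean(axis=0)).max())
print(X_prepared.std(axis=0).round(6))
```

The order of the list matters: with the scaler placed last, it is fitted only on the columns the selector keeps.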

See the documentation for the add method here: https://mlens.readthedocs.io/en/0.1.x/source/mlens.ensemble.super_learner/
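The ValueError quoted in the warning can be reproduced with scikit-learn alone, which may help when diagnosing similar messages. The sketch below uses small made-up arrays (not the breast cancer data) and shows that `balanced_accuracy_score` accepts class labels but rejects a two-column array of probabilities. Whether the extra positional argument causes MLens to score probability outputs in the original code is an assumption; the source does not confirm it.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Hypothetical toy data for illustration only.
y_true = np.array([0, 1, 1, 0])
predicted_labels = np.array([0, 1, 0, 0])
predicted_proba = np.array([
    [0.8, 0.2],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.9, 0.1],
])

# Class labels are a valid input for a classification metric.
score = balanced_accuracy_score(y_true, predicted_labels)
print(score)

# A 2-D array of floats is treated as a "continuous-multioutput" target,
# which classification metrics refuse to compare with binary labels.
try:
    balanced_accuracy_score(y_true, predicted_proba)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If a scorer receives probabilities like these, converting them to labels first (for example with `predicted_proba.argmax(axis=1)`) avoids the error.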
