ValueError when using StratifiedKFold with CatBoostRegressor

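I am new to CatBoost and am trying to use StratifiedKFold together with CatBoostRegressor, but I ran into an error.

This is the edited post, with the complete code block and the full error message for clarity. I also tried for i, (train_index, test_index) in enumerate(fold.split(X,y)): but that did not work either.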

from sklearn.model_selection import KFold,StratifiedKFold
from sklearn.metrics import mean_squared_log_error
from sklearn.preprocessing import LabelEncoder
from catboost import Pool, CatBoostRegressor

fold=StratifiedKFold(n_splits=5,shuffle=True,random_state=42)
err = []
y_pred = []
for train_index, test_index in fold.split(X,y):
#for i, (train_index, test_index) in enumerate(fold.split(X,y)):
    X_train, X_val = X.iloc[train_index], X.iloc[test_index]
    y_train, y_val = y[train_index], y[test_index]
    _train = Pool(X_train, label = y_train)
    _valid = Pool(X_val, label = y_val)
    cb = CatBoostRegressor(n_estimators = 20000,
                     reg_lambda = 1.0,
                     eval_metric = 'RMSE',
                     random_seed = 42,
                     learning_rate = 0.01,
                     od_type = "Iter",
                     early_stopping_rounds = 2000,
                     depth = 7,
                     cat_features = cate,
                     bagging_temperature = 1.0)
    cb.fit(_train,cat_features=cate,eval_set = _valid, early_stopping_rounds = 2000, use_best_model = True, verbose_eval = 100)
    p = cb.predict(X_val)
    print("err: ",rmsle(y_val,p))
    err.append(rmsle(y_val,p))
    pred = cb.predict(test_df)
    y_pred.append(pred)
predictions = np.mean(y_pred,0)

ValueError                                Traceback (most recent call last)
<ipython-input-21-3a0df0c7b8d6> in <module>()
      7 err = []
      8 y_pred = []
----> 9 for train_index, test_index in fold.split(X,y):
     10 #for i, (train_index, test_index) in enumerate(fold.split(X,y)):
     11     X_train, X_val = X.iloc[train_index], X.iloc[test_index]

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
    333                 .format(self.n_splits, n_samples))
    334
--> 335         for train, test in super().split(X, y, groups):
    336             yield train, test
    337

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
     87         X, y, groups = indexable(X, y, groups)
     88         indices = np.arange(_num_samples(X))
---> 89         for test_index in self._iter_test_masks(X, y, groups):
     90             train_index = indices[np.logical_not(test_index)]
     91             test_index = indices[test_index]

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups)
    684
    685     def _iter_test_masks(self, X, y=None, groups=None):
--> 686         test_folds = self._make_test_folds(X, y)
    687         for i in range(self.n_splits):
    688             yield test_folds == i

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _make_test_folds(self, X, y)
    639             raise ValueError(
    640                 'Supported target types are: {}. Got {!r} instead.'.format(
--> 641                     allowed_target_types, type_of_target_y))
    642
    643         y = column_or_1d(y)

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Answer:

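The reason you get this error is very basic and stems from elementary machine learning theory: stratification is defined only for classification, where it ensures that every class is represented in each split in roughly the same proportion as in the full dataset; in regression, where there are no classes, it is meaningless. If you read the error message closely, you should be able to convince yourself that it says 'continuous' targets (i.e. regression) are not supported, only 'binary' or 'multiclass' ones (i.e. classification). This is not a scikit-learn peculiarity but a genuinely fundamental issue.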
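To check which category a given target array falls into, scikit-learn provides the helper type_of_target in sklearn.utils.multiclass; the variable name type_of_target_y in the traceback suggests this is the check behind the error. The following is a minimal sketch, and the three arrays are made-up illustrations, not data from the question:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical example targets (not taken from the question)
regression_y = np.array([0.1, 0.5, -1.1, 1.2])  # real-valued, as in regression
binary_y = np.array([0, 1, 0, 1])               # two discrete classes
multiclass_y = np.array([0, 1, 2, 1])           # more than two discrete classes

target_kinds = {
    "regression_y": type_of_target(regression_y),
    "binary_y": type_of_target(binary_y),
    "multiclass_y": type_of_target(multiclass_y),
}

for name, kind in target_kinds.items():
    print(name, "->", kind)
```

Only the last two kinds appear in the list of supported target types in the error message.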

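A relevant hint is also included in the documentation:

Stratified K-Folds cross-validator

Provides train/test indices to split data in train/test sets.

This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.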

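Here is a short demonstration, adapted from the example in the documentation but with the target y changed from discrete (classification) to continuous (regression):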

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0.1, 0.5, -1.1, 1.2]) # continuous targets, i.e. a regression problem
skf = StratifiedKFold(n_splits=2)

for train_index, test_index in skf.split(X,y):
    print("something")
[...]
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

So, simply put, you cannot use StratifiedKFold in your (regression) setting; change it to plain KFold and carry on.
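As a rough sketch of that change, the loop below uses KFold on a continuous target. The synthetic data, the helper name kfold_rmse, and the use of LinearRegression as a lightweight stand-in for CatBoostRegressor are all assumptions made for illustration, so that the example runs with scikit-learn and NumPy alone:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def kfold_rmse(X, y, n_splits=5, seed=42):
    """Return the per-fold RMSE of a stand-in regressor under plain KFold."""
    fold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = []
    # KFold splits on sample positions only, so a continuous y is not a problem
    for train_index, test_index in fold.split(X):
        model = LinearRegression()  # stand-in; swap in CatBoostRegressor here
        model.fit(X[train_index], y[train_index])
        pred = model.predict(X[test_index])
        rmse = float(np.sqrt(np.mean((y[test_index] - pred) ** 2)))
        errors.append(rmse)
    return errors


# Synthetic regression data (hypothetical, not the asker's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

fold_errors = kfold_rmse(X_demo, y_demo)
print("per-fold RMSE:", fold_errors)
```

With pandas objects, as in the question, positional indexing via X.iloc[train_index] and y.iloc[train_index] is the safer choice, since the indices returned by split are positions rather than index labels.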
