Converting a categorical column with the CatBoost classifier

I am trying to use CatBoost on one of my categorical feature columns, but I get the following error:

CatBoostError: Invalid type for cat_feature[non-default value idx=0,feature_idx=2]=68892500.0 : cat_features must be integer or string, real number values and NaN values should be converted to string.

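I could use one-hot encoding, but many people here say that CatBoost handles categorical features better and is less prone to overfitting the model.

My data has three columns: 'Country', 'year', and 'phone users'. The target is 'Country', while 'year' and 'phone users' are the features.

Data: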

Country   year   phone users
Ireland   1989   978
France    1990   854
Spain     1991   882
Turkey    1992   457
...       ...    ...

My code so far:

    X = df.loc[115:305]
    y = df.loc[80:, 0]
    cat_features = list(range(0, X_pool.shape[1]))
    # Output: [0, 1, 2]
    X_train, X_val, y_train, y_val = train_test_split(X_pool, y_pool, test_size=0.2, random_state=0)
    cbc = CatBoostClassifier(iterations=5, learning_rate=0.1)
    cbc.fit(X_train, y_train, eval_set=(X_val, y_val), cat_features=cat_features, verbose=False)
    print("Model Evaluation Stage")

Do I need to run LabelEncoder before fitting the CatBoost model? What am I missing here?

Answer:

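As the error message in your question says, the values of every categorical feature must be integers or strings; floating-point values (and NaN) have to be converted to strings first. To convert 'phone users' (or any other dataframe column) to string type, you can use df['phone users'] = df['phone users'].astype(str).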
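The conversion above can be sketched on a small frame. The rows below are the four shown in the question; storing 'phone users' as floats is an assumption, made only to mirror the float value (68892500.0) that appears in the error message.

```python
import pandas as pd

# Toy data based on the rows shown in the question. The float dtype of
# 'phone users' is an assumption that mirrors the float in the error message.
df = pd.DataFrame({
    "Country": ["Ireland", "France", "Spain", "Turkey"],
    "year": [1989, 1990, 1991, 1992],
    "phone users": [978.0, 854.0, 882.0, 457.0],
})

# Floats are rejected as categorical values, so cast the column to strings.
df["phone users"] = df["phone users"].astype(str)

print(df["phone users"].tolist())
```

Note that a float such as 978.0 becomes the string '978.0'. If the column holds whole numbers and has no missing values, casting with .astype(int).astype(str) would give '978' instead.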

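Internally, CatBoost encodes each categorical feature with either one-hot encoding or target encoding, depending on how many unique values the feature has. There is no need to encode categorical features with LabelEncoder or OneHotEncoder beforehand; see the CatBoost documentation for details.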
