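Using categorical predictors in scikit-learn

Basic question here:

I am trying to implement a simple classification model for credit card default, using just model.fit and model.predict on my input data. However, the input data contains both categorical data (demographic information such as age, marital status, and education level) and continuous data (such as credit balance).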

data.info()

<pre>&lt;class 'pandas.core.frame.DataFrame'&gt;
Int64Index: 30000 entries, 1 to 30000
Data columns (total 24 columns):
LIMIT_BAL    30000 non-null float64
SEX          30000 non-null int64
EDUCATION    30000 non-null int64
MARRIAGE     30000 non-null int64
AGE          30000 non-null int64
PAY_1        30000 non-null int64
PAY_2        30000 non-null int64
PAY_3        30000 non-null int64
PAY_4        30000 non-null int64
PAY_5        30000 non-null int64
PAY_6        30000 non-null int64
BILL_AMT1    30000 non-null float64
BILL_AMT2    30000 non-null float64
BILL_AMT3    30000 non-null float64
BILL_AMT4    30000 non-null float64
BILL_AMT5    30000 non-null float64
BILL_AMT6    30000 non-null float64
PAY_AMT1     30000 non-null float64
PAY_AMT2     30000 non-null float64
PAY_AMT3     30000 non-null float64
PAY_AMT4     30000 non-null float64
PAY_AMT5     30000 non-null float64
PAY_AMT6     30000 non-null float64
default      30000 non-null int64
dtypes: float64(13), int64(11)
memory usage: 5.7 MB</pre>

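As far as I understand, scikit-learn requires all data to be either numeric and continuous, or explicitly encoded as categorical. The numeric part is not a problem, since all of my data is already coded as numbers (for example, 0 for married and 1 for single), but three of my variables (SEX, EDUCATION, and MARRIAGE) are nominal/ordinal and need to be encoded as categorical rather than left as int64.

How can I use scikit-learn's preprocessing module to encode these three variables so that the features are fed correctly into a model such as logistic regression?

Thanks in advance, and apologies for the formatting (feel free to edit, or to suggest how to properly include Jupyter Notebook output in a Stack Overflow post).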

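Answer:

In feature engineering, categorical features need extra attention, because features such as age, dates, and the like are hard to encode. There are many ways to encode them, and the choice is usually guided by analysis of the data, domain knowledge, and so on.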
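For the three columns named in the question, one common option is one-hot encoding through scikit-learn's own preprocessing module. The following is only a minimal sketch, not the asker's actual setup: the DataFrame is a small synthetic stand-in with made-up values, only a few of the 24 columns are included, and scaling the numeric columns is my assumption rather than something the question requires.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the question's DataFrame (all values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "LIMIT_BAL": rng.uniform(10_000, 500_000, n),
    "SEX": rng.integers(1, 3, n),
    "EDUCATION": rng.integers(1, 5, n),
    "MARRIAGE": rng.integers(1, 4, n),
    "AGE": rng.integers(21, 70, n),
    "default": rng.integers(0, 2, n),
})

categorical = ["SEX", "EDUCATION", "MARRIAGE"]  # integer codes, treated as categories
numeric = ["LIMIT_BAL", "AGE"]                  # treated as continuous

# One-hot encode the categorical columns and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

# Chain preprocessing and the classifier so fit/predict apply both steps.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = data.drop(columns="default")
y = data["default"]
model.fit(X, y)
predictions = model.predict(X)
```

Wrapping the encoder in a Pipeline keeps the workflow at model.fit and model.predict, as the question wanted, and the encoding learned during fit is reused unchanged at predict time.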

There is a library called category_encoders that provides many encoders which use statistics to encode these features. More information can be found here.
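To give a rough idea of what encoding "using statistics" can mean, here is a hand-rolled sketch of target (mean) encoding in plain pandas. It is not the category_encoders implementation; the toy data and the EDUCATION_te column name are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data (made up): a categorical code and a binary target.
df = pd.DataFrame({
    "EDUCATION": [1, 1, 2, 2, 2, 3],
    "default":   [1, 0, 0, 0, 1, 1],
})

# Mean of the target within each category.
category_means = df.groupby("EDUCATION")["default"].mean()

# Replace each category code with the mean target value for that category.
df["EDUCATION_te"] = df["EDUCATION"].map(category_means)
```

Computing the means on the same rows that are later used for training leaks the target into the feature, so in practice the statistics would be estimated on training data only, usually with some smoothing for rare categories.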

Here is another good resource that walks through the use of these encoding methods with an example.
