How to put categorical data into bins

I have the following categorical data:

['Self employed', 'Government Dependent', 'Formally employed Private', 'Informally employed', 'Formally employed Government', 'Farming and Fishing', 'Remittance Dependent', 'Other Income', 'Dont Know/Refuse to answer', 'No Income']

How can I put them into bins so that:

['Government Dependent', 'Formally employed Government', 'Formally employed Private'] = 0
['Remittance Dependent', 'Informally employed', 'Self employed', 'Other Income'] = 1
['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'] = 2

I already know how to put numerical data into categorical bins. Can it be done the other way round?

TRAIN = pd.read_csv("Train_v2.csv")
TRAIN['job_type'].unique()

output:

array(['Self employed', 'Government Dependent',
       'Formally employed Private', 'Informally employed',
       'Formally employed Government', 'Farming and Fishing',
       'Remittance Dependent', 'Other Income',
       'Dont Know/Refuse to answer', 'No Income'], dtype=object)

Answer:

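First create a dictionary that maps each bin number to its list of categories, then invert it so that each category maps to its bin number, and finally apply it with Series.map: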

import pandas as pd

a = ['Self employed', 'Government Dependent',
     'Formally employed Private', 'Informally employed',
     'Formally employed Government', 'Farming and Fishing',
     'Remittance Dependent', 'Other Income',
     'Dont Know/Refuse to answer', 'No Income']

TRAIN = pd.DataFrame({'job_type': a})

# add the other groups to the dictionary
# (labels must match the data exactly, e.g. 'Dont', not "Don't")
d = {0: ['Government Dependent', 'Formally employed Government', 'Formally employed Private'],
     1: ['Remittance Dependent', 'Informally employed'],
     2: ['Dont Know/Refuse to answer', 'No Income']}

# swap keys and values in the dictionary
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# categories missing from d1 become NaN
TRAIN['new'] = TRAIN['job_type'].map(d1)
print(TRAIN)

                       job_type  new
0                 Self employed  NaN
1          Government Dependent  0.0
2     Formally employed Private  0.0
3           Informally employed  1.0
4  Formally employed Government  0.0
5           Farming and Fishing  NaN
6          Remittance Dependent  1.0
7                  Other Income  NaN
8    Dont Know/Refuse to answer  2.0
9                     No Income  2.0
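As a sketch of how the complete binning from the question might look, every category can be listed in the dictionary so that no NaN remains. This assumes the labels are spelled exactly as in the unique() output in the question (in particular 'Dont Know/Refuse to answer' without an apostrophe). The names bins, label_to_bin and job_bin are illustrative and not part of the original answer.

```python
import pandas as pd

# Labels as they appear in TRAIN['job_type'].unique() in the question.
labels = ['Self employed', 'Government Dependent',
          'Formally employed Private', 'Informally employed',
          'Formally employed Government', 'Farming and Fishing',
          'Remittance Dependent', 'Other Income',
          'Dont Know/Refuse to answer', 'No Income']
TRAIN = pd.DataFrame({'job_type': labels})

# bin number -> categories, following the grouping asked for in the question
bins = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed',
        'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# Invert to category -> bin number
label_to_bin = {label: b for b, members in bins.items() for label in members}

# job_bin is a hypothetical column name chosen for this sketch
TRAIN['job_bin'] = TRAIN['job_type'].map(label_to_bin)
print(TRAIN)
```

Because every label is covered here, the new column should contain no missing values, so it can stay an integer column instead of being converted to float.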

If the output has only a few values (for example 0, 1 and NaN), numpy.select also works, but with many groups it becomes more complicated and slower:

import numpy as np

# one boolean mask per group
m1 = TRAIN['job_type'].isin(['Government Dependent', 'Formally employed Government', 'Formally employed Private'])
m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])

# rows matching no mask get the default, NaN
TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)
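With either approach, a label that is missing from the groups, or spelled differently from the data (for example "Don't Know/Refuse to answer" versus 'Dont Know/Refuse to answer'), ends up as NaN. A minimal sketch for listing such labels, using a small hypothetical frame and a hypothetical mapping named label_to_bin:

```python
import pandas as pd

# Hypothetical three-row example, not the real Train_v2.csv data.
TRAIN = pd.DataFrame({'job_type': ['No Income',
                                   'Dont Know/Refuse to answer',
                                   'Other Income']})

# The apostrophe below deliberately does not match the data,
# and 'Other Income' is deliberately left out.
label_to_bin = {'No Income': 2, "Don't Know/Refuse to answer": 2}

TRAIN['new'] = TRAIN['job_type'].map(label_to_bin)

# Labels that did not receive a bin
unmapped = TRAIN.loc[TRAIN['new'].isna(), 'job_type'].unique().tolist()
print(unmapped)
```

Running a check like this before modelling makes it easier to tell whether a NaN comes from a group that was intentionally left out or from a spelling mismatch.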
