How to put categorical data into bins

I have the following categorical data:

    ['Self employed', 'Government Dependent', 'Formally employed Private',
     'Informally employed', 'Formally employed Government', 'Farming and Fishing',
     'Remittance Dependent', 'Other Income', 'Dont Know/Refuse to answer', 'No Income']

How can I put them into bins such that:

    ['Government Dependent', 'Formally employed Government', 'Formally employed Private'] = 0
    ['Remittance Dependent', 'Informally employed', 'Self employed', 'Other Income'] = 1
    ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'] = 2

I already know how to put numeric data into categorical bins. Can it be done the other way around?

    TRAIN = pd.read_csv("Train_v2.csv")
    TRAIN['job_type'].unique()

    output:
    array(['Self employed', 'Government Dependent',
           'Formally employed Private', 'Informally employed',
           'Formally employed Government', 'Farming and Fishing',
           'Remittance Dependent', 'Other Income',
           'Dont Know/Refuse to answer', 'No Income'], dtype=object)

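Answer:

First create a dictionary that maps each bin number to its list of categories, then swap its keys and values so that each category points to its bin, and finally use Series.map: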

    import pandas as pd

    a = ['Self employed', 'Government Dependent',
         'Formally employed Private', 'Informally employed',
         'Formally employed Government', 'Farming and Fishing',
         'Remittance Dependent', 'Other Income',
         'Dont Know/Refuse to answer', 'No Income']
    TRAIN = pd.DataFrame({'job_type': a})

    # add the other groups to the dictionary
    # (labels must match the data exactly, e.g. 'Dont', not "Don't")
    d = {0: ['Government Dependent', 'Formally employed Government', 'Formally employed Private'],
         1: ['Remittance Dependent', 'Informally employed'],
         2: ['Dont Know/Refuse to answer', 'No Income']}

    # swap keys and values of the dictionary: category -> bin
    # http://stackoverflow.com/a/31674731/2901002
    d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

    # categories missing from d1 become NaN
    TRAIN['new'] = TRAIN['job_type'].map(d1)
    print(TRAIN)

                           job_type  new
    0                 Self employed  NaN
    1          Government Dependent  0.0
    2     Formally employed Private  0.0
    3           Informally employed  1.0
    4  Formally employed Government  0.0
    5           Farming and Fishing  NaN
    6          Remittance Dependent  1.0
    7                  Other Income  NaN
    8    Dont Know/Refuse to answer  2.0
    9                     No Income  2.0
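Applied to the complete grouping from the question, the same approach might look like the sketch below. The names `groups`, `lookup` and `job_bin` are illustrative, and the label spellings are assumed to match the values returned by unique() (in particular 'Dont Know/Refuse to answer' without an apostrophe).

```python
import pandas as pd

job_types = ['Self employed', 'Government Dependent',
             'Formally employed Private', 'Informally employed',
             'Formally employed Government', 'Farming and Fishing',
             'Remittance Dependent', 'Other Income',
             'Dont Know/Refuse to answer', 'No Income']
TRAIN = pd.DataFrame({'job_type': job_types})

# bin number -> categories, as requested in the question
groups = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed',
        'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# invert to category -> bin number
lookup = {label: bin_id for bin_id, labels in groups.items() for label in labels}

# map() leaves NaN for any label that is missing from lookup
TRAIN['job_bin'] = TRAIN['job_type'].map(lookup)

# fail early if a category was forgotten or misspelled
unmapped = TRAIN.loc[TRAIN['job_bin'].isna(), 'job_type'].unique()
assert len(unmapped) == 0, f"unmapped categories: {unmapped}"

# with every label mapped, the column can safely hold integers
TRAIN['job_bin'] = TRAIN['job_bin'].astype(int)
print(TRAIN)
```

Checking for unmapped labels before converting to integers helps catch spelling mismatches, which would otherwise show up only as NaN values.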

If the output has only a few values (here 0, 1, 2 and NaN), numpy.select also works, but with many groups it becomes complicated and slower:

    import numpy as np

    # one boolean mask per group; labels must match the data exactly
    m1 = TRAIN['job_type'].isin(['Government Dependent', 'Formally employed Government', 'Formally employed Private'])
    m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
    m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])

    # rows matching no mask get the default, NaN
    TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)
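With more groups, the masks for numpy.select could be generated from a dictionary instead of being written out one by one. This is only a sketch: the names `groups`, `conditions` and `choices` are illustrative, and the small DataFrame stands in for the real data.

```python
import numpy as np
import pandas as pd

# small stand-in for the real TRAIN data
TRAIN = pd.DataFrame({'job_type': ['Self employed', 'Government Dependent',
                                   'No Income', 'Informally employed']})

# bin number -> categories (incomplete on purpose, as in the answer)
groups = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed'],
    2: ['Dont Know/Refuse to answer', 'No Income'],
}

# one boolean mask per group, in the same order as the bin numbers
conditions = [TRAIN['job_type'].isin(labels) for labels in groups.values()]
choices = list(groups.keys())

# rows matching no group fall back to NaN
TRAIN['new'] = np.select(conditions, choices, default=np.nan)
print(TRAIN)
```

Series.map with the inverted dictionary is usually the simpler choice; this variant mainly avoids repeating the isin calls by hand.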
