Create columns and fill them with data from another column (get_dummies, label/one-hot encoding)?

My dataset looks like this:

ID_Patient    Exam_Name              Exam_Result
385sdjhf76    Hemogram - Platelets   8487 u.p
385sdjhf76    Urine - Color          dark yellow
385sdjhf76    COVID-19 PCR           Detected
...

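The COVID-19 exam result will be my target variable, and the other exams will be the features. What I want to do is create one column per exam and fill those columns with the values from the Exam_Result column. This transformation will reduce the number of rows in the dataset and make it look like this: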

ID_Patient    Hemogram - Platelets   Urine - Color    COVID-19 PCR
385sdjhf76    8487 u.p               dark yellow      Detected
490dshfj76    374 u.p                NaN              Not detected
387sshhf88    ...                    ...              ...

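I have already used get_dummies to create a binary column for each exam, but I cannot replace the 1s with the values from Exam_Result. Do you have any idea how to achieve this transformation?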

Answer:

It looks like you are looking for pivot.

I'll generate some dummy data as a quick example:

import pandas as pd

# Two patients, four exams each, in long format (one row per exam result)
data = {'ID_Patient': [1] * 4 + [2] * 4,
        'Exam_Name': [f'exam {i}' for i in range(4)] * 2,
        'Exam_Result': [f'result {i}' for i in range(8)]}
df = pd.DataFrame(data)

df now looks like this:

   ID_Patient Exam_Name Exam_Result
0           1    exam 0    result 0
1           1    exam 1    result 1
2           1    exam 2    result 2
3           1    exam 3    result 3
4           2    exam 0    result 4
5           2    exam 1    result 5
6           2    exam 2    result 6
7           2    exam 3    result 7

Let's pivot it:

df = df.pivot(index='ID_Patient', columns='Exam_Name', values='Exam_Result')

df now looks like this:

Exam_Name     exam 0    exam 1    exam 2    exam 3
ID_Patient
1           result 0  result 1  result 2  result 3
2           result 4  result 5  result 6  result 7
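
One caveat that may matter for real exam data: pivot raises a ValueError when the same (ID_Patient, Exam_Name) pair appears more than once, for example if a patient took the same exam twice. A minimal sketch of one possible workaround follows, assuming it is acceptable to keep only the first recorded result. The frame `dup` and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: patient 1 has two results for 'exam 0'.
dup = pd.DataFrame({
    'ID_Patient': [1, 1, 1, 2],
    'Exam_Name': ['exam 0', 'exam 0', 'exam 1', 'exam 0'],
    'Exam_Result': ['result a', 'result b', 'result c', 'result d'],
})

# pivot refuses to reshape when an index/column pair is duplicated.
try:
    dup.pivot(index='ID_Patient', columns='Exam_Name', values='Exam_Result')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates instead. 'first' keeps the first
# value seen and, unlike the default 'mean', works for string results.
wide = dup.pivot_table(index='ID_Patient', columns='Exam_Name',
                       values='Exam_Result', aggfunc='first')
print(wide)
```

Whether 'first', 'last', or some other rule is appropriate depends on what a repeated exam means in the dataset, so treat the choice of aggfunc here as an assumption.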

If you don't want ID_Patient as the index or the Exam_Name label on the column axis, you can do this:

df = df.reset_index().rename_axis(None, axis=1)

Now df looks like this:

   ID_Patient    exam 0    exam 1    exam 2    exam 3
0           1  result 0  result 1  result 2  result 3
1           2  result 4  result 5  result 6  result 7
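
To connect this back to the question, here is a rough sketch of the same steps applied to data shaped like the asker's, followed by splitting off the target column. The rows are copied from the tables in the question. The names `raw`, `wide`, `target`, and `features` are illustrative, and the feature/target split is just one way the result might be used.

```python
import pandas as pd

# Sample rows modelled on the question. The second patient has no
# 'Urine - Color' exam, matching the desired output shown there.
raw = pd.DataFrame({
    'ID_Patient': ['385sdjhf76'] * 3 + ['490dshfj76'] * 2,
    'Exam_Name': ['Hemogram - Platelets', 'Urine - Color', 'COVID-19 PCR',
                  'Hemogram - Platelets', 'COVID-19 PCR'],
    'Exam_Result': ['8487 u.p', 'dark yellow', 'Detected',
                    '374 u.p', 'Not detected'],
})

# Long to wide: one row per patient, one column per exam.
wide = (raw.pivot(index='ID_Patient', columns='Exam_Name',
                  values='Exam_Result')
           .reset_index()
           .rename_axis(None, axis=1))

# Exams a patient never took show up as NaN in the wide frame.
target = wide['COVID-19 PCR']
features = wide.drop(columns=['ID_Patient', 'COVID-19 PCR'])
print(wide)
```

Every pivoted column holds strings here (for example '8487 u.p'), so numeric exams would still need their own parsing before modelling.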
