DataFrame:
MovieID | movieCater |
---|---|
1 | Action, Comedy, Adventure |
2 | Action, Crime |
3 | Crime |
Desired result:
MovieID | movieCater | Action | Comedy | Adventure | Crime |
---|---|---|---|---|---|
1 | Action, Comedy, Adventure | 1 | 1 | 1 | 0 |
2 | Action, Crime | 1 | 0 | 0 | 1 |
3 | Crime | 0 | 0 | 0 | 1 |
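My DataFrame does not contain the Action, Comedy, etc. columns. Is there a way to do this? For example, the first row of movieCater contains Action, Comedy, and Adventure, so each of those values should be mapped to the column of the same name and that column set to 1.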
Answer:
Try the following:
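    df_original = df.copy()
    # Split the comma-separated string into a list of genres
    df['movieCater'] = df['movieCater'].str.split(', ')
    # One row per (movie, genre) pair; the original index labels are repeated
    df = df.explode('movieCater')
    df['value'] = 1
    # Pivot genres into columns, fill the gaps with 0, and join back onto the original
    df_original.join(df.pivot(columns=['movieCater'], values=['value']).fillna(0).droplevel(0, axis=1))
    #    MovieID                 movieCater  Action  Adventure  Comedy  Crime
    # 0        1  Action, Comedy, Adventure     1.0        1.0     1.0    0.0
    # 1        2              Action, Crime     1.0        0.0     0.0    1.0
    # 2        3                      Crime     0.0        0.0     0.0    1.0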
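If integer 0/1 columns are preferred, as in the desired result, a shorter route may be `Series.str.get_dummies`. This is an alternative to the explode/pivot approach above, not part of the original answer. It is a minimal sketch that rebuilds the sample DataFrame from the question and assumes the genres are always separated by a comma followed by a single space:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "MovieID": [1, 2, 3],
    "movieCater": ["Action, Comedy, Adventure", "Action, Crime", "Crime"],
})

# Split each string on ", " and build one 0/1 indicator column per distinct genre
dummies = df["movieCater"].str.get_dummies(sep=", ")

# Attach the indicator columns to the original frame (aligned on the index)
result = df.join(dummies)
print(result)
```

The indicator columns come back in alphabetical order (Action, Adventure, Comedy, Crime) rather than in order of first appearance, so reorder them explicitly if the exact layout matters.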