I'm working with a dataset of roughly 32 million rows:

    RangeIndex: 32084542 entries, 0 to 32084541

    df.head()
                            time  device                                kpi  value
    0  2020-10-22 00:04:03+00:00  1-xxxx  chassis.routing-engine.0.cpu-idle    100
    1  2020-10-22 00:04:06+00:00  2-yyyy  chassis.routing-engine.0.cpu-idle     97
    2  2020-10-22 00:04:07+00:00  3-zzzz  chassis.routing-engine.0.cpu-idle    100
    3  2020-10-22 00:04:10+00:00  4-dddd  chassis.routing-engine.0.cpu-idle     93
    4  2020-10-22 00:04:10+00:00  5-rrrr  chassis.routing-engine.0.cpu-idle     99
My goal is to create a new column named "role" and populate it based on a regular expression.

My approach is as follows:

    def router_role(row):
        if row["device"].startswith("1"):
            row["role"] = '1'
        if row["device"].startswith("2"):
            row["role"] = '2'
        if row["device"].startswith("3"):
            row["role"] = '3'
        if row["device"].startswith("4"):
            row["role"] = '4'
        return row

Then,

    df = df.apply(router_role, axis=1)
However, this takes a very long time. Is there another way to do it?

Thanks
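Answer:

Row-wise apply (axis=1) is very slow because it calls the Python function once per row, and it is rarely the right tool for this kind of job. Try the vectorized string accessor instead:

    df['role'] = df['device'].str[0]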
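
One difference from the original function is worth noting: `.str[0]` assigns a role to every row, including devices such as `5-rrrr` that `router_role` left without one. The sketch below shows two vectorized variants on a toy frame. It is a sketch under assumptions, not the asker's exact requirement: the allowed prefixes `1` to `4` are taken from the question's function, and the pattern `^(\d+)-` assumes device names always start with digits followed by a hyphen, as in the sample rows. The column name `role_regex` is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the real 32M-row frame; device names mirror the question's sample.
df = pd.DataFrame({"device": ["1-xxxx", "2-yyyy", "3-zzzz", "4-dddd", "5-rrrr"]})

# Variant 1: take the first character, but keep it only for the prefixes
# the original router_role function handled; other rows become missing.
first = df["device"].str[0]
df["role"] = first.where(first.isin(["1", "2", "3", "4"]))

# Variant 2 (assumed naming scheme): capture the leading digits before the
# hyphen with a regular expression, which also copes with prefixes like "12-".
df["role_regex"] = df["device"].str.extract(r"^(\d+)-", expand=False)

print(df)
```

Both variants run as whole-column operations, so they avoid the per-row Python call that makes `apply(..., axis=1)` slow.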