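I have a dataframe that looks like this:
I want to create a column called "flag" that tracks changes in the "value" column separately for each RB. Only when the value increases should it put a 1 in the month before the change, and only if the RB in that previous month is the same as the RB in the month where the change happens. Because of that, I don't think a simple shift will do it.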
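To illustrate why a plain shift is not enough, here is a minimal sketch on a small hypothetical frame. The column names rb and value come from the question, but the numbers are made up, and it assumes each RB occupies one contiguous block of rows that is already sorted by month. Shifting inside groupby("rb") stops at the RB boundary, which is exactly the "same RB" condition.

```python
import pandas as pd

# Hypothetical sample: each rb is one contiguous block, already sorted by date
df = pd.DataFrame({
    "rb":    [111, 111, 111, 112, 112, 112],
    "value": [5, 4, 6, 9, 8, 10],
})

# A plain shift(-1) ignores rb: the last 111 row is compared with the first 112 row
naive_flag = (df["value"].shift(-1) > df["value"]).astype(int)

# Shifting inside each rb leaves NaN on the last row of every group,
# and NaN > x evaluates to False, so no flag crosses an rb boundary
next_value = df.groupby("rb")["value"].shift(-1)
df["flag"] = (next_value > df["value"]).astype(int)

print(naive_flag.tolist())  # [0, 1, 1, 0, 1, 0]
print(df["flag"].tolist())  # [0, 1, 0, 0, 1, 0]
```

The naive version wrongly flags the third row (the last 111 month) because it compares it with the first 112 month.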
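I would also like a similar column that, when an RB's value increases, puts a 1 in the month before the change (as in the column above) and also in the months two and three months before it. The rule stays the same: the flag is only "shifted" back across those three months when the month of the change and all three preceding months have the same RB.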
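One way to express the three-month rule without a loop, sketched under the same assumptions (hypothetical numbers, each RB in one contiguous block sorted by month): mark the month where the value rose compared with the previous month of the same RB, then look 1, 2 and 3 rows ahead within that RB.

```python
import pandas as pd

# Hypothetical sample; assumes each rb is one contiguous block sorted by date
df = pd.DataFrame({
    "rb":    [111, 111, 111, 111, 111, 112, 112, 112],
    "value": [5, 5, 5, 5, 7, 3, 3, 4],
})

# inc = 1 on the month where value rose versus the previous month of the same rb
inc = (df.groupby("rb")["value"].diff() > 0).astype(int)

# Pull inc back by 1, 2 and 3 rows inside each rb; rows that would come from
# another rb (or past the end) become NaN and are treated as "no increase"
ahead = sum(inc.groupby(df["rb"]).shift(-k).fillna(0) for k in (1, 2, 3))
df["flag_3m"] = (ahead > 0).astype(int)

print(df["flag_3m"].tolist())  # [0, 1, 1, 1, 0, 1, 1, 0]
```

The increase at the fifth row flags the three 111 months before it but not the first one, and the increase at the last row flags only the two earlier 112 months, because the month three rows back belongs to RB 111.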
Answer:
This should do what you want:
```python
import pandas as pd

data = [
    {"rb": 111, "date": "01/01/2020", "value": 5},
    {"rb": 111, "date": "01/02/2020", "value": 5},
    {"rb": 111, "date": "01/03/2020", "value": 4},
    {"rb": 111, "date": "01/04/2020", "value": 6},
    {"rb": 111, "date": "01/05/2020", "value": 6},
    {"rb": 111, "date": "01/06/2020", "value": 6},
    {"rb": 111, "date": "01/07/2020", "value": 6},
    {"rb": 111, "date": "01/08/2020", "value": 7},
    {"rb": 112, "date": "01/01/2020", "value": 3},
    {"rb": 112, "date": "01/02/2020", "value": 3},
    {"rb": 112, "date": "01/03/2020", "value": 4},
    {"rb": 112, "date": "01/04/2020", "value": 4},
    {"rb": 112, "date": "01/05/2020", "value": 5},
    {"rb": 112, "date": "01/06/2020", "value": 5},
    {"rb": 112, "date": "01/07/2020", "value": 5},
    {"rb": 112, "date": "01/08/2020", "value": 5},
    {"rb": 111, "date": "01/01/2020", "value": 18},
    {"rb": 111, "date": "01/02/2020", "value": 18},
    {"rb": 111, "date": "01/03/2020", "value": 17},
    {"rb": 111, "date": "01/04/2020", "value": 11},
    {"rb": 111, "date": "01/05/2020", "value": 13},
    {"rb": 111, "date": "01/06/2020", "value": 13},
    {"rb": 111, "date": "01/07/2020", "value": 13},
    {"rb": 111, "date": "01/08/2020", "value": 13},
    {"rb": 112, "date": "01/01/2020", "value": 14},
    {"rb": 112, "date": "01/02/2020", "value": 14},
    {"rb": 112, "date": "01/03/2020", "value": 17},
    {"rb": 112, "date": "01/04/2020", "value": 17},
    {"rb": 112, "date": "01/05/2020", "value": 5},
    {"rb": 112, "date": "01/06/2020", "value": 5},
    {"rb": 112, "date": "01/07/2020", "value": 5},
]
df = pd.DataFrame(data)

# flag: 1 if the next row belongs to the same rb and has a higher value
df["flag"] = 0
for index in range(len(df) - 1):
    df.loc[index, "flag"] = int(
        df.loc[index, "rb"] == df.loc[index + 1, "rb"]
        and df.loc[index, "value"] < df.loc[index + 1, "value"]
    )

# flag_3m: 1 if an increase happens at any of the next three rows,
# provided that row has the same rb as the current one
df["flag_3m"] = 0
for index in range(len(df)):
    for k in range(1, 4):
        if index + k >= len(df):  # ran past the end of the frame
            break
        if (
            df.loc[index + k - 1, "value"] < df.loc[index + k, "value"]
            and df.loc[index, "rb"] == df.loc[index + k, "rb"]
        ):
            df.loc[index, "flag_3m"] = 1
            break

print(df)
```
PS: It would probably be easier to groupby rb first and then check the data, but this approach should work as well.
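One caveat if you do switch to groupby: in the question's data, rb 111 and 112 each appear in two separate blocks, and a plain groupby("rb") treats both blocks of the same rb as one series. The sketch below uses a small hypothetical frame to show the effect, and groups by a run id that increments whenever rb changes from the previous row, which matches the adjacent-row rb check in the loop above.

```python
import pandas as pd

# Hypothetical sample in which rb 111 appears in two separate blocks,
# as rb 111 and 112 do in the question's data
df = pd.DataFrame({
    "rb":    [111, 111, 112, 112, 111, 111],
    "value": [5, 6, 3, 3, 9, 8],
})

# Plain groupby("rb") joins both 111 blocks, so row 1 (end of the first
# block) is compared with row 4 (start of the second block)
by_rb = df.groupby("rb")["value"].shift(-1)

# Run id: increments every time rb differs from the previous row,
# so each contiguous block gets its own group
run = (df["rb"] != df["rb"].shift()).cumsum()
by_run = df.groupby(run)["value"].shift(-1)

df["flag_by_rb"] = (by_rb > df["value"]).astype(int)
df["flag_by_run"] = (by_run > df["value"]).astype(int)

print(df["flag_by_rb"].tolist())   # [1, 1, 0, 0, 0, 0]
print(df["flag_by_run"].tolist())  # [1, 0, 0, 0, 0, 0]
```

Only the run-based grouping leaves row 1 unflagged, since the row after it belongs to RB 112.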