我需要找到一个客户购买了P1后,经过多少天会购买P2。我无法找到同一个客户在购买P1和下一次购买P2之间的天数。
我有如下所示的数据。
Customer ID Order_Date ProductC-87 11/20/2018 P2C-87 7/25/2018 P1C-87 7/19/2019 P1C-87 8/2/2018 P2C-87 12/9/2019 P1... ... ... ... ... C-22 9/22/2018 P2C-22 9/4/2018 P2C-22 1/15/2018 P1C-22 9/5/2019 P2C-22 3/20/2018 P1 12
回答:
尝试以下方法:
df['Order_Date'] = pd.to_datetime(df['Order_Date'])df.sort_values(['ID', 'Order_Date', 'Product'], inplace = True)df['next_order_date'] = df['Order_Date'].shift(-1)df['next_product'] = df['Product'].shift(-1)df['days'] = df.apply(lambda row: row['next_order_date'] - row['Order_Date'] if ((row['next_product'] == 'P2') & (row['Product'] == 'P1')) else np.nan, axis = 1)
结果如下
ID Order_Date Product next_oder_date days7 C-22 2018-01-15 P1 2018-09-04 232 days6 C-22 2018-09-04 P2 2018-09-22 NaT5 C-22 2018-09-22 P2 2019-09-05 NaT8 C-22 2019-09-05 P2 2018-07-25 NaT1 C-87 2018-07-25 P1 2018-08-02 8 days3 C-87 2018-08-02 P2 2018-11-20 NaT0 C-87 2018-11-20 P2 2019-07-19 NaT2 C-87 2019-07-19 P1 2019-12-09 143 days4 C-87 2019-12-09 P1 NaT NaT