I think I understand what you’re asking. You just want to have a new dataframe that calculates the time difference between the three different entries for each unique order id?
So, I start by creating the dataframe:
import pandas as pd
data = [
[11238,3943,201805030759165986,'新建订单',20180503075916,'2018/5/3 07:59:16','2018/5/3 07:59:16'],
[11239,3943,201805030759165986,'新建订单',20180503082115,'2018/5/3 08:21:15','2018/5/3 08:21:15'],
[11240,3943,201805030759165986,'新建订单',20180503083204,'2018/5/3 08:32:04','2018/5/3 08:32:04'],
[11241,3941,201805030856445991,'新建订单',20180503085644,'2018/5/3 08:56:02','2018/5/3 08:56:44'],
[11242,3941,201805022232081084,'初审成功',20180503085802,'2018/5/3 08:58:02','2018/5/3 08:58:02'],
[11243,3941,201805022232081084,'审核成功',20180503085821,'2018/5/3 08:59:21','2018/5/3 08:58:21']
]
df = pd.DataFrame(data, columns=['id','order_id','order_no','order_status','handle_time','create_time','update_time'])
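# Parse the strings into real datetimes; assigning the whole column (rather than via .loc[:, ...]) lets the dtype change to datetime64
df['create_time'] = pd.to_datetime(df['create_time'])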
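The handle_time column appears to hold the same kind of timestamp as a 14-digit integer (YYYYMMDDHHMMSS). If you would rather measure from that column, here is a minimal sketch of how it could be parsed. The names handle_raw and handle_time are just for illustration, and the two values are copied from the sample data:

```python
import pandas as pd

# Two handle_time values copied from the sample data above
handle_raw = pd.Series([20180503075916, 20180503082115])

# Convert the integers to strings, then parse them with an explicit format
handle_time = pd.to_datetime(handle_raw.astype(str), format='%Y%m%d%H%M%S')

# Difference between consecutive rows (the first row has no predecessor)
print(handle_time.diff())
```

Passing an explicit format avoids any guessing about which digits are the month and which are the day.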
Sort values by order_id and then create_time:
df = df.sort_values(by=['order_id', 'create_time'])
Next, I group by order id and select the 1st, 2nd, and 3rd entry:
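# Since pandas 2.0, nth() acts as a filter and keeps the original row index, so index each result by order_id to make the subtractions below line up per order
grouped = df.groupby('order_id', as_index=False)
first_df = grouped.nth(0).set_index('order_id')
second_df = grouped.nth(1).set_index('order_id')
third_df = grouped.nth(2).set_index('order_id')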
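One thing to watch: nth(2) only returns a row for orders that actually have a third entry, so an order with fewer rows will not get a stage_two value. A quick way to check before subtracting might look like this. The demo frame below is made up for illustration (order 3941 is deliberately cut to two entries):

```python
import pandas as pd

# Hypothetical data: order 3941 has only two entries
demo = pd.DataFrame({
    'order_id': [3943, 3943, 3943, 3941, 3941],
    'create_time': pd.to_datetime([
        '2018-05-03 07:59:16', '2018-05-03 08:21:15', '2018-05-03 08:32:04',
        '2018-05-03 08:56:02', '2018-05-03 08:58:02',
    ]),
})

# Number of entries per order
counts = demo.groupby('order_id').size()

# Orders that do not have all three entries
incomplete = counts[counts < 3]
print(incomplete)
```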
Subtract the 1st entry from the 2nd to get stage one, and the 2nd from the 3rd to get stage two. Then combine them into an output dataframe:
stage_two = third_df.loc[:, 'create_time'] - second_df.loc[:, 'create_time']
stage_one = second_df.loc[:, 'create_time'] - first_df.loc[:, 'create_time']
stages = pd.concat([stage_one, stage_two], axis=1, keys=['stage_one', 'stage_two'])
print(stages)
And the output looks like:
          stage_one  stage_two
order_id
3941       00:02:00   00:01:19
3943       00:21:59   00:10:49
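If the number of entries per order is not always three, the same idea can be written without nth() at all. This is only a sketch of an alternative, not a replacement for the steps above: it uses groupby().diff() to get the gap to the previous row within each order, and cumcount() to number the rows. The column names elapsed and stage are my own, and the timestamps are the create_time values from the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [3943, 3943, 3943, 3941, 3941, 3941],
    'create_time': pd.to_datetime([
        '2018-05-03 07:59:16', '2018-05-03 08:21:15', '2018-05-03 08:32:04',
        '2018-05-03 08:56:02', '2018-05-03 08:58:02', '2018-05-03 08:59:21',
    ]),
})
df = df.sort_values(['order_id', 'create_time'])

# Time since the previous row of the same order (NaT for each order's first row)
df['elapsed'] = df.groupby('order_id')['create_time'].diff()

# Number each row within its order: 0 for the first entry, 1 for the second, ...
df['stage'] = df.groupby('order_id').cumcount()

# One row per order, one column per stage; stage 0 has no predecessor, so drop it
stages = df[df['stage'] > 0].pivot(index='order_id', columns='stage', values='elapsed')

# The same table as plain seconds, which is often easier to aggregate or plot
stage_seconds = stages.apply(lambda col: col.dt.total_seconds())
print(stages)
print(stage_seconds)
```

Column 1 corresponds to stage_one and column 2 to stage_two; an order with more entries would simply get more columns.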