[Solved] How to calculate time cost [closed]

Question

I think I understand what you’re asking. You want a new dataframe that holds the time differences between the three consecutive entries for each unique order id?

So, I start by creating the dataframe:

import pandas as pd

data = [
    [11238,3943,201805030759165986,'新建订单',20180503075916,'2018/5/3 07:59:16','2018/5/3 07:59:16'],
    [11239,3943,201805030759165986,'新建订单',20180503082115,'2018/5/3 08:21:15','2018/5/3 08:21:15'],
    [11240,3943,201805030759165986,'新建订单',20180503083204,'2018/5/3 08:32:04','2018/5/3 08:32:04'],
    [11241,3941,201805030856445991,'新建订单',20180503085644,'2018/5/3 08:56:02','2018/5/3 08:56:44'],
    [11242,3941,201805022232081084,'初审成功',20180503085802,'2018/5/3 08:58:02','2018/5/3 08:58:02'],
    [11243,3941,201805022232081084,'审核成功',20180503085821,'2018/5/3 08:59:21','2018/5/3 08:58:21']
]

df = pd.DataFrame(data, columns=['id','order_id','order_no','order_status','handle_time','create_time','update_time'])
# Parse create_time into datetimes so the values can be subtracted later
df['create_time'] = pd.to_datetime(df['create_time'])

Sort values by order_id and then create_time:

df = df.sort_values(by=['order_id', 'create_time'])

Next, I group by order id and select the 1st, 2nd, and 3rd entry:

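# Index by order_id before grouping so each nth() result is indexed by order_id
# (pandas 2.0+ otherwise keeps the original row index) and the subtractions align per order
grouped = df.set_index('order_id').groupby(level='order_id')
first_df = grouped.nth(0)
second_df = grouped.nth(1)
third_df = grouped.nth(2)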
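
Note that nth(2) only returns a row for orders that have at least three entries, so an order with fewer entries would end up with a missing (NaT) second stage. A minimal sketch of a check for such orders, using a small hypothetical frame (the order ids and timestamps here are made up for illustration):

```python
import pandas as pd

# Hypothetical data: order 1 has three entries, order 2 has only two.
df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2],
    'create_time': pd.to_datetime([
        '2018-05-03 08:00:00', '2018-05-03 08:10:00', '2018-05-03 08:30:00',
        '2018-05-03 09:00:00', '2018-05-03 09:05:00',
    ]),
})

# Count the entries recorded for each order.
entries_per_order = df.groupby('order_id').size()

# Orders with fewer than three entries cannot yield both stages.
incomplete = entries_per_order[entries_per_order < 3]
print(incomplete)
```

If this prints any orders, you may want to filter them out or accept NaT for the stages they have not reached yet.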

Subtract the first entry's create_time from the second's to get stage one, and the second's from the third's to get stage two. Then combine both into an output dataframe:

stage_two = third_df.loc[:, 'create_time'] - second_df.loc[:, 'create_time']
stage_one = second_df.loc[:, 'create_time'] - first_df.loc[:, 'create_time']
stages = pd.concat([stage_one, stage_two], axis=1, keys=['stage_one', 'stage_two'])

print(stages)

And the output looks like:

         stage_one stage_two
order_id
3941      00:02:00  00:01:19
3943      00:21:59  00:10:49
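
If an order can have more (or fewer) than three entries, the same stage durations can probably be computed without selecting each entry by hand. The sketch below is an alternative to the nth() approach, not part of the original answer: it uses groupby().diff() to get the time since the previous entry and cumcount() to number the entries within each order. The column names elapsed and stage, and the stage_1/stage_2 labels, are my own choices for illustration.

```python
import pandas as pd

# Illustrative frame reusing the order_id and create_time values from above.
df = pd.DataFrame({
    'order_id': [3943, 3943, 3943, 3941, 3941, 3941],
    'create_time': pd.to_datetime([
        '2018-05-03 07:59:16', '2018-05-03 08:21:15', '2018-05-03 08:32:04',
        '2018-05-03 08:56:02', '2018-05-03 08:58:02', '2018-05-03 08:59:21',
    ]),
})
df = df.sort_values(by=['order_id', 'create_time'])

# Time since the previous entry of the same order (NaT for each order's first entry).
df['elapsed'] = df.groupby('order_id')['create_time'].diff()

# Position of each entry within its order: 0, 1, 2, ...
df['stage'] = df.groupby('order_id').cumcount()

# One row per order, one column per stage; position 0 has no predecessor, so skip it.
stages = (
    df[df['stage'] > 0]
    .pivot(index='order_id', columns='stage', values='elapsed')
    .add_prefix('stage_')
)

# Express each duration in seconds, e.g. for further arithmetic.
stage_seconds = stages.apply(lambda col: col.dt.total_seconds())

print(stages)
print(stage_seconds)
```

For the sample data this should give the same durations as the table above, with one column per stage. Orders that have fewer entries simply get NaT in the stages they have not reached.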
