I have a pandas DataFrame containing timestamped events from multiple users. By default, the DataFrame is sorted by timestamp.
uid timestamp other_vars
1 100 ...
1 150 ...
2 150 ...
2 200 ...
1 225 ...
3 300 ...
3 400 ...
I’d like to get the diff of the timestamp within users. That is, for each event, I want to get the time elapsed since the previous event generated by the same user.
uid timestamp diff other_vars
1 100 NA ...
1 150 50 ...
2 150 NA ...
2 200 50 ...
1 225 75 ...
3 300 NA ...
3 400 100 ...
Is there a clean way to do this in pandas, ideally without sorting by user? Thanks!
As mentioned in the comments, you can use `groupby`. I'd `groupby` and then `diff`. `groupby` will (unsurprisingly) group the rows by `uid`. We then select the column we're interested in (`timestamp`) within those groups and `diff` it, which gives the difference from the previous row of the same user.
Note that we didn't sort the timestamps, so if you wanted that you'd have to do it explicitly.
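Here is a minimal sketch of the groupby-then-diff approach on the sample data from the question. `other_vars` is left out for brevity, and the variable names are only illustrative:

```python
import pandas as pd

# Sample data from the question ("other_vars" omitted for brevity).
df = pd.DataFrame({
    "uid": [1, 1, 2, 2, 1, 3, 3],
    "timestamp": [100, 150, 150, 200, 225, 300, 400],
})

# Group by user, pick the timestamp column, and diff within each group.
# The result keeps df's original index, so it aligns row-by-row and can be
# assigned straight back as a new column. The first event of each user is NaN.
df["diff"] = df.groupby("uid")["timestamp"].diff()
print(df)

# diff() uses the existing row order inside each group. If the frame might
# not already be in time order, sort explicitly before grouping:
df_sorted = df.drop(columns="diff").sort_values("timestamp")
df_sorted["diff"] = df_sorted.groupby("uid")["timestamp"].diff()
```

Because `groupby` only partitions the rows and keeps their relative order, there is no need to sort by `uid`. The result is a float column, since the first event of each user has no predecessor and gets `NaN`.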