I have the data below saved as a pandas DataFrame. With this data, I would like to calculate the bid-ask spread for each second. However, as you can see, within a given second there are often many more asks than bids, or vice versa. So my goal is the following: I would like to keep only rows where a bid is immediately followed by an ask, or an ask is immediately followed by a bid, within the same timestamp, and then calculate the spread and count how many spreads there were.
In the data below, it would look like this: I would take rows 1 and 2 and calculate the spread, which is 0. Then I would take rows 3 and 4, which give a spread of 2.
        time quote   price  volume
0   07:00:00     B  3950.5       5
1   07:00:00     B  3950.0       4
2   07:00:00     A  3950.0       7
3   07:00:00     B  3948.0      17
4   07:00:00     A  3950.0      20
5   07:00:00     A  3950.0      31
6   07:00:00     A  3950.0      44
7   07:00:00     A  3950.0      57
8   07:00:00     A  3950.0      67
9   07:00:00     A  3950.0      57
10  07:00:00     A  3950.0      67
11  07:00:00     A  3950.0      80
12  07:00:00     A  3950.0      90
13  07:00:00     A  3950.0      99
14  07:00:01     B  3948.0      15
15  07:00:01     A  3950.0      89
16  07:00:01     A  3949.5       1
17  07:00:02     A  3950.0      89
18  07:00:03     B  3948.0      12
19  07:00:03     A  3949.0       1
20  07:00:03     B  3948.0       9
21  07:00:03     B  3948.5       4
22  07:00:04     A  3949.5       5
23  07:00:04     B  3948.5       2
24  07:00:05     B  3948.5       1
This is my desired output:
    time  spread  num_spread
07:00:00       2           2
07:00:01       2           1
07:00:03       1           1
07:00:04       1           1
gets this
if you change
to
you'll get
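For the pairing rule described in the question, here is a minimal pandas sketch. It rests on a few assumptions that the question does not state outright: pairs are taken greedily from the top of each second and never share a row, the spread of a pair is ask minus bid, and the `spread` column is the sum of the pair spreads in that second (which matches 0 + 2 = 2 for 07:00:00). The names `pair_spreads` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("07:00:00", "B", 3950.5, 5), ("07:00:00", "B", 3950.0, 4),
    ("07:00:00", "A", 3950.0, 7), ("07:00:00", "B", 3948.0, 17),
    ("07:00:00", "A", 3950.0, 20), ("07:00:00", "A", 3950.0, 31),
    ("07:00:00", "A", 3950.0, 44), ("07:00:00", "A", 3950.0, 57),
    ("07:00:00", "A", 3950.0, 67), ("07:00:00", "A", 3950.0, 57),
    ("07:00:00", "A", 3950.0, 67), ("07:00:00", "A", 3950.0, 80),
    ("07:00:00", "A", 3950.0, 90), ("07:00:00", "A", 3950.0, 99),
    ("07:00:01", "B", 3948.0, 15), ("07:00:01", "A", 3950.0, 89),
    ("07:00:01", "A", 3949.5, 1), ("07:00:02", "A", 3950.0, 89),
    ("07:00:03", "B", 3948.0, 12), ("07:00:03", "A", 3949.0, 1),
    ("07:00:03", "B", 3948.0, 9), ("07:00:03", "B", 3948.5, 4),
    ("07:00:04", "A", 3949.5, 5), ("07:00:04", "B", 3948.5, 2),
    ("07:00:05", "B", 3948.5, 1),
]
df = pd.DataFrame(rows, columns=["time", "quote", "price", "volume"])


def pair_spreads(group):
    """Return ask - bid for each adjacent B/A or A/B pair in one second.

    Assumption: pairs are matched greedily from the first row and a row
    is used in at most one pair.
    """
    quotes = group["quote"].tolist()
    prices = group["price"].tolist()
    spreads = []
    i = 0
    while i < len(quotes) - 1:
        if quotes[i] != quotes[i + 1]:
            # Opposite sides: work out which of the two rows is the ask.
            if quotes[i] == "A":
                ask, bid = prices[i], prices[i + 1]
            else:
                ask, bid = prices[i + 1], prices[i]
            spreads.append(ask - bid)
            i += 2  # both rows are consumed by this pair
        else:
            i += 1  # same side twice: slide forward by one row
    return spreads


records = []
for time, group in df.groupby("time", sort=False):
    spreads = pair_spreads(group)
    if spreads:  # seconds with no valid pair are left out
        records.append(
            # Assumption: "spread" is the sum of the pair spreads.
            {"time": time, "spread": sum(spreads), "num_spread": len(spreads)}
        )

result = pd.DataFrame(records, columns=["time", "spread", "num_spread"])
print(result)
```

On the sample data this leaves out 07:00:02 and 07:00:05, since neither second contains a bid next to an ask. If the spread per second should be an average or the last pair's spread rather than the sum, only the `sum(spreads)` expression needs to change.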