I have the data below saved as a pandas dataframe. With this data,

Asked: June 6, 2026 (2026-06-06T05:17:17+00:00)

I have the data below saved as a pandas dataframe. With this data, I would like to calculate the bid-ask spread for a specific second. However, as you can see, there are often more asks than bids within a second, and vice versa. So my goal is the following: I would like to keep only the rows where a bid is immediately followed by an ask, or an ask is immediately followed by a bid, within the same timestamp, and then calculate the spread and how many spreads there were.

In the data below, I would take row 1 and row 2 and calculate the spread, which is 0. Then I would take row 3 and row 4, which gives a spread of 2.

        time quote   price  volume
0   07:00:00     B  3950.5       5
1   07:00:00     B  3950.0       4
2   07:00:00     A  3950.0       7
3   07:00:00     B  3948.0      17
4   07:00:00     A  3950.0      20
5   07:00:00     A  3950.0      31
6   07:00:00     A  3950.0      44
7   07:00:00     A  3950.0      57
8   07:00:00     A  3950.0      67
9   07:00:00     A  3950.0      57
10  07:00:00     A  3950.0      67
11  07:00:00     A  3950.0      80
12  07:00:00     A  3950.0      90
13  07:00:00     A  3950.0      99
14  07:00:01     B  3948.0      15
15  07:00:01     A  3950.0      89
16  07:00:01     A  3949.5       1
17  07:00:02     A  3950.0      89
18  07:00:03     B  3948.0      12
19  07:00:03     A  3949.0       1
20  07:00:03     B  3948.0       9
21  07:00:03     B  3948.5       4
22  07:00:04     A  3949.5       5
23  07:00:04     B  3948.5       2
24  07:00:05     B  3948.5       1

This is my desired output:

       time spread num_spread
   07:00:00      2          2 
   07:00:01      2          1  
   07:00:03      1          1
   07:00:04      1          1

1 Answer

Editorial Team · Answer 1 · 2026-06-06T05:17:19+00:00

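from itertools import groupby

# Each line of /tmp/ba.data is assumed to be: index time quote price volume (no header row).
with open('/tmp/ba.data') as dataF:
    oldk, oldsub = None, None
    # Group consecutive lines that share the same (time, quote).
    for key, subi in groupby(map(str.split, dataF), lambda x: (x[1], x[2])):
        if oldk == None:
            # Remember the first run of a pair.
            oldk, oldsub = key, list(subi)
        else:
            newsub = list(subi)
            # Last price of the previous run minus first price of the current run.
            print(' '.join(oldk), '->', ' '.join(key), float(oldsub[-1][3]) - float(newsub[0][3]))
            oldk, oldsub = None, None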

This prints:

07:00:00 B -> 07:00:00 A 0.0
07:00:00 B -> 07:00:00 A -2.0
07:00:01 B -> 07:00:01 A -2.0
07:00:02 A -> 07:00:03 B 2.0
07:00:03 A -> 07:00:03 B 1.0
07:00:04 A -> 07:00:04 B 1.0

If you change

if oldk == None:

to

if oldk == None or oldk[0] != key[0]:

you'll get

07:00:00 B -> 07:00:00 A 0.0
07:00:00 B -> 07:00:00 A -2.0
07:00:01 B -> 07:00:01 A -2.0
07:00:03 B -> 07:00:03 A -1.0
07:00:04 A -> 07:00:04 B 1.0
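
Since the question asks for a pandas dataframe result, the same pairing idea can be sketched directly in pandas. This is only a sketch under a few assumptions that are not stated in the question: the function name `paired_spreads` and the helper columns are made up for illustration, the spread of a pair is taken as the absolute difference between the last price of the first run and the first price of the second run (as in the loop above, but unsigned), and the `spread` column of the desired output is read as the sum of the pair spreads within a timestamp (0 + 2 = 2 for 07:00:00). Other readings, such as the maximum, would fit the sample equally well.

```python
import io
import pandas as pd

# Sample data copied from the question (index column omitted).
RAW = """\
time quote price volume
07:00:00 B 3950.5 5
07:00:00 B 3950.0 4
07:00:00 A 3950.0 7
07:00:00 B 3948.0 17
07:00:00 A 3950.0 20
07:00:00 A 3950.0 31
07:00:00 A 3950.0 44
07:00:00 A 3950.0 57
07:00:00 A 3950.0 67
07:00:00 A 3950.0 57
07:00:00 A 3950.0 67
07:00:00 A 3950.0 80
07:00:00 A 3950.0 90
07:00:00 A 3950.0 99
07:00:01 B 3948.0 15
07:00:01 A 3950.0 89
07:00:01 A 3949.5 1
07:00:02 A 3950.0 89
07:00:03 B 3948.0 12
07:00:03 A 3949.0 1
07:00:03 B 3948.0 9
07:00:03 B 3948.5 4
07:00:04 A 3949.5 5
07:00:04 B 3948.5 2
07:00:05 B 3948.5 1
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")


def paired_spreads(df):
    """Hypothetical helper: pair consecutive bid/ask runs within each timestamp."""
    # A new run starts whenever the timestamp or the quote side changes.
    new_run = (df["time"] != df["time"].shift()) | (df["quote"] != df["quote"].shift())
    runs = (
        df.assign(run=new_run.cumsum())
        .groupby("run")
        .agg(
            time=("time", "first"),
            first_price=("price", "first"),
            last_price=("price", "last"),
        )
    )
    # Position of each run inside its timestamp: 0, 1, 2, ...
    pos = runs.groupby("time").cumcount()
    # Last price of the preceding run in the same timestamp.
    runs["prev_last"] = runs.groupby("time")["last_price"].shift()
    # Runs at odd positions close a pair (run 0 with run 1, run 2 with run 3, ...).
    pairs = runs[pos % 2 == 1].copy()
    # Assumption: unsigned spread between the two runs of a pair.
    pairs["spread"] = (pairs["first_price"] - pairs["prev_last"]).abs()
    # Assumption: "spread" in the desired output is the per-timestamp sum.
    return (
        pairs.groupby("time")["spread"]
        .agg(spread="sum", num_spread="count")
        .reset_index()
    )


result = paired_spreads(df)
print(result)
```

On the sample data this yields one row each for 07:00:00, 07:00:01, 07:00:03 and 07:00:04, with the same `spread` and `num_spread` values as the desired output in the question. Timestamps with only one run (07:00:02 and 07:00:05) drop out because they have no pair.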
