The Objective
I’m trying to achieve the following:
- capture network traffic containing a conversation in the FIX protocol
- extract the individual FIX messages from the network traffic into a “nice” format, e.g. CSV
- do some data analysis on the exported “nice” format data
I have achieved this by:
- using pcap to capture the network traffic
- using tshark to print the relevant data as a CSV
- using Python (pandas) to analyse the data
The Problem
The problem is that some of the captured TCP packets contain more than one FIX message, which means that when I do the export to CSV using tshark I don’t get a FIX message per line. This makes consuming the CSV difficult.
This is the tshark commandline I’m using to extract the relevant FIX fields as CSV:
tshark -r dump.pcap \
-R '(fix.MsgType[0]=="G" or fix.MsgType[0]=="D" or fix.MsgType[0]=="8" or fix.MsgType[0]=="F") and fix.ClOrdID != "0"' \
-Tfields -Eseparator=, -Eoccurrence=l -e frame.time_relative \
-e fix.MsgType -e fix.SenderCompID \
-e fix.SenderSubID -e fix.Symbol -e fix.Side \
-e fix.Price -e fix.OrderQty -e fix.ClOrdID \
-e fix.OrderID -e fix.OrdStatus
Note that I’m currently using “-Eoccurrence=l” to get just the last occurrence of a named field in the case where there is more than one occurrence of a field in the packet. This is not an acceptable solution as information will get thrown away when there are multiple FIX messages in a packet.
This is what I expect to see per line in the exported CSV file (fields from one FIX message):
16.508949000,D,XXX,XXX,YTZ2,2,97480,34,646427,,
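To make the expected layout concrete, here is a minimal sketch of reading such a line back into named fields. The column names are my own labels, chosen to mirror the order of the `-e` options in the tshark command; `read_fix_rows` is a hypothetical helper, and the standard `csv` module is used only to keep the example dependency-free.

```python
import csv
import io

# Labels follow the order of the -e options passed to tshark (an assumption
# about naming; tshark itself writes no header row with these options).
COLUMNS = [
    "time_relative", "MsgType", "SenderCompID", "SenderSubID", "Symbol",
    "Side", "Price", "OrderQty", "ClOrdID", "OrderID", "OrdStatus",
]

def read_fix_rows(text):
    """Parse headerless CSV text into a list of dicts, one per line."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return list(reader)

# The single-message line from the post.
rows = read_fix_rows("16.508949000,D,XXX,XXX,YTZ2,2,97480,34,646427,,\n")
print(rows[0]["MsgType"], rows[0]["Symbol"], rows[0]["ClOrdID"])
```

With pandas, passing the same list as `names=COLUMNS` together with `header=None` to `pandas.read_csv` should give an equivalent DataFrame.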
This is what I see when there is more than one FIX message (three in this case) in a TCP packet and the commandline flag “-Eoccurrence=a” is used:
16.515886000,F,F,G,XXX,XXX,XXX,XXX,XXX,XXX,XTZ2,2,97015,22,646429,646430,646431,323180,323175,301151,
The Question
Is there a way (not necessarily using tshark) to extract each individual, protocol specific message from a pcap file?
Better Solution
Using tcpflow allows this to be done properly without leaving the commandline. My current approach is to use something like:
tcpflow ensures that the TCP stream is followed, so no FIX messages are missed (in the case where a single TCP packet contains more than 1 FIX message). -C writes to the console and -B ensures binary output. This approach is not unlike following a TCP stream in Wireshark.
The FIX delimiters are preserved which means that I can do some handy grepping on the output, e.g.
to extract all the execution reports. Note the -P argument to grep which allows the very powerful Perl regex.
A (Previous) Solution
I’m using Scapy (see also Scapy Documentation, The Very Unofficial Dummies Guide to Scapy) to read in a pcap file and extract each individual FIX message from the packets.
Below is the basis of the code I’m using:
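A minimal sketch of this approach follows. It is an illustration, not the original listing. It assumes every FIX message starts with `8=FIX`, ends with the checksum field `10=...`, and uses the standard SOH byte (0x01) as field delimiter. The function names are hypothetical, and messages that straddle two TCP segments are not reassembled.

```python
SOH = b"\x01"  # standard FIX field delimiter

def split_fix_messages(payload):
    """Split one TCP payload into the complete FIX messages it contains."""
    messages = []
    start = payload.find(b"8=FIX")
    while start != -1:
        # The checksum field (tag 10) is the last field of a message.
        tail = payload.find(SOH + b"10=", start)
        if tail == -1:
            break  # incomplete message, e.g. continued in a later segment
        end = payload.find(SOH, tail + 1)
        if end == -1:
            break
        messages.append(payload[start:end + 1])
        start = payload.find(b"8=FIX", end + 1)
    return messages

def parse_fields(message):
    """Return a tag -> value dict (a repeated tag keeps its last value)."""
    fields = {}
    for item in message.strip(SOH).split(SOH):
        tag, _, value = item.partition(b"=")
        fields[tag.decode()] = value.decode(errors="replace")
    return fields

def fix_messages_from_pcap(path):
    """Yield one field dict per FIX message found in the pcap file."""
    from scapy.all import rdpcap, TCP, Raw  # imported lazily; needs Scapy
    for pkt in rdpcap(path):
        if TCP in pkt and Raw in pkt:
            for msg in split_fix_messages(bytes(pkt[Raw].load)):
                yield parse_fields(msg)

# Two messages in a single payload, as in the problem described above.
demo = (b"8=FIX.4.2\x0135=D\x0111=646427\x0110=001\x01"
        b"8=FIX.4.2\x0135=F\x0111=646428\x0110=002\x01")
print([parse_fields(m)["35"] for m in split_fix_messages(demo)])  # prints ['D', 'F']
```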
I would still like to be able to get other information from the “frame” layer of the network packet, in particular the relative (or reference) time. Unfortunately, this doesn’t seem to be available from the Scapy packet object; its topmost layer is the Ether layer as shown below.
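One possible workaround for the relative time, offered as an assumption rather than something confirmed in this post: Scapy packets carry their capture timestamp in the `time` attribute, so a time relative to the first packet can be derived by subtraction. The sketch below uses stand-in objects in place of real packets so that it runs without a capture file; `relative_times` is a hypothetical helper.

```python
from types import SimpleNamespace

def relative_times(packets):
    """Yield (seconds_since_first_packet, packet) pairs.

    Assumes each packet exposes its capture timestamp as `.time`,
    as packets read with Scapy's rdpcap() do.
    """
    first = None
    for pkt in packets:
        if first is None:
            first = pkt.time
        yield float(pkt.time - first), pkt

# Stand-in packets; with Scapy this would be relative_times(rdpcap(path)).
fake_packets = [SimpleNamespace(time=t) for t in (100.0, 100.5, 116.5)]
offsets = [offset for offset, _ in relative_times(fake_packets)]
print(offsets)
```

This should roughly correspond to tshark’s frame.time_relative when the first packet of the capture is the time reference.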