I am writing an application in C, using libpcap. My program listens for new packets and parses them
according to a grammar. The payload is actually XML.
Sometimes one packet is not enough for an XML file, so the XML buffer is split across several packets.
I want to add logic to handle these cases. However, I don’t know in advance whether a packet contains the whole data. How do I know that a packet has more data that will be sent next? How do I recognize that a new packet contains the rest of the data?
Do I have to use the TH_FIN flag? Could you please explain it to me?
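TCP itself has no notion of message boundaries. It is just a byte stream, and how that stream is cut into segments is arbitrary, so any framing has to be defined by the higher-layer protocol. The FIN flag (TH_FIN) only signals that the sender is closing its side of the connection, so it does not tell you where an XML document ends unless the protocol sends exactly one document per connection.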
If this is raw XML over a TCP stream, you need to actually parse the XML: you know you have a whole document once you have received the end tag of the document (root) element.
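As a rough illustration of "parse until the document element closes", here is a minimal sketch of a scanner that is fed the reassembled payload bytes chunk by chunk and tracks element depth. The names `xml_scan`, `xml_scan_init` and `xml_scan_feed` are made up for this example. It assumes well-formed input and deliberately ignores comments or CDATA sections that contain `>` and DOCTYPE internal subsets, so a real tool would more likely hand the bytes to a proper incremental XML parser.

```c
#include <stddef.h>

/* Hypothetical helper types for illustration only. */
enum xs_state { XS_TEXT, XS_TAG_START, XS_IN_TAG, XS_IN_QUOTE, XS_SKIP };

struct xml_scan {
    enum xs_state state;
    int depth;       /* number of currently open elements */
    int closing;     /* current tag started with "</" */
    int seen_root;   /* at least one element has been seen */
    char quote;      /* quote character of the attribute value being skipped */
    char prev;       /* previous byte inside the tag, to detect "/>" */
};

void xml_scan_init(struct xml_scan *s)
{
    s->state = XS_TEXT;
    s->depth = 0;
    s->closing = 0;
    s->seen_root = 0;
    s->quote = 0;
    s->prev = 0;
}

/* Feed the next chunk of payload. Returns 1 as soon as the root element
 * has been closed, 0 if more data is needed. State is kept in *s, so a
 * tag may be split across two calls. Bytes after the closing tag in the
 * same chunk are not examined. */
int xml_scan_feed(struct xml_scan *s, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        char c = data[i];
        switch (s->state) {
        case XS_TEXT:
            if (c == '<')
                s->state = XS_TAG_START;
            break;
        case XS_TAG_START:
            if (c == '?' || c == '!') {
                /* Simplification: skip "<?...?>" and "<!...>" up to the next '>' */
                s->state = XS_SKIP;
            } else {
                s->closing = (c == '/');
                s->prev = c;
                s->state = XS_IN_TAG;
            }
            break;
        case XS_SKIP:
            if (c == '>')
                s->state = XS_TEXT;
            break;
        case XS_IN_QUOTE:
            if (c == s->quote)
                s->state = XS_IN_TAG;
            break;
        case XS_IN_TAG:
            if (c == '"' || c == '\'') {
                s->quote = c;            /* '>' inside a quoted value is data */
                s->state = XS_IN_QUOTE;
            } else if (c == '>') {
                if (s->closing)
                    s->depth--;          /* </name> */
                else if (s->prev != '/')
                    s->depth++;          /* <name ...> */
                /* otherwise <name .../>: depth unchanged */
                s->seen_root = 1;
                s->state = XS_TEXT;
                if (s->depth == 0)
                    return 1;            /* root element is complete */
            }
            s->prev = c;
            break;
        }
    }
    return 0;
}
```

Call `xml_scan_init` once per document, then pass each in-order chunk of TCP payload to `xml_scan_feed` and keep appending to your XML buffer until it returns 1.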
If it’s XML carried over HTTP, you may be able to parse the Content-Length: header, which, when present, gives the length of the body in bytes.
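For the HTTP case, a minimal sketch of pulling the value out of the header block could look like the following. The function name `find_content_length` and its return codes are invented for this example. It assumes the reassembled bytes start at the request or status line, and it does not cover bodies sent with chunked transfer encoding, which carry no Content-Length.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive check that the line at p starts with name. */
static int has_prefix_ci(const char *p, size_t avail, const char *name)
{
    size_t n = strlen(name);
    if (avail < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)p[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: scans the HTTP header lines in buf.
 * Returns the Content-Length value (>= 0),
 *         -1 if the headers ended without a usable Content-Length,
 *         -2 if the header block is not complete yet (need more data). */
long find_content_length(const char *buf, size_t len)
{
    static const char name[] = "content-length:";
    size_t pos = 0;

    while (pos < len) {
        const char *eol = memchr(buf + pos, '\n', len - pos);
        if (eol == NULL)
            return -2;                       /* last line is incomplete */
        size_t line_len = (size_t)(eol - (buf + pos));

        /* A blank line ("\r\n" or "\n") ends the header block. */
        if (line_len == 0 || (line_len == 1 && buf[pos] == '\r'))
            return -1;

        if (has_prefix_ci(buf + pos, line_len, name)) {
            char tmp[32];
            size_t name_len = sizeof name - 1;
            size_t vlen = line_len - name_len;
            if (vlen >= sizeof tmp)
                return -1;                   /* implausibly long value */
            memcpy(tmp, buf + pos + name_len, vlen);
            tmp[vlen] = '\0';

            char *end;
            long v = strtol(tmp, &end, 10);  /* skips leading spaces */
            if (end == tmp || v < 0)
                return -1;
            return v;
        }
        pos += line_len + 1;                 /* continue after the '\n' */
    }
    return -2;                               /* no blank line seen yet */
}
```

Once the value is known, the body is complete when that many bytes have arrived after the blank line that ends the headers.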
Note that reassembling a TCP stream from captured packets is a hard problem with a lot of corner cases: you need to handle retransmissions, out-of-order TCP segments and many more. libnids (http://libnids.sourceforge.net/) might help you.
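To give an idea of what is involved, here is a minimal sketch that covers only two of those corner cases, retransmissions and out-of-order segments, for one direction of one connection. All names (`struct stream`, `stream_input`, and so on) are hypothetical and not part of libpcap or libnids. It assumes the caller has already extracted the sequence number (for example from `th_seq`, converted with `ntohl`) and the payload from each packet, and has set `next_seq` to the initial sequence number plus one. Connection tracking, bounds on buffered data, and packets the capture itself missed are left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only. */
struct seg {
    struct seg *next;
    uint32_t seq;
    size_t len;
    unsigned char data[];     /* copy of the segment payload */
};

struct stream {
    uint32_t next_seq;        /* next sequence number we expect */
    struct seg *pending;      /* segments that arrived ahead of a gap */
    unsigned char *out;       /* reassembled bytes, in order */
    size_t out_len, out_cap;
    int truncated;            /* set if out was too small */
};

/* Returns 0 if the segment lies beyond a gap, 1 if it was consumed
 * (new bytes appended to out, or dropped as a retransmission). */
static int try_deliver(struct stream *s, uint32_t seq,
                       const unsigned char *data, size_t len)
{
    /* Wrap-around safe comparison; relies on two's complement conversion. */
    int32_t diff = (int32_t)(seq - s->next_seq);
    if (diff > 0)
        return 0;

    size_t overlap = (size_t)(-(int64_t)diff);   /* bytes we already have */
    if (overlap < len) {
        size_t fresh = len - overlap;
        size_t room = s->out_cap - s->out_len;
        size_t n = fresh < room ? fresh : room;
        memcpy(s->out + s->out_len, data + overlap, n);
        s->out_len += n;
        if (n < fresh)
            s->truncated = 1;
        s->next_seq += (uint32_t)fresh;
    }
    return 1;
}

/* Feed one captured TCP payload. Returns 0 on success, -1 on allocation failure. */
int stream_input(struct stream *s, uint32_t seq, const void *payload, size_t len)
{
    const unsigned char *data = payload;

    if (!try_deliver(s, seq, data, len)) {
        /* Too early: keep a copy until the gap before it is filled. */
        struct seg *g = malloc(sizeof *g + len);
        if (g == NULL)
            return -1;
        g->seq = seq;
        g->len = len;
        memcpy(g->data, data, len);
        g->next = s->pending;
        s->pending = g;
        return 0;
    }

    /* New data may have closed a gap: retry buffered segments until
     * a full pass over the list makes no progress. */
    int progress = 1;
    while (progress) {
        progress = 0;
        for (struct seg **pp = &s->pending; *pp != NULL; ) {
            struct seg *g = *pp;
            if (try_deliver(s, g->seq, g->data, g->len)) {
                *pp = g->next;
                free(g);
                progress = 1;
            } else {
                pp = &g->next;
            }
        }
    }
    return 0;
}
```

The in-order bytes accumulating in `out` are what you would then hand to the XML or HTTP parsing step. Even this small sketch shows why a library that already does the reassembly is attractive.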