I have packet capture code that writes the HTTP payload into a file. Now I want to extract the URL information from these dumps.
For each packet, the payload begins like this:
GET /intl/en_com/images/logo_plain.png HTTP/1.1..Host: http://www.google.co.in..User-Agent: Mozilla/5.0
I would like to extract:
- the string between “GET” and “HTTP/1.1”
- the string between “Host:” and “User-Agent”
How can I do this in C? Are there any built-in string functions, or should I use regular expressions?
C doesn’t have built-in regular expressions, though libraries are available: http://www.arglist.com/regex/ and http://www.pcre.org/ are the two I see most often.
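If you do want to go the regex route, here is a minimal sketch using the POSIX <regex.h> interface (regcomp/regexec), which is available on most Unix-like systems. That platform choice is an assumption on my part, as is the helper name regex_capture. It also assumes the ".." in the dump stands for a real CR LF pair.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: find the first match of `pattern` (POSIX extended
   syntax) in `text` and copy capture group 1 into `out`.
   Returns 1 on a match, 0 on no match or a pattern that fails to compile. */
int regex_capture(const char *pattern, const char *text,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int found = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;   /* truncate rather than overflow */
        memcpy(out, text + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }

    regfree(&re);
    return found;
}
```

With the payload in a NUL-terminated buffer, regex_capture("GET ([^ ]+) HTTP/", payload, path, sizeof path) would pull out the request path, and regex_capture("Host: *([^\r\n]+)", payload, host, sizeof host) the host.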
For a task this simple, you can easily get away without using regexes though. Provided the lines are all shorter than some maximum length MAXLEN, just process them one line at a time.
This solution doesn’t require buffering the entire file in memory as KennyTM’s answer does (though that is fine, by the way, if you know the files are small). Notice that we use fgets() instead of the unsafe gets(), which is prone to overflowing buffers on long lines.
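The line-at-a-time approach could look roughly like the sketch below. It is only an illustration under some assumptions: the dump contains real CR LF line endings (the ".." in the sample being how they were displayed), MAXLEN is 1024, and the helper names extract_after and scan_request are made up for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAXLEN 1024   /* assumed upper bound on the length of one line */

/* If `line` starts with `prefix`, copy what follows (skipping blanks) into
   `out`, stopping at `stop` if given and present, otherwise at end of line.
   Returns 1 if the prefix matched, 0 otherwise. */
int extract_after(const char *line, const char *prefix, const char *stop,
                  char *out, size_t outsz)
{
    size_t plen = strlen(prefix);
    if (outsz == 0 || strncmp(line, prefix, plen) != 0)
        return 0;

    const char *start = line + plen;
    start += strspn(start, " \t");

    const char *end = stop ? strstr(start, stop) : NULL;
    size_t len = end ? (size_t)(end - start) : strcspn(start, "\r\n");
    if (len >= outsz)
        len = outsz - 1;   /* truncate rather than overflow */

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Read `in` one line at a time with fgets() and fill `path` from the first
   "GET " line and `host` from the first "Host:" line.
   Returns how many of the two fields were found (0, 1 or 2).
   A line longer than MAXLEN - 1 characters is delivered by fgets() in
   several pieces, so only its first piece is checked against the prefixes. */
int scan_request(FILE *in, char *path, size_t pathsz,
                 char *host, size_t hostsz)
{
    char line[MAXLEN];
    int have_path = 0, have_host = 0;

    while ((!have_path || !have_host) && fgets(line, sizeof line, in)) {
        if (!have_path && extract_after(line, "GET ", " HTTP/", path, pathsz))
            have_path = 1;
        else if (!have_host && extract_after(line, "Host:", NULL, host, hostsz))
            have_host = 1;
    }
    return have_path + have_host;
}
```

To use it, open the dump with fopen(), pass the FILE * to scan_request() along with two char buffers, and check that it returns 2 before printing them. If your dump really contains literal dots instead of line breaks, you would need to search within the buffer using strstr() rather than reading line by line.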