I have used both, and I find that I can read HTML data from web pages with tcpflow but cannot do so with tcpdump. The best I get is some ugly ASCII text with lots of period symbols.
My understanding is that tcpdump doesn’t reassemble packets, whereas tcpflow does. But if that were the key difference, wouldn’t the packet data from tcpdump still be human readable, just in smaller chunks? Is the problem that tcpdump is limited to ASCII and most network traffic is encoded in UTF-8?
I’m a rookie at network analysis/programming, so forgive me if I’m missing something obvious.
To get that data in readable form, use tcpdump with the -A option (capital A). It prints each packet as ASCII text without the link-level header and is used mainly for web pages, so the response page is easy to read.
I think you are confusing application-layer data with transport-layer packets.
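To illustrate the difference between the two layers: a web page is application-layer data that usually spans several TCP segments, and a tool that reassembles streams has to put those segments back in order. The following is only a minimal sketch of that idea, not how tcpflow is implemented. The function name `reassemble` and the `(sequence_number, payload)` input format are assumptions made for this example, and sequence-number wraparound is ignored.

```python
def reassemble(segments):
    """Join (sequence_number, payload) pairs into one byte stream.

    Simplified sketch: ignores sequence-number wraparound and
    refuses to continue if a segment is missing.
    """
    stream = bytearray()
    next_seq = None  # sequence number of the next byte we expect
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # retransmission of bytes we already have
        if seq > next_seq:
            raise ValueError("gap in stream: a segment is missing")
        # Skip any overlapping prefix, keep only the new bytes.
        stream += payload[next_seq - seq:]
        next_seq = seq + len(payload)
    return bytes(stream)


if __name__ == "__main__":
    # Two segments captured out of order, plus one retransmission.
    captured = [
        (1013, b"institute"),
        (1000, b"<html>Indian "),
        (1000, b"<html>Indian "),
    ]
    print(reassemble(captured))
```

tcpdump shows each of these segments separately, headers included, which is why its output looks more fragmented than a reassembled stream.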
I do not know about tcpflow, but tcpdump captures the whole packet (headers and everything else), not just the data. The HTML data you mention is in the data part of the TCP/UDP/ICMP packet, whichever you are using, so you also need to understand the structure of a TCP/UDP/ICMP packet …
I captured this packet on my machine and the HTML data is clearly visible; you need to write a script, using knowledge of the packet structure, to extract it from the output.
The last 7-8 lines show the HTML data.
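A script of the kind mentioned above could look roughly like the sketch below. It is only an illustration: it works on the raw bytes of an IPv4 packet (link-level header already removed) rather than on tcpdump's printed text, and the names `tcp_payload` and `build_sample_packet` are hypothetical helpers made up for this example. It relies on the standard IPv4 and TCP header layouts, where each header stores its own length.

```python
import struct


def tcp_payload(ip_packet: bytes) -> bytes:
    """Return the TCP payload of a raw IPv4 packet (no link-level header)."""
    ihl = (ip_packet[0] & 0x0F) * 4               # IPv4 header length in bytes
    if ip_packet[9] != 6:                         # protocol number 6 means TCP
        raise ValueError("not a TCP packet")
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    data_offset = (ip_packet[ihl + 12] >> 4) * 4  # TCP header length in bytes
    return ip_packet[ihl + data_offset:total_len]


def build_sample_packet(payload: bytes) -> bytes:
    """Hypothetical helper: build a minimal IPv4+TCP packet for testing.

    Checksums are left at zero, so this is not a valid packet on the wire.
    """
    total_len = 20 + 20 + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len, 0, 0, 64, 6, 0,
        bytes([172, 31, 9, 84]), bytes([172, 31, 9, 1]),
    )
    tcp_header = struct.pack(
        "!HHIIBBHHH",
        80, 54321, 0, 0, 0x50, 0x18, 65535, 0, 0,
    )
    return ip_header + tcp_header + payload


if __name__ == "__main__":
    packet = build_sample_packet(b"<html>test page</html>")
    print(tcp_payload(packet))
```

The first 40 bytes of the sample packet are binary header fields, which is the part that shows up as unreadable characters when the packet is printed as text.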
Use -s0 to capture the whole frame and -X to print it in the hex and ASCII format shown above.
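The period symbols in this kind of output are placeholders: any byte that is not a printable ASCII character, such as most header bytes or a carriage return, is shown as a dot. The sketch below approximates that idea. The function name `ascii_column` is made up for this example, and tcpdump's exact rules (for instance how it treats spaces and newlines in each mode) may differ from this simplification.

```python
def ascii_column(data: bytes) -> str:
    """Render bytes as text, replacing non-printable bytes with '.'.

    Approximation only: printable here means ASCII codes 32 to 126.
    """
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


if __name__ == "__main__":
    # Two binary header-like bytes followed by a fragment of a web page.
    sample = b"\x00\x50<html>\r\n"
    print(ascii_column(sample))  # prints .P<html>..
```

So the dots do not mean the page is in a different text encoding; they mostly come from binary header fields and control characters mixed in with the readable payload.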
For example, with tcpdump -A:
I connected to 172.31.9.84 on port 80.
Then I requested GET /index.html (an example page that contains only the text “Indian institute of technology this is the test page”).
When I captured the packets at that moment, I got something like: