I’ve been fiddling with TCP/IP networking in Gawk and am having a hard time figuring out why it works with some sites but not with others. I’ve even tried using HTTP Live Headers in Windows to debug what’s going on, but to no avail.
The sample Gawk code below (Version 3.1.5) works fine for the site http://www.sobell.com but hangs on http://www.drudgereport.com.
BEGIN {
    print "Dumping HTML of www.sobell.com"
    server = "/inet/tcp/0/www.sobell.com/80"
    print "GET http://www.sobell.com" |& server
    while ((server |& getline) > 0)
        print $0
    close(server)

    print "Dumping HTML of www.drudgereport.com"
    server = "/inet/tcp/0/www.drudgereport.com/80"
    print "GET http://www.drudgereport.com" |& server
    while ((server |& getline) > 0)
        print $0
    close(server)
}
I appreciate any help! Thanks All.
Your code (and the gawk manual) uses the outdated HTTP/0.9 syntax. Apparently the second server no longer supports this. Important differences:
The following code works for me:
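A minimal sketch of an HTTP/1.0-style request in gawk, under a few assumptions: the helper names build_request and fetch and the host variable are made up for illustration, and the Host header is not required by HTTP/1.0 but is included because servers that host several sites on one address generally need it. The request line carries a path and a protocol version, every line ends in CRLF, and a blank line ends the header block.

```awk
# Build an HTTP/1.0 request: request line, Host header, blank line.
# (build_request is a hypothetical helper, not part of gawk.)
function build_request(host, path,    req) {
    req = "GET " path " HTTP/1.0\r\n"
    req = req "Host: " host "\r\n"
    req = req "\r\n"          # empty line terminates the headers
    return req
}

# Open a TCP connection to port 80, send the request, and print the
# response (status line, headers, then body) until the server closes.
function fetch(host,    server, line) {
    server = "/inet/tcp/0/" host "/80"
    # printf is used instead of print so ORS does not append a bare "\n"
    printf "%s", build_request(host, "/") |& server
    while ((server |& getline line) > 0) {
        sub(/\r$/, "", line)  # drop the CR left over from CRLF endings
        print line
    }
    close(server)
}

BEGIN {
    # Only touch the network when a host was passed with -v host=...
    if (host != "")
        fetch(host)
}
```

Saved as, say, fetch.awk (the file name is arbitrary), it could be run with gawk -v host=www.drudgereport.com -f fetch.awk. Unlike an HTTP/0.9 reply, the output begins with the status line and response headers, followed by a blank line and then the HTML.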
You can find all the gory details in RFC 1945 (HTTP/1.0) and RFC 2616 (HTTP/1.1).