I built a proxy server and it works great, but there are some sites it cannot handle.
I tried to reduce the problem to its core, and this is what I came up with:
My test case is: http://bits.wikimedia.org/en.wikipedia.org/load.php
which is one of the HTTP requests made when loading any Wikipedia page.
So I tried to build a request for it and send it via a socket like this:
String request1 =
"GET http://bits.wikimedia.org/en.wikipedia.org/load.php HTTP/1.1" +
"\r\n" +
"Host: bits.wikimedia.org" + "\r\n" +
"User-Agent: MyHttpProxy/example.java (http://stackoverflow.com/q/5924490/319266)" +
"\r\n" + "\r\n";
However, I got a 404 return code, which was strange because this page does exist!
After a lot of tries, I made a new request that differs only in the request line:
String request2 =
"GET /en.wikipedia.org/load.php HTTP/1.1" +
"\r\n" +
"Host: bits.wikimedia.org" +
"\r\n" +
"User-Agent: MyHttpProxy/example.java (http://stackoverflow.com/q/5924490/319266)" +
"\r\n" + "\r\n";
and it worked! A 200 came back with
some unimportant content (“/* No modules requested. Max made me put this here */”).
Can anyone tell me what the problem is here?
I looked at the RFC and couldn't make sense of this…
Here is the source code for running this test and printing the results:
You would provide the full URL in the request line only if you’re going via a proxy server. Direct requests to a web server need to follow the form of request2 in your example.
Looking at the source, you send requests to port 80, which almost certainly means they’re not going through a proxy. My guess is that you need to send request1 to port 8080, or whatever port your proxy is listening on.
As for the RFC, take a look at section 5.1.2. Note that the absolute URI form is used with proxies, and the absolute path form (abs_path) with origin servers.
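To make the distinction concrete, here is a minimal sketch of how a proxy might pick the request-line form depending on where the request goes next. The class name RequestLineSketch, the method buildRequest, and the viaProxy flag are made up for illustration and are not taken from the question's code. The sketch only builds the request text; it does not open a socket.

```java
import java.net.URI;

public class RequestLineSketch {

    // Builds a bare GET request for the given URL.
    // viaProxy = true  -> full URL in the request line (the request1 form)
    // viaProxy = false -> path only in the request line (the request2 form)
    static String buildRequest(URI target, boolean viaProxy) {
        // Path plus optional query string, defaulting to "/" for an empty path.
        String path = target.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (target.getRawQuery() != null) {
            path += "?" + target.getRawQuery();
        }

        String requestTarget = viaProxy ? target.toString() : path;

        // The Host header carries the host (and port, if one was given) in both forms.
        int port = target.getPort();
        String host = target.getHost() + (port == -1 ? "" : ":" + port);

        return "GET " + requestTarget + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://bits.wikimedia.org/en.wikipedia.org/load.php");
        System.out.print(buildRequest(uri, false)); // what to send to the origin server
        System.out.print(buildRequest(uri, true));  // what to send to an upstream proxy
    }
}
```

In a proxy, this usually means accepting the full-URL form from the client and rewriting it to the path-only form before forwarding the request to the origin server on port 80.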