I’m creating a reverse HTTP proxy using Node.js for fun. The code is pretty simple at the moment: it listens on 127.0.0.1:8080 for HTTP requests and forwards them to hostname.com, then forwards the responses from hostname.com back to the client. Nothing fancy is done yet, such as rewriting redirect headers. The code is as follows:
var http = require('http');

var server = http.createServer(function(request, response) {
  // Open a connection to the upstream server and replay the client's request.
  var proxy = http.createClient(8080, 'hostname.com');
  var proxyRequest = proxy.request(request.method, request.url, request.headers);

  // Relay the upstream response back to the client.
  proxyRequest.on('response', function(proxyResponse) {
    proxyResponse.on('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxyResponse.on('end', function() {
      response.end();
    });
    response.writeHead(proxyResponse.statusCode, proxyResponse.headers);
  });

  // Relay the client's request body to the upstream server.
  request.on('data', function(chunk) {
    proxyRequest.write(chunk, 'binary');
  });
  request.on('end', function() {
    proxyRequest.end();
  });

  proxyRequest.on('close', function(err) {
    if (err) {
      console.log('close error: ' + err + ' for ' + request.url);
    }
  });
});

server.listen(8080);

server.on('clientError', function(exception) {
  console.log('boo a clientError occurred :(');
});
All appears to work well until I browse to a page that requires many additional resources (such as images) to be fetched. Naturally the browser will generate a number of GET requests to the reverse proxy to fetch these additional resources.
When I do browse to such a page, some of the http.ServerRequests for the additional resources never receive responses. If I reload the page, it almost always succeeds, because the resources that were fetched successfully on the first attempt are cached (so the browser doesn’t try to GET them again) and the browser only needs to grab the few missing ones.
At a guess, I would imagine I’m hitting some kind of connection limit, although I’m not sure. Any help would be greatly appreciated!
If you set up Wireshark on the proxy, you’ll almost certainly see what’s happening. (Note that you may need a second machine for this, because some TCP/IP stacks don’t provide anything that Wireshark can listen on for loopback traffic; see this.)
I’m almost certain that the problems you are running into here are all down to the Connection: header; proxies MUST parse this header and handle it correctly. At a guess, I would say your code is handling the first request in a Connection: keep-alive stream and ignoring the rest. As a proxy, you are supposed to parse and remove/replace this header, and any associated headers (in this case the Keep-Alive: header), before forwarding the request to the server.
If you want to build an HTTP/1.1 proxy, it’s very important that you read RFC 2616 and adhere to the many, many rules that it places on proxy behaviour. The particular problem you are running into here is documented in section 14.10.
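To make the header handling concrete, here is a minimal sketch of stripping the Connection: header and everything it names before forwarding. The helper name stripHopByHopHeaders and the x-internal-hint header are made up for illustration, and the fixed list is the hop-by-hop set from RFC 2616 section 13.5.1. Treat it as a starting point under those assumptions rather than a complete proxy implementation.

```javascript
// Hop-by-hop headers listed in RFC 2616 section 13.5.1.
// These describe a single connection and should not be forwarded by a proxy.
const HOP_BY_HOP = [
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'te', 'trailers', 'transfer-encoding', 'upgrade',
];

// Hypothetical helper (not part of Node.js): returns a copy of `headers`
// with hop-by-hop headers removed, leaving the input object untouched.
function stripHopByHopHeaders(headers) {
  const drop = new Set(HOP_BY_HOP);

  // The Connection header may name further headers that apply to this hop only
  // (section 14.10), so every token it lists is dropped as well.
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() !== 'connection') continue;
    for (const token of String(value).split(',')) {
      const listed = token.trim().toLowerCase();
      if (listed) drop.add(listed);
    }
  }

  const forwarded = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!drop.has(name.toLowerCase())) forwarded[name] = value;
  }
  return forwarded;
}

// Example: 'x-internal-hint' is an invented header named by Connection.
const example = stripHopByHopHeaders({
  host: 'hostname.com',
  connection: 'keep-alive, x-internal-hint',
  'keep-alive': '300',
  'x-internal-hint': '1',
  accept: '*/*',
});
console.log(example); // logs { host: 'hostname.com', accept: '*/*' }
```

In the code from the question, this would mean passing stripHopByHopHeaders(request.headers) to proxy.request(...) and stripHopByHopHeaders(proxyResponse.headers) to response.writeHead(...), instead of the raw header objects. Whether that alone cures the stalled requests is something the Wireshark trace should confirm.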