July 22 2010
Earlier this week, I wrote an article for YDN covering some of the reasons why one might want to run a multi-core HTTP server in NodeJS and some strategies for intelligently allocating connections to different workers. While routing based on characteristics of the TCP connection is useful, the approach outlined in that post has a serious shortcoming - we cannot actually read any data off of the socket when making these decisions. Doing so before passing off the file descriptor would cause the worker process to miss critical request data, choking the HTTP parser.
The above limitation precludes interrogating properties of the HTTP request itself (e.g. headers, query parameters, etc) to make routing decisions. In practice, there are a wide variety of use-cases where this is important: routing by cookie, vhost, path, query parameters, etc. In addition to cache affinity, this can provide some rudimentary forms of access control (e.g. by running each vhost in a process with a different UID or chroot(2) jail) or even QoS (e.g. by running each vhost in a process with its nice(2) value controlled).
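To make the access control idea a bit more concrete, here's a purely illustrative sketch of a worker dropping privileges to a per-vhost UID. The VHOST_UIDS table is made up, and it assumes the worker is started as root and is told which vhost it serves (say, via an initial message from the routing process); process.setuid() and process.getuid() were available in node at the time, if I recall correctly.
// Hypothetical mapping from vhost to an unprivileged UID
var VHOST_UIDS = { 'foo.bar.com' : 1001, 'baz.bizzle.com' : 1002 };

var dropPrivileges = function(vhost) {
  if (process.getuid() === 0 && vhost in VHOST_UIDS) {
    // Everything this worker does from here on runs as the per-vhost user
    process.setuid(VHOST_UIDS[vhost]);
  }
};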
Naively we could use NodeJS as a reverse HTTP proxy (and a pretty good one, at that), but the overhead of proxying every byte of every request is kind of a drag. As it turns out, we can use file descriptor passing to efficiently hand off each TCP connection to the appropriate worker once we've read enough of the request to make a routing decision. Thus, once the routing process delegates a connection to a worker, that worker owns it completely and the routing process has nothing more to do with it. No juggling connections, no proxying traffic, nothing. The trick is to do this in such a way that allows the routing process to parse as much of the request as it needs to while ensuring that all socket data remains available to the worker.
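For comparison, here's roughly what the naive proxying approach might look like using the HTTP client API as it existed around node-v0.1.100 (the BACKEND_PORTS table and the assumption that each vhost's worker listens on its own local port are mine, for illustration only). Every byte of every request and response has to be shuffled through this process, which is exactly the overhead we'd like to avoid.
var http = require('http');

// Hypothetical mapping from vhost to a local backend port
var BACKEND_PORTS = { 'foo.bar.com' : 9001, 'baz.bizzle.com' : 9002 };

http.createServer(function(req, resp) {
  var port = BACKEND_PORTS[req.headers.host];
  if (!port) {
    resp.writeHead(400, {'Content-Type' : 'text/plain'});
    resp.write('Host not found\n');
    resp.end();
    return;
  }

  var client = http.createClient(port, '127.0.0.1');
  var preq = client.request(req.method, req.url, req.headers);

  // Copy the request body through to the backend ...
  req.addListener('data', function(chunk) { preq.write(chunk); });
  req.addListener('end', function() { preq.end(); });

  // ... and copy the backend's response back to the client
  preq.addListener('response', function(presp) {
    resp.writeHead(presp.statusCode, presp.headers);
    presp.addListener('data', function(chunk) { resp.write(chunk); });
    presp.addListener('end', function() { resp.end(); });
  });
}).listen(8080);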
Step by step, we can do the following. Note that this does not work with HTTP/1.1 keep-alive, which multiplexes multiple requests over a single connection.
1. Accept the TCP connection in the routing process and begin parsing the HTTP request, buffering every byte read off of the socket
2. Once enough of the request has been parsed to make a routing decision (here, the Host header), pause the stream so that the routing process reads nothing further
3. Send the connection's file descriptor, along with all of the buffered data, to the chosen worker
4. In the worker, wrap a net.Stream connection around the received FD and use it to emit a synthetic 'data' event to replay data already read off of the socket by the routing process
It's important to note that this does not rely on any modifications to the HTTP stack in the worker - just plain vanilla NodeJS. In order to do this, we have to recover from the fact that parsing the HTTP request in the routing process is destructive - it's pulling bytes off of the socket that are not available to the worker once it takes over the TCP connection. To make sure that the worker doesn't miss a single byte seen on the socket since its inception, we send over all data seen thus far and replay it in the worker using the synthetic 'data' event.
First, router.js:
var HTTPParser = process.binding('http_parser').HTTPParser;
var net = require('net');
var path = require('path');
var sys = require('sys');
var Worker = require('webworker/webworker').Worker;

var VHOSTS = ['foo.bar.com', 'baz.bizzle.com'];

// Spawn one worker per vhost
var WORKERS = {};
VHOSTS.forEach(function(vh) {
  WORKERS[vh] = new Worker(path.join(__dirname, 'worker.js'));
});

net.createServer(function(s) {
  var hp = new HTTPParser('request');
  hp.data = {
    'headers' : {},
    'partial' : {
      'field' : '',
      'value' : ''
    }
  };

  // Every byte read off of the socket, saved for replay in the worker
  var seenData = '';

  hp.onURL = function(buf, start, len) {
    var str = buf.toString('ascii', start, start + len);
    if (hp.data.url) {
      hp.data.url += str;
    } else {
      hp.data.url = str;
    }
  };

  // Header fields and values can arrive split across multiple callbacks;
  // accumulate them in hp.data.partial and commit a header once a new
  // field begins after a value.
  hp.onHeaderField = function(buf, start, len) {
    if (hp.data.partial.value) {
      hp.data.headers[hp.data.partial.field] = hp.data.partial.value;
      hp.data.partial = {
        'field' : '',
        'value' : ''
      };
    }

    hp.data.partial.field += buf.toString(
      'ascii', start, start + len
    ).toLowerCase();
  };

  hp.onHeaderValue = function(buf, start, len) {
    hp.data.partial.value += buf.toString(
      'ascii', start, start + len
    ).toLowerCase();
  };

  hp.onHeadersComplete = function(info) {
    // Clean up partial state
    if (hp.data.partial.field.length > 0 &&
        hp.data.partial.value.length > 0) {
      hp.data.headers[hp.data.partial.field] = hp.data.partial.value;
    }
    delete hp.data.partial;

    hp.data.version = {
      'major' : info.versionMajor,
      'minor' : info.versionMinor
    };
    hp.data.method = info.method;
    hp.data.upgrade = info.upgrade;

    // Route on the Host header: stop reading from the socket and hand the
    // FD (plus everything read so far) off to the vhost's worker
    if ('host' in hp.data.headers &&
        hp.data.headers.host in WORKERS) {
      s.pause();
      WORKERS[hp.data.headers.host].postMessage(
        seenData, s.fd
      );
    } else {
      s.write(
        'HTTP/' + info.versionMajor + '.' + info.versionMinor + ' ' +
        '400 Host not found\r\n'
      );
      s.write('\r\n');
      s.end();
    }
  };

  s.ondata = function(buf, start, end) {
    seenData += buf.toString('ascii', start, end);

    var ret = hp.execute(buf, start, end - start);
    if (ret instanceof Error) {
      s.destroy(ret);
      return;
    }
  };
}).listen(8080);
... next, worker.js:
var Buffer = require('buffer').Buffer;
var http = require('http');
var net = require('net');
var sys = require('sys');

var srv = http.createServer(function(req, resp) {
  resp.writeHead(200, {'Content-Type' : 'text/plain'});
  resp.write('Hello, vhost world!\n');
  resp.end();
});

onmessage = function(msg) {
  // Wrap a net.Stream around the received file descriptor and hand it to
  // the HTTP server as if it had accepted the connection itself
  var s = new net.Stream(msg.fd);
  s.type = srv.type;
  s.server = srv;
  s.resume();
  srv.emit('connection', s);

  // Replay the data already read off of the socket by the routing process
  s.emit('data', msg.data);
  s.ondata(new Buffer(msg.data, 'ascii'), 0, msg.data.length);
};
Keep in mind that this code is a prototype only (please don't ship it - I've left out a lot of error handling for the sake of readability ;), but I thought it was interesting enough to share with a broader audience. This implementation takes advantage of the task management and message passing facilities of node-webworker. It should run out of the box on node-v0.1.100.
Anyway, the key to this is being able to replay the socket's data in the worker. You'll notice in the code above that we're calling net.Stream.pause() once we've received all necessary data in the routing process. This ensures that this process doesn't pull any more data off of the socket. If the kernel's TCP stack receives more data for this socket after we've paused the stream, it will sit in the TCP receive buffer waiting for someone to read it. Once the worker process ingests the passed file descriptor and inserts it into its event loop, this newly-arrived data will be read. In a nutshell, we use the TCP stack itself to buffer data for us. If we really wanted to be clever, we might be able to use recv(2) with MSG_PEEK to look at data arriving on the socket while leaving it for the worker, but I'm not sure how this would play with the event loop.
Finally, while I think this is an interesting technique, it's worth noting that a typical production NodeJS deployment would be behind an HTTP load balancer anyway, to front multiple physical hosts for availability if nothing else. Many load balancers can route requests based on a wide variety of characteristics like vhost, client IP, backend load, etc. However, if one doesn't want/need a dedicated load balancer, or needs very application-specific logic to make routing decisions, I think the above could be a useful tool.