Wednesday, September 27, 2006

Apache hackery

The following is a patch against apache 2.0.54 (Probably applies clean to other versions, I've applied it to 2.0.55 also). It's built for debian linux, it's possible that some hacking may be necessary to get it to apply to a vanilla version of httpd but I doubt it. Copy the attached patch to a file called for example, 000_ProxyMultiSource.

Instructions for building on debian:
apt-get install debhelper apache2-threaded-dev
(also will need gcc, libtool, autoconf, etc, if not already installed)
cd /opt/apache/build
apt-get source apache2
cp 000_ProxyMultiSource debian/pacthes/.
debian/rules binary
Install the resulting .deb files: (We use the worker MPM, YMMV)
dpkg -i apache2_2.0.54-5_amd64.deb  apache2-common_2.0.54-5_amd64.deb  apache2-mpm-worker_2.0.54-5_amd64.deb
What it does:

This adds a new configuration directive to the apache config file. It is defined within the virtual host. The config item/syntax is:
ProxyMultiSource <ip> [IP] [IP]  [IP]
This causes the server, when acting as a proxy server, to randomly set it's source address to one of the <n> IP addresses above for each new request. This can be used, for example, to have a machine with a few DSL/T1 lines connected to it to split proxy traffic among all the links. It doesn't look all that random, especially at first, since all of the threads presumably have the same random seed so end up generating the same sequence of numbers. It hasn't been a big enough issue for me to fix it, since it evens out over time.

Note that these IP addresses actually have to be live on your system or the bind will fail, and probably with spectacular results. (I suspect it will lock up, since it repeatedly retries failures to bind() the local address -- this is to deal with "Address already in use" issues where the local and remote address/port pairs are identical across two transactions). Also see my previous post "Put it where it doesn't belong" to make sure that this IP traffic makes it out the appropriate interface instead of everything riding out the defaultroute. That is not what you want.

This also makes the variable "proxy-source" available to the logging system - for example:
LogFormat "%h %l %u %t \"%r\" %>s %b %T %{proxy-source}n" proxy
will include the IP address of the chosen proxy as the last value of the log entry. It will show as a "-" if it's not set -- if the request comes out of cache, or if it's a continuation of an HTTP/1.1 keepalive request, this may happen. (I may look into a way of preserving it for HTTP/1.1 requests in the future)

This code seems to be pretty stable; there have been a couple times where it's started up and given a signal 8(SIGWTF) but that's been rare. We see it gleefully take over 100 hits per second and push 100mbits+ traffic for extended periods.

Note: This has not been tested under the following circiumstances:
- Multiple virtual hosts sharing the same list of IP's
- Multiple virtual hosts with discrete lists of IP's
- A single ProxyMultiHost IP (probably useful in it's own right eh)
- Apache built with this patch _not_ implementing the ProxyMutliHost directive. (It may fail to bind an address at all)
- Virtual Hosts configured but not implementing the ProxyMultiHost directive.
If this stuff doesn't work, btw, it should not bomb the whole server, only the proxy functionality.

Note. I haven't written anything substantial in C in like 10 years, please be nice.

Get the patch here:
000_ProxyMultiSource

No comments: