This version of cache.cgi is very experimental—I'm not even using it on my own site yet.
The principle of cache.cgi is simple: for GET requests that do not include a query component, it checks whether a cached copy of the page exists. The "cached copy" consists of two files: _cache is a copy of the complete response, headers included, and _dep is a NUL-separated list of files or directories that the page depends on. A cached copy is valid if every file named in _dep exists and has a modification time older than that of _dep.
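A minimal Python sketch of the validity check, assuming the caller already knows the full paths of the _cache and _dep files. The names cache_is_valid and read_cached are hypothetical and not taken from cache.cgi. read_cached also shows the HEAD wrinkle described further down.

```python
import os


def cache_is_valid(cache_path, dep_path):
    """Return True if the cached copy at cache_path may be served."""
    try:
        dep_mtime = os.stat(dep_path).st_mtime
        os.stat(cache_path)  # the cached page itself must exist as well
        with open(dep_path, "rb") as f:
            names = f.read().split(b"\0")
    except OSError:
        return False  # no usable cached copy: invoke the real CGI
    for name in names:
        if not name:
            continue  # ignore an empty entry, e.g. from a trailing NUL
        try:
            if os.stat(name).st_mtime >= dep_mtime:
                return False  # dependency modified after _dep was written
        except OSError:
            return False  # a listed dependency no longer exists
    return True


def read_cached(cache_path, method):
    """Return the bytes to send: all of _cache, or only its headers for HEAD."""
    with open(cache_path, "rb") as f:
        data = f.read()
    if method == "HEAD":
        end = data.find(b"\n\n")  # the blank line that ends the headers
        if end != -1:
            data = data[:end + 2]
    return data
```

The comparison is strict: a dependency whose timestamp equals that of _dep counts as changed, which errs on the side of regenerating the page.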
When the cached copy is still valid, cache.cgi serves it. Otherwise it invokes the real CGI, which must write the _cache and _dep files. The real CGI should make sure these files are never visible in a partially written state; otherwise concurrent requests can get the wrong results.
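One common way to meet this requirement, sketched here in Python as an assumption rather than a description of what the real CGI actually does: write each file under a temporary name in the same directory, then rename it into place with os.replace, which is an atomic rename on POSIX filesystems. The helper names atomic_write and write_cache are hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see the old file or the new one, never a mix."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temporary file lives in the same directory, hence on the same
    # filesystem, so os.replace() is a single atomic rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def write_cache(cache_path, dep_path, page, dependencies):
    """Store a generated page (bytes, headers included) and its dependency list."""
    # Write _cache before _dep, so a freshly written _dep never vouches
    # for a _cache that has not been written yet.
    atomic_write(cache_path, page)
    atomic_write(dep_path, b"\0".join(os.fsencode(d) for d in dependencies))
```

tempfile.mkstemp creates the file readable only by its owner, which is fine as long as cache.cgi runs as the same user as the real CGI. Atomic renames keep each file whole, but they do not remove the timestamp race described under the "time generated" item.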
The "real CGI" is set at the top of cache.cgi. It names a file on the local filesystem, not a URL. If the real CGI is /var/html/index.cgi, then the cached files are located by joining /var/html/index.cgi-cache with PATH_INFO, and then with _cache or _dep.
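The text does not say exactly how the pieces are joined. Assuming ordinary path joining, so that PATH_INFO /foo/bar maps to /var/html/index.cgi-cache/foo/bar/_cache and .../_dep, the mapping and the first two wrinkles listed below could look like this in Python. cache_paths is a hypothetical name; returning None (so the caller would pass the request straight to the real CGI) and refusing "." and ".." are extra assumptions not stated in the original.

```python
import os

REAL_CGI = "/var/html/index.cgi"  # example value from the text


def cache_paths(path_info, real_cgi=REAL_CGI):
    """Return (cache_path, dep_path) for a request, or None to skip the cache."""
    if path_info == "":
        parts = ["__index__"]  # e.g. http://www.example.com/index.cgi
    else:
        # A non-empty PATH_INFO starts with "/", so drop the empty first element.
        parts = path_info.split("/")[1:]
        for part in parts:
            # Components must not begin with "_". Refusing "." and ".." is an
            # extra assumption here, to keep lookups inside the cache directory.
            if part.startswith("_") or part in (".", ".."):
                return None
    base = os.path.join(real_cgi + "-cache", *parts)
    return os.path.join(base, "_cache"), os.path.join(base, "_dep")
```

Under this assumption the underscore rule also protects __index__: since no URL component may start with "_", no URL can reach the __index__ directory or collide with a _cache or _dep file. In this sketch /foo and /foo/ share one cache entry, which may not match cache.cgi's behavior.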
A few wrinkles:
- URL components must not begin with an underscore
- When PATH_INFO is empty (e.g. the URL is http://www.example.com/index.cgi), the name __index__ is used to locate the _cache and _dep files.
- When the page depends on a file that does not exist, _dep must list the innermost containing directory that does exist. Creating the file later updates that directory's timestamp, which invalidates the cached copy.
- HEAD requests are supported by truncating the contents of _cache after the first '\n\n' sequence, the blank line that ends the headers.
- A "max age" limitation. If you have a sidebar that is generated from an RSS feed fetched over HTTP, you probably fetch the feed once a day and keep a local copy yourself. The _dep file will list that local copy, but the feed is only refetched when the real CGI runs, and the real CGI does not run while the cached page is valid. So the local copy won't change until the page becomes outdated for some other reason. A "max age" fixes this.
- A "time generated" marker. Right now, the timestamp of the _dep file is used. As with make, this creates a race condition between writing the _dep file and modifying a file listed in it. If the sequence of events is
  1. file read
  2. file modified by another concurrent request
  3. _dep and _cache written
  then _dep is newer than the file, but _cache is based on an old version of the file. Comparing against the time generation started, rather than the time _dep was written, would close this window.
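The two items above could be implemented in several ways. The following Python sketch shows one, under these assumptions: the real CGI records time.time() before it reads any input, and stamps _dep with that time (os.utime on the temporary file, before it is renamed into place), so the existing comparison against _dep's timestamp becomes a comparison against the start of generation. max_age is a hypothetical per-page setting in seconds; write_dep_with_marker and still_fresh are hypothetical names.

```python
import os
import tempfile
import time


def write_dep_with_marker(dep_path, dependencies, started):
    """Atomically write _dep, stamped with the time generation started."""
    # Truncating to whole seconds keeps the check conservative on
    # filesystems that store one-second timestamps.
    stamp = int(started)
    directory = os.path.dirname(dep_path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0".join(os.fsencode(d) for d in dependencies))
        os.utime(tmp, (stamp, stamp))  # set the marker before the file is visible
        os.replace(tmp, dep_path)      # a rename keeps the file's mtime
    except BaseException:
        os.unlink(tmp)
        raise


def still_fresh(dep_path, max_age=None, now=None):
    """Check dependencies against the marker, plus an optional max age in seconds."""
    now = time.time() if now is None else now
    try:
        generated = os.stat(dep_path).st_mtime
        with open(dep_path, "rb") as f:
            names = [n for n in f.read().split(b"\0") if n]
    except OSError:
        return False
    if max_age is not None and now - generated > max_age:
        return False  # too old, e.g. a fetched RSS feed is due for a refresh
    for name in names:
        try:
            if os.stat(name).st_mtime >= generated:
                return False  # changed at or after generation started
        except OSError:
            return False
    return True
```

With this scheme the race above is caught: a file modified in step 2 has a timestamp at or after the marker, so the cached copy is rejected and the page is regenerated on the next request. Truncating to whole seconds occasionally causes an unnecessary regeneration, which is the cheaper kind of mistake.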
In a testing setup, a moderately complicated aether page takes 600ms to serve, as measured with ApacheBench (ab -n 10 http://www.example.com/index.cgi/). The same page takes 30ms to serve when it is cached, as measured with ab -n 10 http://www.example.com/cache.cgi/, a speedup of 20x.
Entry first conceived on 14 July 2005, 13:57 UTC, last modified on 15 January 2012, 3:46 UTC
Website Copyright © 2004-2012 Jeff Epler