Timestamp-based caching for filesystem-based dynamic websites

aether is nice, but it's a bit slow, especially when many local files must be parsed to produce a single page. cache.cgi is a simple program that, in cooperation with any filesystem-based dynamic website, can serve a cached copy of the page when it is appropriate to do so.

This version of cache.cgi is very experimental; I'm not even using it on my own site yet.

The principle of cache.cgi is simple: For GET requests that do not include a query component, it checks for the existence of a cached copy of the page. The "cached copy" consists of two files: _cache is a copy of the page with headers, and _dep is a NUL-separated list of files or directories that the page depends on. A cached copy is valid if every file named in _dep exists and has a timestamp older than _dep.
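The validity rule above can be sketched in a few lines of Python. This is only an illustration, not the code in cache.cgi: the function names are made up, and treating a dependency whose timestamp equals that of _dep as stale is an assumption (the rule only says "older").

```python
import os

def request_is_cacheable(environ):
    """Only GET requests without a query component are candidates for the cache."""
    return environ.get("REQUEST_METHOD") == "GET" and not environ.get("QUERY_STRING")

def cache_is_valid(dep_path):
    """Return True if every file named in dep_path exists and is older than dep_path."""
    try:
        dep_mtime = os.stat(dep_path).st_mtime
        with open(dep_path, "rb") as f:
            # _dep is a NUL-separated list; ignore empty entries such as a trailing one
            names = [n for n in f.read().split(b"\0") if n]
    except OSError:
        return False  # no _dep file means no usable cached copy
    for name in names:
        try:
            # Assumption: equal timestamps count as stale, to stay on the safe side
            if os.stat(name).st_mtime >= dep_mtime:
                return False
        except OSError:
            return False  # a dependency that has disappeared invalidates the cache
    return True
```

Because a directory's timestamp changes when entries are added or removed, listing a directory in _dep lets the same check notice new or deleted files.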

When the cached copy is still valid, it is used. When it is not, the real CGI is invoked. The real CGI must write the _cache and _dep files. The real CGI should take care that these files are never incompletely written, or concurrent requests may receive incorrect results.
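One common way for the real CGI to avoid incompletely written files is to write to a temporary file in the same directory and then rename it into place. The sketch below assumes a POSIX filesystem, where the rename is atomic; the helper names and the choice to write _cache before _dep are illustrative assumptions, not something cache.cgi prescribes.

```python
import os
import tempfile

def write_atomically(path, data):
    """Write data (bytes) to path so that readers see the old or new file, never a partial one."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temporary file must be in the same directory so the rename stays on one filesystem
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise

def store_cached_page(cache_path, dep_path, page, deps):
    """Write the page (with headers) and its NUL-separated dependency list."""
    write_atomically(cache_path, page)
    # Assumption: writing _dep last, so it only appears once _cache is complete
    write_atomically(dep_path, b"".join(os.fsencode(d) + b"\0" for d in deps))
```

A call might look like store_cached_page(cache_path, dep_path, page_bytes, ["/var/html/data/page.txt"]), where the dependency path is a made-up example.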

The "real CGI" is set at the top of cache.cgi. It names a file on the local filesystem, not a URL. If the "real CGI" is /var/html/index.cgi then the cache is /var/html/index.cgi-cache, joined with PATH_INFO, joined with _cache or _dep.
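As a rough illustration of the naming scheme, the following Python sketch builds the two file names from the real CGI path and PATH_INFO. It assumes that "joined" means joining path components, so that _cache and _dep live in a directory named after the page; the actual cache.cgi may concatenate differently. The function name and the rejection of "." and ".." components are also assumptions added for the example.

```python
import posixpath

REAL_CGI = "/var/html/index.cgi"  # the example value used in the text

def cache_paths(real_cgi, path_info):
    """Return (cache_file, dep_file) for a request, under the joining assumption above."""
    root = real_cgi + "-cache"
    parts = [p for p in path_info.split("/") if p]
    # Assumption: refuse components that could escape the cache directory
    if any(p in (".", "..") for p in parts):
        raise ValueError("unsafe PATH_INFO: %r" % path_info)
    base = posixpath.join(root, *parts)
    return posixpath.join(base, "_cache"), posixpath.join(base, "_dep")
```

Under these assumptions, a request for index.cgi/ (PATH_INFO of "/") maps to /var/html/index.cgi-cache/_cache and /var/html/index.cgi-cache/_dep.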

A few wrinkles:

The exact format of _dep files will probably be made more complicated in the future. This is to cope with these anticipated problems:

In a testing setup, a moderately complicated aether page takes 600ms to serve, as measured by ab -n 10 http://www.example.com/index.cgi/. The same page takes 30ms to serve when it is cached, as measured by ab -n 10 http://www.example.com/cache.cgi/, a speedup of 20x.

Entry first conceived on 14 July 2005, 13:57 UTC, last modified on 15 January 2012, 3:46 UTC
Website Copyright © 2004-2024 Jeff Epler