The Emergent Properties of Meat
Email me: jepler@unpy.net


aether is nice, but it's a bit slow, especially when many local files must be parsed to produce a single page. cache.cgi is a simple program that, in cooperation with any filesystem-based dynamic website, can serve a cached copy of the page when it is appropriate to do so.

This version of cache.cgi is very experimental—I'm not even using it on my own site yet.

The principle of cache.cgi is simple: For GET requests that do not include a query component, it checks for the existence of a cached copy of the page. The "cached copy" consists of two files: _cache is a copy of the page with headers, and _dep is a NUL-separated list of files or directories that the page depends on. A cached copy is valid if every file named in _dep exists and has a timestamp older than _dep.
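The validity rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the code in cache.cgi; the function name cache_is_valid is made up for the example.

```python
import os

def cache_is_valid(dep_path):
    """True if every file named in dep_path exists and is older than dep_path."""
    try:
        dep_mtime = os.stat(dep_path).st_mtime
        with open(dep_path, "rb") as f:
            # _dep is a NUL-separated list; ignore empty entries
            names = [n for n in f.read().split(b"\0") if n]
    except OSError:
        return False  # no readable _dep file means no cached copy

    for name in names:
        try:
            if os.stat(name).st_mtime >= dep_mtime:
                return False  # dependency is not older than _dep
        except OSError:
            return False  # dependency no longer exists
    return True
```

A dependency with exactly the same timestamp as _dep is treated as changed here, which errs on the side of regenerating the page.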

When the cached copy is still valid, it is used. When it is not, the real CGI is invoked. The real CGI must write the _cache and _dep files. The real CGI should take care that these files are never incompletely written, or concurrent requests can get the wrong results.
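One common way to avoid incompletely written files is to write to a temporary file in the same directory and then rename it into place. The sketch below assumes a POSIX filesystem, where the rename is atomic; write_atomically and store_page are hypothetical helper names, not part of cache.cgi or aether.

```python
import os
import tempfile

def write_atomically(path, data):
    """Write data to path so readers see either the old file or the new one."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temporary file must be on the same filesystem as the target
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise

def store_page(cache_path, dep_path, page, deps):
    """Store the page (headers and body) and its NUL-separated dependency list."""
    write_atomically(cache_path, page)
    # _dep goes last: its timestamp is what marks the cached copy as valid
    write_atomically(dep_path, b"\0".join(deps))
```

Writing _cache before _dep is a choice made for this sketch: a concurrent request that finds a fresh _dep then also finds the matching _cache.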

The "real CGI" is set at the top of cache.cgi. It names a file on the local filesystem, not a URL. If the "real CGI" is /var/html/index.cgi then the cache is /var/html/index.cgi-cache, joined with PATH_INFO, joined with _cache or _dep.
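As a rough illustration of that naming scheme, here is a sketch that maps a real CGI path and PATH_INFO to the two file names. It assumes "joined" means ordinary path joining, so that _cache and _dep live inside a directory named after the URL path, and it assumes that an empty PATH_INFO is replaced by a directory called __index__. Both are guesses about the layout, and cache_paths is a made-up name.

```python
import posixpath

def cache_paths(real_cgi, path_info):
    """Return the (_cache, _dep) file names for a request.

    Assumption: the URL path becomes a directory under <real_cgi>-cache.
    """
    base = real_cgi + "-cache"
    parts = [p for p in path_info.split("/") if p]
    if not parts:
        # Assumed placement of the __index__ name used for an empty PATH_INFO
        parts = ["__index__"]
    directory = posixpath.join(base, *parts)
    return posixpath.join(directory, "_cache"), posixpath.join(directory, "_dep")

print(cache_paths("/var/html/index.cgi", "/software/cache_cgi"))
```

A real implementation would also want to reject PATH_INFO components such as ".." before touching the filesystem.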

A few wrinkles:

  • URL components must not begin with an underscore.
  • When PATH_INFO is empty (e.g. the URL is http://www.example.com/index.cgi) the name __index__ is used to locate the _cache and _dep files.
  • When the page depends on a file that does not exist, _dep must list the innermost containing directory that does exist.
  • HEAD requests are supported by chopping the contents of _cache after the first '\n\n' sequence.
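
The last two wrinkles are easy to sketch. Listing the innermost existing directory works because creating the missing file later changes that directory's timestamp. The helper names dep_entry_for and head_response are invented for this example and are not taken from cache.cgi.

```python
import os

def dep_entry_for(path):
    """Return path if it exists, else the innermost existing containing directory."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path

def head_response(cached):
    """Keep only the headers of a _cache file, up to and including the first blank line."""
    end = cached.find(b"\n\n")
    return cached if end < 0 else cached[:end + 2]
```

For example, head_response(b"Content-Type: text/html\n\n<html>...") keeps just the Content-Type header and the blank line that ends the headers.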

The exact format of _dep files will probably be made more complicated in the future. This is to cope with these anticipated problems:

  • A "max age" limitation. If you have a sidebar that is generated from an RSS feed fetched over HTTP, you probably fetch the page once a day and cache it yourself. The _dep file will list the local copy of the feed, which won't change until the page becomes outdated for some other reason. "max age" fixes this.

  • A "time generated" marker. Right now, the timestamp of the _dep file is used. As with make, this creates a race condition between writing the _dep file and modifying a file listed in it. If the sequence of events is
    1. file read
    2. file modified by another concurrent request
    3. _dep and _cache written
    then _dep is newer than file but _cache is based on an old version of file.
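
To make the two ideas concrete, here is a purely hypothetical freshness check. The future _dep format is not decided, so the parameters generated_at (a time recorded before any dependency is read) and max_age (in seconds) are assumptions for illustration only.

```python
import time

def is_fresh(dep_mtimes, generated_at, max_age=None, now=None):
    """Decide whether a cached page may still be served.

    dep_mtimes: modification times of the dependencies, None for a missing file.
    generated_at: time recorded BEFORE the first dependency was read.
    max_age: optional limit in seconds on how long the cached copy may be used.
    """
    now = time.time() if now is None else now
    if max_age is not None and now - generated_at > max_age:
        return False  # too old, even though no local file changed
    # A file modified after generation began makes the copy stale, which
    # covers the read / modify / write sequence described above.
    return all(m is not None and m < generated_at for m in dep_mtimes)
```

Because generated_at is taken before the files are read rather than when _dep is written, a modification in step 2 always has a later timestamp and the page is regenerated.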

In a testing setup, a moderately complicated aether page takes 600ms to serve, calculated by ab -n 10 http://www.example.com/index.cgi/. The same page takes 30ms to serve when it is cached, calculated by ab -n 10 http://www.example.com/cache.cgi/, a speedup of 20x.


Powered by the Emergent Properties of Meat. Copyright © 2004-2008 Jeff Epler