tardiff: diff two (compressed) tar files without extracting

Recently I was googling for a script to compare tar files, and found references to a Perl script (which I did not read) that reportedly does this by extracting both tar files and then diffing the resulting trees. That would actually have been fine for my case, but some people noted that their tarfiles were too big to extract comfortably. I assume this is due to space considerations, but doubtless there are time considerations too.

Tarfiles are not exactly awesome for diffing: they are frequently compressed by stream compressors like gzip, have no defined order for files (i.e., they're not necessarily in lexical order), and have no index, just a header in front of each file's data. Unfortunately, this means the most straightforward tardiff (i.e., this one) reads and decompresses each tarfile multiple times: once to get the file list, and then at least once more to read the file contents. At the scale where I'm working (diffing two distribution tarballs of software in the 1-10MB range) this doesn't matter much, but it will still perform poorly on the huge archives that you don't want to fully extract. Oh well.
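
To make the "read it more than once" point concrete, here is a minimal sketch of a two-pass tardiff using Python 3's standard tarfile and difflib modules. It is not the attached tardiff.py: the function names (list_members, read_members, tardiff) are made up for illustration, and it assumes the files common to both archives fit in memory, which is fine at the 1-10MB scale but not for huge archives.

```python
import difflib
import tarfile


def list_members(path):
    """Pass 1: decompress the whole archive just to collect regular-file names."""
    with tarfile.open(path, "r:*") as tf:  # "r:*" auto-detects gzip/bz2/xz/none
        return {m.name for m in tf if m.isfile()}


def read_members(path, wanted):
    """Pass 2: decompress the archive again, keeping only the wanted files."""
    contents = {}
    with tarfile.open(path, "r:*") as tf:
        for member in tf:
            if member.isfile() and member.name in wanted:
                contents[member.name] = tf.extractfile(member).read()
    return contents


def tardiff(path_a, path_b):
    """Return diff-style output lines comparing two tar archives."""
    names_a = list_members(path_a)
    names_b = list_members(path_b)
    lines = []
    for name in sorted(names_a - names_b):
        lines.append("Only in %s: %s\n" % (path_a, name))
    for name in sorted(names_b - names_a):
        lines.append("Only in %s: %s\n" % (path_b, name))

    # Assumption: all common files fit in memory at once.
    common = names_a & names_b
    data_a = read_members(path_a, common)
    data_b = read_members(path_b, common)
    for name in sorted(common):
        if data_a[name] == data_b[name]:
            continue
        # Decode leniently so binary files don't crash the diff.
        text_a = data_a[name].decode("utf-8", "replace").splitlines(keepends=True)
        text_b = data_b[name].decode("utf-8", "replace").splitlines(keepends=True)
        lines.extend(difflib.unified_diff(text_a, text_b, "a/" + name, "b/" + name))
    return lines
```

Something like sys.stdout.writelines(tardiff("old.tar.gz", "new.tar.gz")) would print the result. Sorting the names is what works around the lack of a defined member order, and the second tarfile.open() call per archive is exactly the repeated decompression described above. Note that the sketch compares member names literally, so two release tarballs with different top-level directory names would show every file as "Only in".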

Despite this limitation, it may be useful to someone (it was to me).

Files currently attached to this page:

tardiff.py (2.1kB)



Entry first conceived on 20 May 2013, 21:37 UTC, last modified on 21 May 2013, 14:36 UTC
Website Copyright © 2004-2014 Jeff Epler