Tarfiles are not exactly awesome for diffing: they are frequently compressed with stream compressors like gzip, they store files in no defined order (i.e., not necessarily lexical order), and they have no index, just a header in front of each file. Unfortunately, this means the most straightforward tardiff (i.e., this one) has to read and decompress each tarfile more than once: once to get the file list, and then at least once more to read the file contents. At the scale where I'm working (diffing two distribution tarballs of software in the 1-10MB range) this doesn't matter much, but it will perform poorly on exactly the huge archives that you don't want to fully extract. Oh well.
Despite this limitation, it may be useful to someone (it was to me).
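For readers who just want the shape of the idea, here is a minimal sketch of the two-pass approach using Python's standard `tarfile` and `difflib` modules. It is not the attached tardiff.py; the function names (`list_members`, `read_members`, `tardiff`) are made up for illustration, and it assumes the files are text and small enough to hold in memory.

```python
import difflib
import tarfile


def list_members(path):
    """Pass 1: read the whole archive once, only to collect regular-file names."""
    with tarfile.open(path, "r:*") as tf:  # "r:*" auto-detects gzip/bz2/xz
        return {m.name for m in tf.getmembers() if m.isfile()}


def read_members(path, wanted):
    """Pass 2: read the archive again, keeping the contents of the wanted files."""
    contents = {}
    with tarfile.open(path, "r:*") as tf:
        for member in tf:
            if member.isfile() and member.name in wanted:
                contents[member.name] = tf.extractfile(member).read()
    return contents


def tardiff(old_path, new_path):
    """Return diff lines for two tarfiles (hypothetical helper, not tardiff.py)."""
    old_names = list_members(old_path)
    new_names = list_members(new_path)
    common = old_names & new_names

    # Assumption: everything in `common` fits in memory at once.
    old = read_members(old_path, common)
    new = read_members(new_path, common)

    lines = []
    for name in sorted(old_names - new_names):
        lines.append(f"Only in {old_path}: {name}\n")
    for name in sorted(new_names - old_names):
        lines.append(f"Only in {new_path}: {name}\n")
    for name in sorted(common):
        if old[name] == new[name]:
            continue
        # Assumption: members are text; undecodable bytes are replaced.
        a = old[name].decode("utf-8", "replace").splitlines(keepends=True)
        b = new[name].decode("utf-8", "replace").splitlines(keepends=True)
        lines.extend(difflib.unified_diff(
            a, b, fromfile=f"{old_path}:{name}", tofile=f"{new_path}:{name}"))
    return lines
```

Something like `sys.stdout.writelines(tardiff("foo-1.0.tar.gz", "foo-1.1.tar.gz"))` would print the result. Note that this sketch matches members by their full names, so two release tarballs with different top-level directories (say `foo-1.0/` and `foo-1.1/`) would need that prefix stripped first.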
Files currently attached to this page:
tardiff.py (2.1kB)
Entry first conceived on 20 May 2013, 21:37 UTC, last modified on 21 May 2013, 14:36 UTC
Website Copyright © 2004-2024 Jeff Epler