hiff: hashing diff

This is an idea that floated around in my head long ago, and I finally took the time to implement it. I doubt it's of practical use to anybody.

Imagine that you want to send instructions to somebody to apply changes to a piece of source code. Well, you'd use 'diff -u' (hopefully), and the recipient would use 'patch' to automatically perform the changes.

But what if for some reason you don't want to (or aren't allowed to) transmit the original lines? You could use a format that simply names the old lines to remove by line number, but then you lose the context sensitivity of a unified diff, which stays usable when a few lines have been moved around, modified, or reordered. Or you could transmit the cryptographic hash of each old line, which lets the recipient be confident which line was meant without that line ever being sent. The hash is just a binary number, and transmitting a number is clearly OK.
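To make the trade-off concrete, here is a small Python sketch (my own illustration, not code from hiff) in which the recipient's copy of a file has gained one line at the top. A bare line number now points at the wrong line, while the SHA-1 digest of the line's text still finds the right one. Identical lines produce identical digests, so the hypothetical `find_by_hash` helper returns every match; in a diff, the surrounding context lines are what tell such duplicates apart.

```python
import hashlib


def sha1_hex(text):
    """SHA-1 digest of one line's text, as 40 hex digits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_by_hash(lines, digest):
    """Indexes of every line whose digest matches (equal lines share a digest)."""
    return [i for i, line in enumerate(lines) if sha1_hex(line) == digest]


original = ["import os", "x = 1", "print(x)"]
# The recipient's copy has gained a line at the top.
shifted = ["# new header", "import os", "x = 1", "print(x)"]

by_number = 1                 # number-only instruction: index 1 in the sender's copy
by_hash = sha1_hex("x = 1")   # the line's hash, sent instead of its text

print(shifted[by_number])               # import os  (the wrong line)
print(find_by_hash(shifted, by_hash))   # [2]        (still the right line)
```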

That is the purpose of 'hiff'. This version is implemented as a filter with two modes: hiff and dehiff. In hiff mode, any line that looks like an "original line" of a diff (roughly, a line in a diff hunk that starts with "-" or " ") is converted to a 160-bit SHA-1 hash, which is written in its place with a "&" marker at the front. In dehiff mode, every line carrying the hash marker has the reverse operation performed, using a hash-to-line map built from one or more source files. If a hash isn't found, the "&" marker is moved to come after the "-" or " " instead of before it, so that patch will hopefully treat the unfound lines as fuzz and soldier on anyway.
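A rough Python sketch of the two transforms, written from the description above rather than from the attached script. Several details are my assumptions, not the original's: the hash covers the line's text without its diff marker or newline, it is written as 40 hex digits, a hiff line looks like `&-<hash>` or `& <hash>`, and hunk boundaries are found by counting down the line totals in each `@@` header so that `---`/`+++` file headers are never hashed. The names `line_hash`, `hiff`, `dehiff` and `build_map` are hypothetical.

```python
import hashlib
import re

# Unified-diff hunk header such as "@@ -12,5 +12,6 @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def line_hash(text):
    """160-bit SHA-1 of one line (without its newline), as hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hiff(diff_lines):
    """Replace each old-side line ('-' or ' ') inside a hunk with '&' + tag + hash."""
    out = []
    old_left = new_left = 0            # lines still expected in the current hunk
    for line in diff_lines:
        body = line.rstrip("\n")
        if old_left == 0 and new_left == 0:
            m = HUNK_RE.match(body)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            out.append(body)           # file headers and hunk headers pass through
            continue
        tag, text = body[:1], body[1:]
        if tag in ("-", " "):
            out.append("&" + tag + line_hash(text))
            old_left -= 1
            if tag == " ":             # context lines count on both sides
                new_left -= 1
        else:
            if tag == "+":
                new_left -= 1
            out.append(body)           # added lines and "\ No newline" markers
    return out


def build_map(source_lines):
    """Hash-to-line map built from the lines of one or more source files."""
    return {line_hash(s.rstrip("\n")): s.rstrip("\n") for s in source_lines}


def dehiff(hiff_lines, hash_to_line):
    """Turn '&-<hash>' / '& <hash>' back into diff lines using hash_to_line."""
    out = []
    for line in hiff_lines:
        body = line.rstrip("\n")
        if body.startswith("&") and body[1:2] in ("-", " "):
            tag, digest = body[1], body[2:]
            text = hash_to_line.get(digest)
            if text is not None:
                out.append(tag + text)
            else:
                # Unknown hash: move '&' after the tag and leave the rest to patch.
                out.append(tag + "&" + digest)
        else:
            out.append(body)
    return out
```

Round-tripping a small diff through `hiff` and then `dehiff`, with a map built from the original file, should reproduce the diff; any line whose hash is missing from the map comes back as `-&<hash>` or ` &<hash>`.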

Usage:

hiff [-h] <input >output
    Convert an input diff to hiff format.
hiff -d -s source [-s source2 ...] <input >output
    Convert input hiff back to diff format, using source (and source2, and so on) to look up hash values.
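The two command forms map onto a small option parser. The sketch below is a hypothetical Python front end, not the attached script: `parse_args` and `load_hash_map` are my own names, and treating `-h` as an explicit "hiff mode" flag (rather than "help") is my reading of the usage line. It also assumes each source line is hashed as SHA-1 hex of its text without the trailing newline.

```python
import argparse
import hashlib


def parse_args(argv):
    """Parse the two command-line forms shown in the usage text."""
    # add_help=False frees "-h", which the usage text shows as a mode flag;
    # here it is assumed to select the default (hiff) mode.
    p = argparse.ArgumentParser(prog="hiff", add_help=False)
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("-h", dest="dehiff", action="store_false",
                      help="convert a diff to hiff format (default)")
    mode.add_argument("-d", dest="dehiff", action="store_true",
                      help="convert hiff back to a diff")
    p.add_argument("-s", dest="sources", action="append", metavar="source",
                   help="file used to look up hash values (repeatable)")
    # Both mode flags share one dest, so set its default explicitly.
    p.set_defaults(dehiff=False)
    args = p.parse_args(argv)
    args.sources = args.sources or []
    if args.dehiff and not args.sources:
        p.error("-d needs at least one -s source")
    return args


def load_hash_map(paths):
    """Map the SHA-1 hex digest of every source line to that line's text."""
    table = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                text = line.rstrip("\n")
                digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
                table.setdefault(digest, text)
    return table
```

For example, `parse_args(["-d", "-s", "one.c", "-s", "two.c"])` yields `dehiff=True` and `sources=["one.c", "two.c"]`. Lines that appear in several sources collapse to a single entry in the map, since equal text gives an equal digest.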

Files currently attached to this page:

hiff (2.3kB)



Entry first conceived on 8 May 2008, 3:12 UTC, last modified on 15 January 2012, 3:46 UTC
Website Copyright © 2004-2024 Jeff Epler