The paper claimed that the break was so "dangerous" that they would not reveal the method itself; Instead, they include a table ("Table 6") which they claimed could only be created if they had in fact broken SIMON-32/64 in a way that let them recover keys based on 2 chosen plaintexts with a 2.5% (?) chance of success. They claimed that they did roughly the following:
- Choose 4, 4-byte blocks from a chosen text (supposedly, the Project Gutenberg version of the King James Bible, AKA pg10.txt)
- Call the first 2 blocks the "plaintext" and the second two blocks the "cyphertext". If the two plaintext blocks are equal, go back to step 1
- Find a SIMON-32/64 key that encrypts the plaintext to the cyphertext, using their secret method
- If such a key is found, output it
Aside from one error in Table 6, where the hexadecimal value of a cyphertext is shown incorrectly, the table does check out. (though some of the 4-grams don't appear anywhere in pg10.txt)
So, is it proof of a serious break in SIMON? No. There's another way to generate these values:
- Choose a 64-bit SIMON key arbitrarily
- Choose a first block of "plaintext" P1. Check if E(P1) is also a block. If not, continue with a fresh P1 value.
- Choose a second block of "plaintext" P2 from those not yet checked. Check if E(P2) is also a block. If not, continue with a fresh P2 value.
- If you reach this step, you have created a "Table 6" entry, with two plaintexts, two ciphertexts, and a key. The plaintext and cyphertext all come from your chosen text.
- Repeat from step 1 until you have enough entries that you have proven your point.
In fact, without any attempt at optimization (aside from trivial parallelization), an i7-4790k can find about a thousand examples in 8 seconds; about 10% of all keys yielded at least one set of matching blocks.
There are around 48,000 distinct 4-grams in "pg10.txt", so for any given key and 4-byte plaintext, there's about a 1-in-90,000 chance for it to encrypt to some other 4-gram. Since the probability is independent for each 4-gram, the odds of getting 1 are 1/2, and the odds of getting 2 are 1/4. This extremely rough calculation but not too far off the 1/10 actually obtained.
The attached program, which adapts an implementation of SIMON from github, can be built with g++-6 on Linux. It needs "pg10.txt" in the current directory. For parallelization, pass "-fopenmp". `trolled.txt` is one possible output of the program, and the few entries that I back-checked with an independent (Python) SIMON implementation also from github.
I just hope that, whatever the authors actually did to make "table 6", it didn't really take 120 days on two cluster computers.
Update: Several commenters believe the paper takes two, 8-byte blocks from the chosen text. If this is true, then even fewer of the blocks shown actually match "pg10.txt". For instance, I based my "4, 4-byte blocks" assumption on the appearance of "LORDhard" as a cyphertext. If this is the case, then my program would take about 48,000 times longer; since when you find two texts, the odds that they're "1 location" different out of 48,000 locations is about 1 in 48,000. However, since their Table 6 is full of 8-grams (and even 4-grams) that don't come from pg10.txt, I don't feel TOO bad that my program presents examples that aren't either.
Files currently attached to this page:
Entry first conceived on 14 May 2019, 20:45 UTC, last modified on 15 May 2019, 2:41 UTC
Website Copyright © 2004-2018 Jeff Epler