Generating "Table 6" of the withdrawn SIMON-32/64 paper

Recently, a paper claiming a break of the SIMON-32/64 cryptosystem appeared at a pre-print archive, but was soon withdrawn. Many commenters have focused on the authorship (the author names are, at best, pseudonyms; and the choice of references to Judaism and Christianity leave an odd taste in the mouth), but what of the claims? Well, they are trivially dismissed too.

The paper claimed that the break was so "dangerous" that they would not reveal the method itself; Instead, they include a table ("Table 6") which they claimed could only be created if they had in fact broken SIMON-32/64 in a way that let them recover keys based on 2 chosen plaintexts with a 2.5% (?) chance of success. They claimed that they did roughly the following:

Choose 4, 4-byte blocks from a chosen text (supposedly, the Project Gutenberg version of the King James Bible, AKA pg10.txt)
Call the first 2 blocks the "plaintext" and the second two blocks the "cyphertext". If the two plaintext blocks are equal, go back to step 1
Find a SIMON-32/64 key that encrypts the plaintext to the cyphertext, using their secret method
If such a key is found, output it

.. they claim that over the course of 120 days on a few dozen ARM CPUs they performed 18 key recoveries, which are shown in Table 6.

Aside from one error in Table 6, where the hexadecimal value of a cyphertext is shown incorrectly, the table does check out. (though some of the 4-grams don't appear anywhere in pg10.txt)

So, is it proof of a serious break in SIMON? No. There's another way to generate these values:

Choose a 64-bit SIMON key arbitrarily
Choose a first block of "plaintext" P1. Check if E(P1) is also a block. If not, continue with a fresh P1 value.
Choose a second block of "plaintext" P2 from those not yet checked. Check if E(P2) is also a block. If not, continue with a fresh P2 value.
If you reach this step, you have created a "Table 6" entry, with two plaintexts, two ciphertexts, and a key. The plaintext and cyphertext all come from your chosen text.
Repeat from step 1 until you have enough entries that you have proven your point.

In fact, without any attempt at optimization (aside from trivial parallelization), an i7-4790k can find about a thousand examples in 8 seconds; about 10% of all keys yielded at least one set of matching blocks.

There are around 48,000 distinct 4-grams in "pg10.txt", so for any given key and 4-byte plaintext, there's about a 1-in-90,000 chance for it to encrypt to some other 4-gram. Since the probability is independent for each 4-gram, the odds of getting 1 are 1/2, and the odds of getting 2 are 1/4. This extremely rough calculation but not too far off the 1/10 actually obtained.

The attached program, which adapts an implementation of SIMON from github, can be built with g++-6 on Linux. It needs "pg10.txt" in the current directory. For parallelization, pass "-fopenmp". `trolled.txt` is one possible output of the program, and the few entries that I back-checked with an independent (Python) SIMON implementation also from github.

I just hope that, whatever the authors actually did to make "table 6", it didn't really take 120 days on two cluster computers.

Update: Several commenters believe the paper takes two, 8-byte blocks from the chosen text. If this is true, then even fewer of the blocks shown actually match "pg10.txt". For instance, I based my "4, 4-byte blocks" assumption on the appearance of "LORDhard" as a cyphertext. If this is the case, then my program would take about 48,000 times longer; since when you find two texts, the odds that they're "1 location" different out of 48,000 locations is about 1 in 48,000. However, since their Table 6 is full of 8-grams (and even 4-grams) that don't come from pg10.txt, I don't feel TOO bad that my program presents examples that aren't either.

Files currently attached to this page:

search.c	5.0kB
trolled.txt	80.7kB