I don't stake any claim to being "the fastest" anagrammer, but my program is pretty fast. I credit its speed to two main factors: first, a data structure that makes it very fast to exclude a word from being an anagram solution; and second, a binary dictionary format that trims 95% of the loading time compared to a text dictionary.
The data structure consists of two parts. First, a 32-bit integer in which each bit is 1 if the corresponding letter appears in the word and 0 otherwise; and second an array of 26 8-bit integers which record the actual number of occurrences of the letter. This makes the quick-exclusion test for 'word B can be spelled from letters A' simply (A.mask & B.mask) == B.mask, a quick operation to perform. If this step doesn't show that it's impossible to spell B with the letters from A, then 26 more comparisons are done to see whether for each letter that is in B, A has enough copies of that letter to suffice.
As a result, it's possible to generate 1000 anagrams of a typical-length input in about 140 million CPU cycles, or around .07 seconds per input. Using 'server mode' and limiting the number of results improves things a bit, but the initial fitering of the (typically ~90k word) list dominates.
All in all, a fun project (which should be just about done now!)
Entry first conceived on 19 February 2013, 22:42 UTC, last modified on 23 February 2019, 17:28 UTC
Website Copyright © 2004-2024 Jeff Epler