As a web developer a lot of my development activities really comes down to handling a request, authenticating the request, validating the request, making an sql query, serializing the data for a response, and then handling the response in the web browser and render accordingly. Needless to say, I don’t spend a lot of time working on algorithms and optimization (outside of sql) these days.
And, recently I saw a small programming challenge on the web for determining anagrams in the english dictionary for a given word, and I sat down today and worked out a solution and here it is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | def possible_anagrams(word): '''given a word, return a list of possible anagrams for that word''' if len(word) == 1: yield word for i, letter in enumerate(word): # recursively call with a word whose characters have been rotated for s in possible_anagrams(word[i+1:] + word[:i]): yield '%s%s' % (letter, s) def anagrams(word): '''given a word, return a list of anagrams in an english dictionary''' words = [w.rstrip() for w in open('WORD.LST')] return [a for a in possible_anagrams(word) if a in words] if __name__ == "__main__": print anagrams('python') |
Originally I wrote a recursive possible_anagrams function that returned a list, but then modified it to use a generator. There is room for optimization, but this was my first shot at getting a correct result.
Then, I was looking at the possible solutions and realized, as I typically do in these situations, I had made the problem more complex than it needed to be. Because, you don’t really need to find the possible anagrams. You can do something like:
1 2 3 4 | def anagrams(word): words = [w.rstrip() for w in open('WORD.LST')] sword = sorted(word) return [w for w in words if sorted(w) == sword] |
which compares a sorted list of characters. The solution is very clean and easy to understand. However, one solution really stood out to me for its use of itertools to find the possible anagrams:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
import itertools
def anagrams(word):
words = [w.rstrip() for w in open('WORD.LST')]
#Reduce the number of words to compare to
# just the ones with the same lengths
# Also make a set to avoid duplicates
words = set([ w for w in words if len(w) == len(word)])
#Find all possible anagrams
comb = set([ ''.join(w) for w in
itertools.permutations(word, len(word))])
#Now find the ones that are words intersecting two sets
return comb.intersection(words) |
I was previously unaware of the itertools.permutations method. And, this solution is also easier to read than mine and performed 10-20 ms faster than mine as well. So, while I did make the problem more complex than needed I now have 3 ways to find an anagram should similar problems ever come up in real life and more importantly, to me, I solved something that didn’t involve sql.