In this article, Dave Taylor shows complicated script code to complete the findwords script. Now you'll be ready to crush everyone in Scrabble and Words with Friends.
It was a dark and stormy night when I started this series here in Linux Journal—at least two months ago, and in Internet terms, that's quite a while. And just wait until our robot overlords are running the show, because then two months will be 10–20 generations of robot evolution and quite frankly, the T-2000 probably could have solved this problem already anyway.
Puny humans!
But, we haven't yet reached the singularity—at least, I don't think so. I asked Siri, and she said we hadn't, so that's good enough, right? Let's dive back in to this programming project because the end is nigh! Well, for this topic at least.
The challenge started out as trying to make words from a combination of letter blocks. You know, the wooden blocks that babies play with (or, alternatively, hurl at you if you're within 20 feet of them). Those give you six letters per space, but I simplified the problem down to the Scrabble tiles example: you have a set of letters on your rack; what words can you make with them?
I've talked about algorithms for the last few months, so this time, let's really dig in to the code for findwords, the resultant script. After discarding various solutions, the one I've implemented has two phases:
Identify a list of all words that are composed only of the letters started with (so “axe” wouldn't match the starting letters abcdefg).
For each word that matches, check that the number of letters needed to spell the word match up with the occurrences of letters in the starting pattern (so “frogger” can't be made from forger—but almost).
Let's have a look at the code blocks, because it turns out that this is non-trivial to implement, but we have learned to bend The Force to do our bidding (in other words, we used regular expressions).
First we step through the dictionary to identify n-letter words that don't contain letters excluded from the set, with the additional limitation that the word is between (length–3) and (length) letters long:
unique="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | uniq | \ fmt | tr -C -d '[[:alpha:]]')" while [ $minlength -lt $length ] do regex="^[$unique]{$minlength}$" if [ $verbose ] ; then echo "Raw word list of length $minlength for \ letterset $unique:" grep -E $regex "$dictionary" | tee -a $testwords else grep -E $regex "$dictionary" >> $testwords fi minlength="$(( $minlength + 1 ))" done
I explained how this block works in my column in the last issue (October 2015), if you want to flip back and read it, but really, the hard work involves the very first line, creating the variable $unique, which is a sorted, de-duped list of letters from the original pattern. Given “messages”, for example, $unique would be “aegms”.
Indeed, given “messages”, here are the words that are identified as possibilities by findwords:
Raw word list of length 6 for letterset aegms: assess mammas masses messes sesame Raw word list of length 7 for letterset aegms: amasses massage message Raw word list of length 8 for letterset aegms: assesses massages messages
Clearly there's more work to do, because it's not possible to make the word “massages” from the starting pattern “messages”, since there aren't enough occurrences of the letter “a”. That's the job of the second part of the code, so I'm just going to show you the whole thing, and then I'll explain specific sections:
pattern="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt ↪| sed 's/ //g')" for word in $( cat $testwords ) do simplified="$(echo $word | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt ↪| sed 's/ //g')" ## PART THREE: do all letters of the word appear # in the pattern once and exactly once? Easy way: # loop through and remove each letter as used, # then compare end states indx=1; failed=0 before=$pattern while [ $indx -lt ${#simplified} ] do ltr=${simplified:$indx:1} after=$(echo $before | sed "s/$ltr/-/") if [ $before = $after ] ; then failed=1 else before=$after fi indx=$(( $indx + 1 )) done if [ $failed -eq 0 ] ; then echo "SUCCESS: You can make the word $word" fi done
The first rather gnarly expression to create $pattern from the specified starting argument ($1) normalizes the pattern to all lowercase, sorts the letters alphabetically, then reassembles it. In this case, “messages” would become “aeegmsss”. Why? Because we can do that to each of the possible words too, and then the comparison test becomes easy.
The list of possible words was created in part one and is stored in the temporary file $testwords, so the “for” loop steps us through. For each word, $simplified becomes a similarly normalized pattern to check. For each letter in the proposed word, we replace that letter with a dash in the pattern, using two variables, $before and $after, to stage the change so we can ensure that something really did change for each letter. That's what's done here:
after=$(echo $before | sed "s/$ltr/-/")
If $before = $after, then the needed letter from the proposed word wasn't found in the pattern, and the word can't be assembled from the pattern. On the other hand, if there are extra letters in the pattern after we're done analyzing the word, that's fine. That's the situation where we can make, for example, “games” from “messages”, and that's perfectly valid, even with the leftover letters.
I've added some debugging statements so you can get a sense of what's going on in this example invocation:
$ sh findwords.sh messages Raw word list of length 5 for letterset aegms: amass asses eases games gamma gases geese mamma sages seams seems Raw word list of length 6 for letterset aegms: assess mammas masses messes sesame Raw word list of length 7 for letterset aegms: amasses massage message Raw word list of length 8 for letterset aegms: assesses massages messages created pattern aeegmsss SUCCESS: You can make the word asses SUCCESS: You can make the word eases SUCCESS: You can make the word games SUCCESS: You can make the word gases SUCCESS: You can make the word sages SUCCESS: You can make the word seams SUCCESS: You can make the word seems SUCCESS: You can make the word masses SUCCESS: You can make the word messes SUCCESS: You can make the word sesame SUCCESS: You can make the word message SUCCESS: You can make the word messages
So, we can make a dozen different words out of the word “messages”, including the word messages itself. What about the original pattern we were using in previous columns: “chicken”? For this one, let's skip the potential words and just look at the solution:
SUCCESS: You can make the word chic SUCCESS: You can make the word chin SUCCESS: You can make the word heck SUCCESS: You can make the word hick SUCCESS: You can make the word hike SUCCESS: You can make the word inch SUCCESS: You can make the word neck SUCCESS: You can make the word nice SUCCESS: You can make the word nick SUCCESS: You can make the word check SUCCESS: You can make the word chick SUCCESS: You can make the word chink SUCCESS: You can make the word niche SUCCESS: You can make the word chicken
Impressive!
To make this work a bit better, I've added some error checking, included an -f flag so we can have the script also output failures, not just successes, and left in some additional debugging output if $verbose is set to true.
See Listing 1 for the complete code. It's also available at www.linuxjournal.com/extra/findwords.
That's it. Now we have a nice tool that can help us figure out what to play the next time we're stuck on Scrabble, Words with Friends, or even looking at a big stack of letter blocks.
Next month, I'll turn my attention to a different scripting challenge. Do you have an idea? Send it to info@linuxjournal.com.