Use the shell to generate movie trivia from a movie database.
It's been one of those proverbial journeys of a thousand steps, but I think we're finally ready to start generating some movie trivia after spending the past few months doing all the underlying tool development. You'll recall that we're grabbing the top 250 movies list from Amazon's IMDb site, then getting the release year of each movie and storing it in a database.
Separately, we chewed on the interesting problem of coming up with adjacent years for a given year, recognizing that the older the movie, the wider the spread we want between years. Precious few people will mistakenly guess that a movie released in 2007 came out in 1983, but a movie released in 1947 could easily stymie someone who thinks it came out in 1931.
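This column doesn't repeat last month's spread calculation, so here is a minimal sketch of the idea, assuming a hypothetical rule of one extra year of spread for every full decade since release, with a minimum of 1. The function name spread_for_year and the formula are illustrative only, not the column's actual code, and the reference year 2009 in the examples is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical spread rule (an assumption, not the original formula):
# the maximum distance between the real year and a decoy grows by one
# year per full decade since release, with a floor of 1.
spread_for_year() {
  local releasedate=$1
  local thisyear=${2:-$(date +%Y)}   # optional second argument, handy for testing
  local age=$(( thisyear - releasedate ))
  if (( age < 0 )); then age=0; fi   # guard against release dates in the future
  echo $(( age / 10 + 1 ))
}

spread_for_year 2007 2009   # a two-year-old film gets a tight spread (1)
spread_for_year 1947 2009   # a 62-year-old film gets a wide spread (7)
```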
Now, it's time to put the pieces together.
The last column dug into the year spread, ending with a script that produced a likely adjacent year for a given year. We need to refine this script, because what we want to produce are three different year possibilities, two that are likely but wrong and one that's the correct year, without duplicates.
First, let's turn the code that generates a reasonable adjacent year into a shell function:
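get_random()
{
  # expects $releasedate, $factor and $thisyear to be set by the caller;
  # leaves its answer in the global variable $closeyear
  delta="$(( $RANDOM % $factor + 1 ))"     # 1 to $factor years away
  add="$(( $RANDOM % 2 ))"                 # coin flip: later or earlier?
  if [ $add -eq 1 ] ; then
    closeyear="$(( $releasedate + $delta ))"
  else
    closeyear="$(( $releasedate - $delta ))"
  fi
  if [ $closeyear -gt $thisyear ] ; then   # no guesses from the future
    closeyear="$(( $releasedate - $delta ))"
  fi
}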
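Because get_random reads three globals it never sets ($releasedate, $factor and $thisyear), it helps to see the calling contract spelled out. This small harness is a sketch: thisyear comes from date +%Y, while the release year 1959 and the factor of 5 are arbitrary example values.

```shell
#!/usr/bin/env bash
get_random()
{
  delta="$(( RANDOM % factor + 1 ))"         # 1..factor years away
  add="$(( RANDOM % 2 ))"                    # coin flip: later or earlier
  if [ "$add" -eq 1 ] ; then
    closeyear="$(( releasedate + delta ))"
  else
    closeyear="$(( releasedate - delta ))"
  fi
  if [ "$closeyear" -gt "$thisyear" ] ; then # never guess a future year
    closeyear="$(( releasedate - delta ))"
  fi
}

# Globals the function depends on (example values chosen for this demo).
thisyear=$(date +%Y)
releasedate=1959
factor=5

for i in 1 2 3 4 5; do
  get_random
  echo "guess $i: $closeyear"
done
```

Every guess lands between 1 and $factor years from the real date, never on it, and never after the current year.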
Next, since a shell function's return statement can hand back only a small integer exit status, not a year, the function leaves its result in a global variable, $closeyear. Here's how we can leverage it:
get_random
match1=$closeyear
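For comparison, a function can also "return" a value by printing it and letting the caller capture the output with $( ). Here's a sketch of that style; random_near is a hypothetical name, and it takes the year, the spread and the current year as arguments instead of reading globals. The trade-off is that $( ) runs the function in a subshell, so it can't set any variables in the calling script.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to get_random: inputs arrive as arguments
# and the chosen year is "returned" on standard output.
random_near() {
  local year=$1 spread=$2 thisyear=$3
  local delta=$(( RANDOM % spread + 1 ))
  local guess
  if (( RANDOM % 2 )); then
    guess=$(( year + delta ))
  else
    guess=$(( year - delta ))
  fi
  (( guess > thisyear )) && guess=$(( year - delta ))  # no future years
  echo "$guess"
}

match1=$(random_near 1975 6 "$(date +%Y)")   # caller captures the output
echo "first guess: $match1"
```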
Calling get_random once gets us the first guess easily enough. The second guess, however, needs to be different from the first. How to do that? With a while loop:
match2=$match1     # needs an initial value
while [ $match2 -eq $match1 ] ; do
  get_random
  match2=$closeyear
done
This is slightly risky, because the loop never ends if get_random can't produce a different year. For example, if $factor is 1 and the movie came out this year, every call returns the year before. I'll ignore that for now.
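One way to defuse that infinite-loop risk is to cap the number of attempts. This sketch assumes a hypothetical limit of 50 tries; pick_second is an illustrative name. If the cap is reached, it falls back to one year below the smaller of $match1 and $releasedate, which can't collide with either one and can't be in the future. The demo uses the worst case described above.

```shell
#!/usr/bin/env bash
get_random() {                      # same logic as the column's get_random
  delta=$(( RANDOM % factor + 1 ))
  if (( RANDOM % 2 )); then closeyear=$(( releasedate + delta ))
  else closeyear=$(( releasedate - delta )); fi
  if (( closeyear > thisyear )); then closeyear=$(( releasedate - delta )); fi
}

# Choose a second guess that differs from match1, giving up after a
# (hypothetical) limit of 50 tries instead of looping forever.
pick_second() {
  local tries=0
  match2=$match1
  while (( match2 == match1 && tries < 50 )); do
    get_random
    match2=$closeyear
    tries=$(( tries + 1 ))
  done
  if (( match2 == match1 )); then
    # Fallback: one year below the smaller of the two known years.
    if (( match1 < releasedate )); then match2=$(( match1 - 1 ))
    else match2=$(( releasedate - 1 )); fi
  fi
}

# Worst case: factor=1 and a movie released in the current year.
thisyear=2009 releasedate=2009 factor=1
get_random; match1=$closeyear
pick_second
echo "$match1 $match2 $releasedate"
```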
Now we have three year values: two incorrect ones, $match1 and $match2, and the correct value, $releasedate. How to give them back to the calling routine sorted? Easy:
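# sort orders lines, not words, so put one year per line, sort them,
# then let echo rejoin the results on a single line
echo $(printf '%s\n' "$match1" "$match2" "$releasedate" | sort -n)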
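Because sort compares whole lines, piping a single space-separated line to sort -n leaves it untouched. This small demo, using example years, shows the difference between sorting one line and sorting one year per line:

```shell
#!/usr/bin/env bash
match1=1981 match2=1971 releasedate=1975   # example values

# One line in, one line out: sort has nothing to reorder.
echo "$match1 $match2 $releasedate" | sort -n

# One year per line: sort -n can now order them, and the unquoted
# $( ) lets echo rejoin them onto a single space-separated line.
echo $(printf '%s\n' "$match1" "$match2" "$releasedate" | sort -n)
```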
And that's it. Give it a year, and it returns three: two that are close but wrong, and one that's correct. For example:
$ ./year-delta.sh 1975
1971 1975 1981
$ ./year-delta.sh 1999
1998 1999 2000
$ ./year-delta.sh 1938
1935 1938 1948
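For reference, here is one way the pieces could be assembled into a single self-contained function that prints output of the same shape as year-delta.sh. It's a sketch under stated assumptions: year_delta and near_year are hypothetical names, the spread rule (one year per decade of age, minimum 1) stands in for last month's formula, and the 50-try cap is arbitrary.

```shell
#!/usr/bin/env bash
# One decoy year near $1, at most $2 years away, never later than $3.
near_year() {
  local delta=$(( RANDOM % $2 + 1 ))
  local guess=$(( $1 + delta ))
  if (( RANDOM % 2 == 0 || guess > $3 )); then
    guess=$(( $1 - delta ))
  fi
  echo "$guess"
}

# Print the real year plus two distinct decoys, sorted, on one line.
year_delta() {
  local releasedate=$1 thisyear=${2:-$(date +%Y)}
  local factor=$(( (thisyear - releasedate) / 10 + 1 ))   # hypothetical rule
  if (( factor < 1 )); then factor=1; fi
  local match1 match2 tries=0
  match1=$(near_year "$releasedate" "$factor" "$thisyear")
  match2=$match1
  while (( match2 == match1 && tries < 50 )); do
    match2=$(near_year "$releasedate" "$factor" "$thisyear")
    tries=$(( tries + 1 ))
  done
  if (( match2 == match1 )); then   # give up: go below both known years
    match2=$(( (match1 < releasedate ? match1 : releasedate) - 1 ))
  fi
  echo $(printf '%s\n' "$match1" "$match2" "$releasedate" | sort -n)
}

year_delta 1975
```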
That's exactly what we want. Now, how to integrate this into the bigger script that grabs a random line from the IMDb database and then presents it in a workable fashion?
Once you remember the trick of $(( $RANDOM % some-value)), it should be straightforward to get a random line from a data file:
lines="$(wc -l < "$filmdb" | sed 's/ //g')"   # number of movies
randline=$(( $RANDOM % $lines + 1 ))          # random line, 1 to $lines
match="$(sed -n "${randline}p" < "$filmdb")"   # print just that line
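To try those three lines without the real database, this sketch builds a small throwaway stand-in in the same title|year format (three sample entries, not the actual IMDb list) and picks a random line from it:

```shell
#!/usr/bin/env bash
# Build a temporary stand-in for the movie database (sample entries only).
filmdb=$(mktemp)
trap 'rm -f "$filmdb"' EXIT
printf '%s\n' \
  'Some Like It Hot|1959' \
  'Planet of the Apes|1968' \
  'The Lord of the Rings: The Two Towers|2002' > "$filmdb"

lines="$(wc -l < "$filmdb" | sed 's/ //g')"   # line count, padding removed
randline=$(( RANDOM % lines + 1 ))             # 1..lines inclusive
match="$(sed -n "${randline}p" < "$filmdb")"   # print only that line
echo "line $randline: $match"
```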
As I've written before, wc is one of your best friends in script writing, because it's so easy to use. But it's also frustrating that, at least in some versions (the BSD wc on Mac OS X, for example), there's no way to turn off the superfluous white space it pads its output with. That's why the first line pipes the count through sed to axe any spaces. Somewhere, in a parallel universe to our own, there's an -n flag to wc that means “no padding” and makes this forevermore unnecessary. Sadly, we aren't in that universe, so just about every time you use wc, you have to strip out the white space at the same time.
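If you'd rather not start a sed process, bash offers other ways to lose the padding: delete blanks with tr, or let arithmetic expansion parse the number, since it ignores the surrounding blanks. A quick comparison on a temporary three-line file:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'a\nb\nc\n' > "$tmp"

padded="$(wc -l < "$tmp")"                  # may be "       3" with BSD wc
via_sed="$(wc -l < "$tmp" | sed 's/ //g')"  # the column's approach
via_tr="$(wc -l < "$tmp" | tr -d ' \t')"    # delete spaces and tabs
via_math=$(( $(wc -l < "$tmp") ))           # arithmetic ignores the blanks
echo "[$padded] [$via_sed] [$via_tr] [$via_math]"
```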
The result of these three lines is that match has a value similar to:
The Lord of the Rings: The Two Towers|2002
Now we need to split it into two fields, which is easily, if tediously, done:
title="$(echo "$match" | cut -d\| -f1)"
relyear="$(echo "$match" | cut -d\| -f2)"
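In bash, both fields can also be split in a single step, with no cut processes, by setting IFS to the pipe character just for one read command. A sketch using the sample value from above:

```shell
#!/usr/bin/env bash
match='The Lord of the Rings: The Two Towers|2002'

# IFS='|' applies only to this read; -r keeps any backslashes literal.
IFS='|' read -r title relyear <<< "$match"
echo "title=[$title] relyear=[$relyear]"
```

This assumes titles never contain a | themselves, which the database format already requires.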
Now it's time to invoke the random years script which, if you recall, generates one correct and two incorrect years ($randomyears holds the command that runs it):
years=$($randomyears $relyear)
Finally, let's pull the three years into separate variables and then output an attractive trivia query:
year1="$(echo $years | cut -d' ' -f1)"   # $years is left unquoted so any
year2="$(echo $years | cut -d' ' -f2)"   # newlines collapse into spaces
year3="$(echo $years | cut -d' ' -f3)"
echo "IMDb Top 250 Movie #$randline: Was \"$title\" released in $year1, $year2 or $year3?"
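Since $years is just a list of whitespace-separated numbers, bash can also split it into an array in one assignment rather than running cut three times. A sketch, using sample values in place of the script's real output:

```shell
#!/usr/bin/env bash
years="1950 1959 1963"      # sample output from the random-years script
title="Some Like It Hot"
randline=82

# Unquoted on purpose: word splitting breaks the list on spaces or newlines.
yearlist=( $years )
echo "IMDb Top 250 Movie #$randline: Was \"$title\" released in ${yearlist[0]}, ${yearlist[1]} or ${yearlist[2]}?"
```

The unquoted expansion is safe here only because the values are plain digits; arbitrary text could trigger filename globbing.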
Not too shabby! Let's see how it works:
$ ./generate-trivia-question.sh
IMDb Top 250 Movie #82: Was "Some Like It Hot" released in 1950, 1959 or 1963?
$ ./generate-trivia-question.sh
IMDb Top 250 Movie #118: Was "Mononoke-hime" released in 1994, 1995 or 1997?
$ ./generate-trivia-question.sh
IMDb Top 250 Movie #250: Was "Planet of the Apes" released in 1967, 1968 or 1969?
Perfect, perfect!
That's about all we have space for in this column, but we've come a long, long way from the URL for a Web page that lists some top movies to a nice little trivia engine that's fast and fun!
Next month, we'll look at how to inject the trivia into the Twitterstream. Want to see it in action? By the time you read this column, it'll be live at twitter.com/FilmBuzz (along with movie commentary and much more).