Work the Shell

Listening to Your Twitter Stream

Dave Taylor

Issue #189, January 2010

Answer simple Twitter queries automatically.

Last month wrapped up with a problem so complex we had to delve into a different programming language to create a solution to the mathematics of calculating the distance between two lat/lon points on the globe. My head's still spinning. I graduated with a computer science degree long ago, so what the heck?

This month, I thought we should move back to something a bit more fun and perhaps a bit less complicated (well, maybe not, we'll see) and return to Twitter.

What I've been thinking about is how helpful it would be to have a bot that listened to my Twitter stream and answered simple queries directly without human intervention. Stores could have a bot respond to queries like “hours?” and “address?”, and students could have their schedule preprogrammed, and the bot could answer queries like “class?” by indicating what class students were in at that moment.

In fact, there's a local startup here in Boulder, Colorado, called Local Bunny (localbunny.com) that is moving down this path, but it's building a real, fully thought-out solution. By comparison, I'm going to show you a bubblegum-and-baling-wire approach!

Listening to Your Twitter Stream

Tracking an individual's Twitter stream is quite easy; a call to the right URL with curl does the trick:

curl http://twitter.com/status/user_timeline/davetaylor.xml

That'll give you my last dozen tweets or so, along with a lot of additional information, all in XML format.

What we want, however, are mentions of an account or pattern, which requires you to supply login credentials. This call is a bit more complicated, but you still can accomplish it with curl:

curl -u "davetaylor:$pw" http://www.twitter.com/statuses/mentions.xml

Here, I've set pw to my account password (you don't really want to know my password, do you?). The output, however, is something else. For an individual tweet, there are 42 lines of information that come back (for a 140-character tweet).

It's too much to show you here, but try the command yourself and be astonished at the output.

To trim it down, let's use grep with a regular expression to extract the Twitter ID of the person who sent the tweet that mentions @DaveTaylor, and the tweet itself:

curl -u "davetaylor:$pw" http://www.twitter.com/statuses/mentions.xml | \
  grep -E '(<screen_name>|<text>)'

<text>@DaveTaylor  Have them send the money in gold bullion.</text>
  <screen_name>LenBailey</screen_name>
<text>@DaveTaylor Escrow.com</text>
  <screen_name>Ed</screen_name>

You can see here that the first tweet is from @LenBailey, and the second from @Ed.
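Because the live feed needs credentials, here is a minimal sketch of the grep step run against a hand-made sample instead of curl output. The XML in sample_xml is trimmed and made up for illustration (a real response carries many more fields), and the function names extract_fields and sample_xml are mine, not part of the script in this column.

```shell
#!/bin/sh
# Keep only the <text> and <screen_name> lines from a mentions feed on stdin.
extract_fields() {
  grep -E '(<screen_name>|<text>)'
}

# Trimmed, hypothetical stand-in for what the API returns for one mention.
sample_xml() {
  cat <<'EOF'
<status>
  <text>@DaveTaylor Escrow.com</text>
  <truncated>false</truncated>
  <user>
    <name>placeholder</name>
    <screen_name>Ed</screen_name>
  </user>
</status>
EOF
}

sample_xml | extract_fields
```

Only the two matching lines survive, indentation included, which is why the later sed steps have to deal with leading spaces.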

Turning this into coherent output is a tiny bit tricky, because we really want to merge line pairs into a single line that denotes message and ID. That's a job for awk:

awk '{if (NR % 2 == 1) { printf ("%s",$0) } else { print $0 }}'

Now, if we feed the curl output to this, we'll see:


<text>@DaveTaylor  Have them send the money in gold bullion.</text>  <screen_name>LenBailey</screen_name>
<text>@DaveTaylor Escrow.com</text>  <screen_name>Ed</screen_name>
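To see the pairing trick in isolation, the sketch below feeds the same awk program two lines with printf. The wrapper name merge_pairs is my own; the awk program is the one shown above. It assumes the input strictly alternates between a message line and an ID line.

```shell
#!/bin/sh
# Join each odd-numbered line with the even-numbered line that follows it:
# odd lines are printed without a newline, even lines end the output line.
merge_pairs() {
  awk '{ if (NR % 2 == 1) { printf("%s", $0) } else { print $0 } }'
}

printf '%s\n' \
  '<text>@DaveTaylor Escrow.com</text>' \
  '  <screen_name>Ed</screen_name>' | merge_pairs
```

If a line ever goes missing from the input, every later pair shifts by one, so this approach depends on grep returning exactly two lines per tweet.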

Next step: let's get rid of the XML artifacts and reformat it to be a bit easier to parse. We also can axe @DaveTaylor, because we know it's to this account already (in the actual code, it's one invocation, but here it's easier to show it in two lines for legibility):


sed 's/@DaveTaylor //;s/<text>//;s/<\/text>//' |
sed 's/   <screen_name>/ == /;s/<\/screen_name>//'

www.xetrade.com ?  == kiasuchick
 Have them send the money in gold bullion.  == LenBailey
Escrow.com == Ed

That's more like it!
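Here is a small sketch of the cleanup step applied to one merged line. It is a variation rather than the exact command above: I've folded both sed calls into one and swapped the fixed run of spaces for " *", so the result does not depend on how deeply the feed indents screen_name. The function name clean_tweets is hypothetical.

```shell
#!/bin/sh
# Strip the XML tags and the @DaveTaylor prefix, and put " == " between
# the message and the sender's ID.
clean_tweets() {
  sed -e 's/@DaveTaylor //' \
      -e 's/<text>//' -e 's/<\/text>//' \
      -e 's/ *<screen_name>/ == /' -e 's/<\/screen_name>//'
}

printf '%s\n' \
  '<text>@DaveTaylor Escrow.com</text>  <screen_name>Ed</screen_name>' |
  clean_tweets
```

For that sample line, the output is "Escrow.com == Ed".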

Parsing Twitter Messages

Let's start by doing something simple. If you “@” my Twitter account with the command date, it'll detect it, actually run the date command, and send out the results on my behalf.

To do this, we'll want to split the data stream into “tweet” and “tweeter”, but we can do this in a tricky way by tweaking the earlier awk string to create name=value pairs:

awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
      else { print "id="$0 }}'

The result:

msg="escrow"; id=Stepan
msg="www.xetrade.com ?"; id=kiasuchick
msg=" Have them send the money in gold bullion.  "; id=LenBailey
msg="Escrow.com"; id=Ed

Nice. Now we can use the underutilized eval command in the growing script to set the variables msg and id from each line, and then check msg for known values. Now, if you're sharp, you'll realize tweets that include double quotes are a bit of a problem, but fortunately, the Twitter API is smart too. All single quotes pass through as is, but double quotes are rewritten as the HTML entity &quot;.
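As a rough sketch of how eval and the quote entity fit together, the snippet below evaluates one name=value line and then turns the entity back into a real double quote for display. The tweet text is invented for illustration, and decode_quotes is a hypothetical helper, not something the script in this column defines.

```shell
#!/bin/sh
# Turn the &quot; entity back into a literal double quote.
decode_quotes() {
  sed 's/&quot;/"/g'
}

# One line in the msg="..."; id=... format, with a made-up tweet.
line='msg="She said &quot;hi&quot;"; id=Ed'

# eval runs the line as shell code, which sets both msg and id.
eval "$line"

decoded=$(printf '%s\n' "$msg" | decode_quotes)
printf '%s: %s\n' "$id" "$decoded"
```

Because the entity contains no real double quote, the msg="..." assignment stays well formed, and the decoding happens only after the variables are set.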

Let's pause for a second so I can show you what I've built so far:

# $curl, $inurl and $temp are variables assumed to be set earlier in the script
$curl -u "davetaylor:$pw" "$inurl" | \
  grep -E '(<screen_name>|<text>)' | \
  sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
  sed 's/    <screen_name>//;s/<\/screen_name>//' | \
  awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
        else { print "id="$0 }}' > "$temp"

That grabs the 20 most recent tweets that mention the specified user and converts each one into a msg="message"; id=userid line. Fed to eval in a loop, we now have a very easy way to parse things:


while read -r buffer
do
  eval "$buffer"
  echo "Twitter user @$id sent message $msg"
done < "$temp"
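One caveat with the eval loop: because msg is wrapped in double quotes, the shell still expands $(...) and backticks inside it, so a tweet from a stranger could run commands on your machine. The sketch below is an alternative of my own, not the approach this column uses. It splits each line with parameter expansion only, so the tweet text is never executed. The helper name parse_line and the sample data (including the sender Mallory) are hypothetical.

```shell
#!/bin/sh
# Split a 'msg="..."; id=NAME' line into $msg and $id without eval.
parse_line() {
  line=$1
  id=${line##*'; id='}       # keep what follows the last '; id='
  msg=${line%'"; id='*}      # drop the trailing '"; id=NAME'
  msg=${msg#'msg="'}         # drop the leading 'msg="'
}

# Hypothetical sample file standing in for $temp from the pipeline.
temp=$(mktemp)
cat > "$temp" <<'EOF'
msg="Escrow.com"; id=Ed
msg="$(echo oops)"; id=Mallory
EOF

while IFS= read -r buffer
do
  parse_line "$buffer"
  printf 'Twitter user @%s sent message %s\n' "$id" "$msg"
done < "$temp"
rm -f "$temp"
```

The second sample line is printed exactly as written, $(echo oops) and all, rather than being run. The trade-off is that this assumes the line format never varies, whereas eval would accept any valid shell assignments.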

Let's wrap up the column here for now, but next month, we'll take the next step and actually parse the Twitter “@” messages being sent to me, trying to find those that match the predefined queries we've set, act upon them and respond.

This is going to be a pretty cool project when we're done!