LJ Archive

At the Forge

Geolocation

Reuven M. Lerner

Issue #240, April 2014

Standardizing addresses, finding users and customizing content are just three ways geolocation can improve your Web applications.

There's an old saying in the real-estate business that the three most important things in a property are location, location and location. We can assume this is still true when it comes to real estate, but it also is increasingly true when it comes to Web applications. A number of my recent consulting projects have included, in one way or another, the need to work with addresses and locations of various sorts.

This shouldn't come as too much of a surprise, given the many ways that the Web is becoming the way we communicate, store information and work. It gives me a warm (if somewhat creepy) feeling when a site I go to wishes me a “good morning”, because it knows it is now morning in my part of the world. It's useful when a mapping program starts off by displaying my current location as a default. And as the person running various applications, I like the fact that I can learn basic geographical information about my users—both so I can offer additional services while simultaneously receiving useful data.

Working with street addresses, location coordinates and the like falls under the umbrella of “geolocation”. So in this article, I review a few of the technologies and options that use geolocation and offer some suggestions as to how you can include such features in your own Web applications.

Which Server?

The first thing to realize when it comes to geolocation is that you're almost certainly not going to be able to do it alone. Sure, given enormous amounts of time and money, you probably could figure out the locations and street addresses of most people in the world, but you're unlikely to do this. This means you're going to have to connect to one or more companies that owns and distributes mapping information via an API, such as Google, Bing (Microsoft) or similar.

There are free and open-source alternatives to commercial map providers, such as freegeoip.net and www.openstreetmap.org. However, the APIs of the commercial offerings are richer and seem to be better supported. Even some of the free services will require or expect that you have an API key, for which you need to register. This allows them to track how many requests you are making and to limit your usage unless you pay for a commercial tier. Although it is useful and nice to work with open-source tools, the remainder of this column assumes you are working with a commercial provider.

Note that some API libraries provide a single interface to multiple servers for both street addresses and IP addresses. For example, the Geocoder gem for Ruby (written and maintained by Alex Reisner) lets you choose from a number of different mapping providers and also from a number of IP address providers, defaulting to Google and freegeoip.net, respectively. You then can decide whether to use free or commercial services, or a mix of them, depending on your use case.

It's also important to remember that the accuracy is far from 100%. For example, I decided to look up an old address of mine from when I was living in Skokie, Illinois. I wrote a small Ruby program to do this:

require 'geocoder'
Geocoder.search('9120b niles center road skokie il')

Google, the default decoding system, almost immediately returned with a better-formatted version of the address, along with a great deal of other information. I was able to get the address out of the system:

Geocoder.search('9120b niles center road skokie 
 ↪il')[0].formatted_address
    => "9120C Niles Center Road, Skokie, IL 60076, USA"

Now, the fact is that the B and C units in that particular townhouse are right next to one another. And it's likely that if I were to look on a map, or even send mail to one of those addresses, the difference would be obvious. But as you can see, the address returned from Google is not necessarily the right one.

One of the nice things about Google's API is that it includes a large number of locations around the world. For example, I can look up my current address:

Geocoder.search('14 migdal oz street modiin israel')

But in this case, I don't get an address that matches mine, but rather the overall entry for my city of Modi'in. I actually don't even get back a single entry, but rather three, each of which represents Modi'in in a different way, with a slightly different spelling. The differences between the entries is most obvious if I ask for the coordinates from each of the three returned result objects:

Geocoder.search('14 migdal oz street modiin 
 ↪israel').map {|a| a.geometry['location'] }
=> [
    [0] { "lat" => 31.90912, "lng" => 35.002462 },
    [1] { "lat" => 31.890267, "lng" => 35.010397 },
    [2] { "lat" => 31.893661, "lng" => 34.96079 }
]

For many purposes, these coordinates are all close enough. However, if you are creating an application that depends on exact precision, such as a GIS navigation application, you're likely going to need to compare different services and perhaps even perform multiple queries, taking the result that most closely matches the location of interest.

Addresses and Coordinates

You now have seen several examples of how you easily can perform geocoding with the Geocoder Ruby gem. Given an address, you can invoke the “search” class method on the Geocoder object, getting an array of Geocoder result objects, containing various pieces of information about the resulting address. Even if there is only a single result, you will receive an array. And, the Google API tries hard to match something. It returned a result for “1 Main Street, Fredonia”, but returned an empty array when I entered “1 zzz street, yyy qqq”.

The result object contains a great deal of information. If I am interested in a standardized version of the address, I can invoke the “address_components” method on the result object, which will return an array of hashes containing the street number, street name, village name and so forth. This portion of the result contains more information than you need to address an envelope in the United States—for example, it includes the county and city name, as well as the state and postal code. You can grab these pieces of information separately or invoke methods that pull them together. I can use the “formatted_address” method (shown above) to get a complete address or the “street_address” method to get just the most important parts.

Several of the applications I've written for clients during the past few years have used geocoding APIs to standardize the addresses, ensuring that they have an “official” address that meets US specifications. This also helps avoid misspellings and other errors that can cause trouble down the road. Thus, even when a user enters his or her own address, we run it through a geocoding facility and store the result of this search. (It's probably a good idea to store the originally entered address as well.)

Instead of (or in addition to) an address, you'll often want to get the coordinates, including longitude and latitude. Because coordinates give an exact location on the globe, you can use them in a variety of places that aren't tied to individual addresses, like mapping software or GIS databases (such as PostGIS, a GIS extension to PostgreSQL). If I have the coordinates of a particular place, I then can draw that on a map with a great deal of accuracy. Two of my clients in recent years have asked that I hide users' addresses when displayed on maps for privacy reasons. Making up addresses (for example, changing “123 Main Street” to “456 Main Street”) is almost guaranteed to cause trouble and failure, but changing the coordinates by a small random factor has worked quite well.

Geocoding IP Addresses

Although some of my geocoding work has involved taking addresses from the user's input, much of it has been just the opposite—trying to figure out the user's location and then doing something with that information. In other words, I would like to take the user's IP address and use that to pinpoint the user's location.

The first thing to realize is that the need for such things has been reduced, to some degree at least, by the HTML5 API for geolocation. That API, implemented on the client and in JavaScript, allows an application to ask the browser to report its current location. (The standard requires that a browser ask the user before sending location information.) You then can use the information inside a Web page using JavaScript or invoke an Ajax call to send that information to the server, where it can be parsed and used.

On one recent project, I wasn't interested in bothering the user with geolocation information or in using that information within a Web application. Rather, I wanted to review an application's logs and summarize the countries from which visitors had come. To do this, I needed to review each logfile entry and then look for each IP address, determining its country.

Now, note that this kind of information can be grossly inaccurate. For example, I'm currently writing this in my local public library in Modi'in, Israel, and my IP address is being reported as 81.218.200.112 to the outside world. I can ask Geocoder to tell me where it thinks I am:

result = Geocoder.search('81.218.200.112')

And unfortunately, it has no idea, other than the fact that I'm in Israel:

result[0].country
  => "Israel"
result[0].city
  => ""

According to www.iplocation.net, which offers IP location to individual visitors, it thinks I'm in Petach Tikva—a fine city, but a 40-minute drive from where I'm sitting. That's because the geolocation is looking for the telecom facility, or the provider, rather than the specific location where I'm located.

So you always should take IP geolocation with a healthy dose of skepticism. Moreover, plenty of IP addresses are not in geolocation databases. Others are tied to companies or to services (for example, Google's searchbot) that will come to your site and make requests, but for which there is no location. And then there are the visitors who come to your site via cellular phones and services, which often can be national in scope and thus fail to provide an accurate reading.

That said, if you are interested in finding out general information about your users—their countries of origin and time zones—then IP location can work quite well. As you can see, the Geocoder gem lets you use the same class method, “search”, to request information about an IP address. It figures out whether you're entering an IP address, coordinates or a street address, and handles it accordingly. For one recent project, I was able to provide interesting information and analytics about the countries from which people came, simply by running the IP addresses through an IP geolocation library.

As a general rule, you never should perform such an action in real time, when the user is coming to your site. You are much better off running a background task or an hourly cron job. As you collect and store the IP location information, you almost certainly should store it in a database, or at least cache it, to avoid too many requests to the geolocation service.

If you do end up using Geocoder with Rails, you get a “location” method that you can invoke on the “request” object, allowing you to get the user's information automatically, via IP address. I have not tested this to see if calling the “location” method significantly adds to the response time, or if it is somehow handled in a separate thread, or by turning to a cached copy of the data on the local server, but it would be wise to check the performance hit before putting it into production.

Configuration

I have barely mentioned configuration so far, because I've found that Geocoder works so well out of the box. That said, there have been times when I have wanted or needed to reconfigure it. Fortunately, configuration is quite simple and straightforward, and is accomplished by invoking the Geocoder.configure class method.

For example, while working on this article at the library, I found that the Wi-Fi connection was quite slow—so much so that even simple API calls were timing out. I was pleasantly surprised to find that the Geocoder gem was smart enough to realize the problem was a timeout and suggest that I might want to avoid the timeout by invoking Geocoder.configure. Now, that's the sort of error message I would like to see more often! So, I invoked:

Geocoder.configure(:timeout => 1000)

And sure enough, my future calls worked just fine, even if they took a while to execute.

You always can get the current configuration settings by invoking Geocoder.location without any options. This returns a hash with all of the name-value pairs associated with the configuration system.

First, if you want to use a different geocoding API than the default of Google, you can do so by changing the “lookup” parameter in the configuration system:

Geocoder.configure(lookup: :nominatim)

Now, the results from a search will not be an array of Geocoder::Result::Google (which is what you received before), but rather Geocoder::Result::Nominatim. Each result object has a different set of methods and attributes, which means you cannot simply swap one API for another. The methods and data available reflect the information received from the geocoding API as best as possible.

Summary

Geolocation is far from perfectly accurate. However, this lack of precision doesn't mean that you should avoid using it in your applications. Whether you want to give localized greetings to your users, standardize addresses or create summaries and reports of who has been accessing your application, geocoding is a technique you likely will find useful for many of your applications. As easy as the commercial and free APIs may be to use, the existence of such open-source libraries like Geocoder makes it even easier.

Reuven M. Lerner, a longtime Web developer, consultant and trainer, is completing his PhD in learning sciences at Northwestern University. You can learn about his on-line programming courses, subscribe to his newsletter or contact him at lerner.co.il.

LJ Archive