LJ Archive

EOF

The Near-Death of Blog Search

Doc Searls

Issue #214, February 2012

A once-hot category is now down to just Google.

Blogging was born in the late 1990s, and it got hot after the turn of the millennium. Several characteristics made blogs distinctive. First, they were personal. Second, they consisted of posts organized in reverse chronological order: the latest stuff on top, the older stuff scrolling toward the bottom. Third, each post had its own URL, called a permalink, which survived with its own archival path after it scrolled off the current page. Fourth, with the advent of RSS (Really Simple Syndication), blogs had a journalistic advantage: notifying the world that a post had just gone up. Thus, blogs were more current and live than ordinary Web sites, and they were designed to accumulate a corpus of work that respected the future by organizing and preserving the past.

By contrast, ordinary Web sites tended to be static and have no archival function. This was especially true of commercial sites. Last year's shoes were as gone as last year's snow.

Google and other search engines of the time paid little attention to blogs. They assumed that the Web was more like real estate than like anything that was alive and constantly changing. “Sites” that were “designed”, “constructed” or “built” at “locations” you could “visit” or “browse” mattered more than “journals” that were “posted”, “updated” and “syndicated”.

That left open a hole in the marketplace for search engines that indexed the live Web, leaving the static one to Google and its direct competitors.

The first blog search engine was PubSub, which showed up sometime in 2002. It was inventive and took some getting used to, but it was fast and did a good job of finding current postings in the blogosphere.

Second was Technorati, which came along in late 2002, out of research David Sifry and I were doing for the article “Building with Blogs”, which ran in the March 2003 issue of Linux Journal (www.linuxjournal.com/article/6497). The first incarnation of Technorati was a MySQL hack that ran on a Debian box in the basement of David's San Francisco apartment. When he exposed it to the public on the Web, it was an instant success. Not long after that, David quit his day job and worked on building Technorati full-time.

Third was Bloglines, which came along in early 2003. Bloglines was a Web-based RSS news aggregator, rather than a search engine in the usual sense. Still, it served the need for readers to know what was being published right now on the Web.

During the next several years, other fish in the blog-search pond came to include Google Blog Search, BlogPulse, Yahoo, Feedster, IceRocket and Blogdigger.

After a while fake blogs—a form of spam Mark Cuban called “splogs”—came to comprise about 99% of the blogs in the world. This was a huge problem that even Google had trouble keeping up with. Thanks to the splog issue and other problems, Feedster and Blogdigger died off. Yahoo never was serious about blog search and quit the game. IceRocket and BlogPulse both were sold and now are mostly buzz search engines that don't remember anything more than a few months old. Blogdigger's page still is up but doesn't do anything. And Technorati, which once maintained a complete index of all syndicated sources, including all blogs from the beginning of the company's existence, turned into one of those “content” mills several years back.

The only true blog search engine still operating is Google Blog Search. It is basically a specialized search within Google's main engine, which has been modified in recent years to reduce time-to-index to minutes or even seconds. I hope it keeps going, because it's an essential resource for finding the kind of news that's syndicated live, still curates itself, and isn't just about pushing or riding whatever happens to be buzzing at the moment.

For a while after Technorati gave up, my favorite blog search engine was IceRocket. Now owned by Meltwater Buzz, it's about “social media monitoring” that “helps you mine conversations across social channels for nuggets of insight”. Note that the second-person “you” is not you and me, the users. We're the producers of ore from which insight nuggets are mined. The “you” they're talking to is advertisers.

This is what Twitter hath wrought, and Facebook as well. They've buried real news—stuff worth keeping around—under a mountain of buzz, all of which melts away after minutes, weeks or months.

But the durable stuff still matters. Journalism, compromised and corrupted as it has become, still matters—perhaps more than ever. That means blogs, and journals like this one, still matter too. But only to the degree that the work still can be found.

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.
