LJ Archive

Cooking with Linux

Trekking through the Desktop Jungle

Marcel Gagné

Issue #138, October 2005

Is it easier to find a document on a far-away Web server than one on your own hard drive? Try some search programs to dig up the files you need.

That certainly does make it difficult, François. When I asked you to locate the wine order from last month and you told me it was somewhere on your disk, I didn't expect that it was sitting “somewhere on your disk” in quite this way. This is possibly the most disorganized home directory I've ever seen. Every document is in the same folder, and all the files are cryptically named. What were you thinking, mon ami?

Quoi? Well, of course there is a way to find it. If the document still exists somewhere on your disk, we'll find it. We just need to use the right tools. Later, though. Our guests will be here any moment and...too late, François, they are already here! Welcome, everyone, to Chez Marcel, home of fine Linux fare and exquisite wines. Please sit and make yourselves comfortable. François will fetch your wine immédiatement. François, head to the east wing of the wine cellar and bring back that 2001 Nuits-Saint-Georges Pinot Noir we've been tasting, er, I mean, subjecting to quality control. Vite!

That wine, mes amis, just happens to represent part of an order lost in one of François' documents on his computer. Trouble is, he doesn't remember which document. What we need to do is set him up with a desktop search engine. Luckily, this just happens to be the basis of tonight's menu, so we all will profit from my faithful waiter's lack of organization.

The original desktop search engine, mes amis, is something that's been around in Linux from the beginning, and that's the find command. This is an amazingly powerful tool and one that is easily overlooked in this age of cutting-edge graphical desktops. In its most basic form, find is used like this:

find starting_dir [options]

One of those options is -print, which makes sense only if you want to see any kind of output from this command. You easily could get a listing of every file on the system by starting at the top and recursively listing the disk:

find / -print

Of course, it makes more sense to search for something, for instance, all the MP3-type music files sitting on your disk. Because you know that the files end in a .mp3 extension, you can use that to search:

find / -name "*.mp3" -print
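If some of your files were saved as .MP3 rather than .mp3, the search above misses them. Here is a small sketch of one way around that, run in a throwaway directory so it is safe to try. The directory and filenames are invented for the demonstration, and -iname is an extension found in GNU and BSD find rather than part of the POSIX standard:

```shell
# Hypothetical sandbox so the example can be run anywhere without touching real files.
demo=$(mktemp -d)
mkdir -p "$demo/music/live"
touch "$demo/music/one.mp3" "$demo/music/live/TWO.MP3" "$demo/music/notes.txt"

# -type f   : match regular files only (skip directories)
# -iname    : like -name, but ignores upper- and lowercase (GNU/BSD extension)
# 2>/dev/null hides "Permission denied" messages, which are common when
# searching from / as an ordinary user.
found=$(find "$demo" -type f -iname "*.mp3" -print 2>/dev/null)
printf '%s\n' "$found"
```

Both music files are listed and notes.txt is not. Swap "$demo" for / or your home directory to search for real.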

This is also great for locating big files you haven't looked at in forever. Maybe it's time to do a little archiving of those old files, but how do you find only those? Say you want to look for anything that has not been modified (this is the -mtime parameter) or accessed (the -atime parameter) in the past 12 months. The -o option is the “or” in this equation:


find /home/marcel -size +1024 \( -mtime +365 -o -atime +365 \) -ls

In case you are curious, the backslashes in front of the parentheses are escape characters; they make sure the shell hands the open and close parentheses to find untouched instead of interpreting them itself. The parentheses group the two time tests, so that the size test applies to both of them. The preceding command also searches for files that are greater than 512KB in size. That is what the -size +1024 means, because 1024 refers to 512-byte blocks. The -ls at the end of the command tells the system to do a long listing of any files it finds that fit the search criteria. So far so good?
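To check the arithmetic and watch the grouping at work, here is a minimal sketch that builds three test files in a scratch directory. The filenames and the 2001 timestamp are made up for the demonstration:

```shell
# 1024 blocks of 512 bytes each
threshold_bytes=$((1024 * 512))
echo "$threshold_bytes bytes = $((threshold_bytes / 1024))KB"   # 524288 bytes = 512KB

# Hypothetical scratch directory with three test files
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big-old.dat"   bs=1024 count=600 2>/dev/null
dd if=/dev/zero of="$demo/big-new.dat"   bs=1024 count=600 2>/dev/null
dd if=/dev/zero of="$demo/small-old.dat" bs=1024 count=10  2>/dev/null

# touch -t sets both the access and modification times, here back to January 2001
touch -t 200101010000 "$demo/big-old.dat" "$demo/small-old.dat"

# Size test AND (old mtime OR old atime); the escaped parentheses do the grouping
matches=$(find "$demo" -type f -size +1024 \( -mtime +365 -o -atime +365 \) -print)
printf '%s\n' "$matches"
```

Only big-old.dat should be listed: big-new.dat is too recent, and small-old.dat is too small.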

The find command is fairly simple to use on the surface, but it also has many command-line options and interesting ways of passing the results of a search to other commands, so that the results can be narrowed down or fine-tuned. Getting to know find is a great idea, but there are alternatives that are a little friendlier.
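As an illustration of handing results to another command, here is a sketch of two common ways to let grep look inside whatever find turns up, which is the kind of search François needs for his lost wine order. The sample documents are invented, and -print0 and xargs -0 are widely available extensions rather than POSIX features:

```shell
# Hypothetical sample documents; one filename contains a space on purpose
demo=$(mktemp -d)
printf 'Order: 12 cases of Pinot Noir wine\n' > "$demo/doc 1.txt"
printf 'Shopping list: bread, cheese\n'       > "$demo/doc2.txt"

# Form 1: -exec ... {} + makes find pass batches of filenames straight to grep.
# grep -i ignores case; -l prints only the names of files that match.
hits=$(find "$demo" -type f -exec grep -il "wine" {} +)

# Form 2: a pipeline. -print0 and xargs -0 separate names with a NUL byte,
# so filenames containing spaces survive the trip.
hits_piped=$(find "$demo" -type f -print0 | xargs -0 grep -il "wine")

printf '%s\n' "$hits"
```

Both forms should name only "doc 1.txt". The first form needs no extra process and is usually the simpler choice; the pipeline is handy when you want to chain further commands.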

Many people out there have grown up in the graphical world of KDE or GNOME, so desktop tools have been created in each of these environments. Even so, my experience indicates that these excellent tools are, for many users, as overlooked as find. Let's have a look at those now.

Let's begin our search for search tools under KDE. Click the application launcher and look for a submenu labeled Find. The Find menu has two options, one for files and one for Web search (which, by default, launches Konqueror on the Google Web site). You also can fire up the files search tool by using the Alt-F2 quick launch (program name: kfind). When the application starts, the Find Files/Folders dialog appears. It contains three different tabs, and each is designed to help you locate the information you need. They are labeled Name/Location, Contents and Properties.

Under the Name/Location tab, specify the starting folder, either by entering it manually or by clicking the Browse button and navigating over to it using the KDE file navigator. There's also a field labeled Named where you enter part of a filename using Linux metacharacters. For instance, if I wanted to find all the files with Cooking anywhere in the title, I would enter *cooking*. By default, this is a case-insensitive search, so upper- and lowercase don't matter in terms of the search results. You can, however, override this behavior by clicking the Case-sensitive search check box.

Under the Contents tab, the real action takes place. Generally speaking, I don't have a problem locating a file by name. It's the content that is the real issue. Working out which of your several hundred documents contains a particular word or phrase is much harder than working out which one has that word in its name. The Contents tab lets you enter your search text (again, case-insensitive by default), regular expression searches and so on. You even can specify that KFind search through binary files and not only documents (Figure 1). There's also a meta-info search feature for things like MP3 files that contain embedded information, such as title and artist.

Figure 1. KFind makes it easy for Marcel to locate all those columns that mention “wine”.

Finally, the Properties tab provides a means of searching for files or folders based on creation or modification date, ownership and more.
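For comparison, a roughly similar search by owner and modification date can be done with find itself. This is only a sketch with invented filenames, and it says nothing about how KFind performs the search internally:

```shell
# Hypothetical scratch directory: one fresh file, one backdated to 2001
demo=$(mktemp -d)
touch "$demo/fresh.txt"
touch -t 200101010000 "$demo/stale.txt"

# -user accepts a login name or a numeric user ID; id -u prints your own ID.
# -mtime -7 matches files modified less than seven days ago.
recent=$(find "$demo" -type f -user "$(id -u)" -mtime -7 -print)
printf '%s\n' "$recent"
```

Only fresh.txt should be listed.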

GNOME users have access to the GNOME search tool (program name: gnome-search-tool), a similar program that lets you search based on filename, file content (text search) and date. Choose Search for files in the GNOME Places menu (I'm running 2.10 in this example), and this brings up the file find dialog (Figure 2).

Figure 2. The GNOME search tool allows you to search by name as well as text within a file.

When the dialog first appears, there isn't much to see. The defaults are to search for a file by name, which you enter in the Name contains field. Below that is your starting folder for the search, the default being your home directory. To get the full power of the GNOME search tool, click on the arrow next to the label that says Show more options. A new field appears through which you can specify some text in the file itself.

Finally, directly below the text search field, is one other option that can be quite complex. A drop-down box labeled Available options includes size, date and ownership search criteria that can be applied to narrow down your search results even further.

If you've been following search technology in any way, you'll know that there's a lot of excitement concerning desktop search engines these days—think Google for your desktop. In fact, Google does provide such a tool, but alas, only for non-Linux operating systems. However, this is not to say that desktop search tools don't exist for Linux.

One such tool is Roberto Cappuccio's Kat, a desktop search engine and indexing tool that makes it easy and fast to do full-text searches in a variety of document formats (for example, PDF, OpenOffice.org, KWord and so on). You also can search for images using thumbnails and more.

The Kat Web site (see the on-line Resources) provides binary packages for a number of distributions, so you may not need to build from source. Should you need to, however, the process is nothing more than the classic extract-and-build five-step. In terms of prerequisites, you need the SQLite database and its development libraries.

To use Kat, simply start the program (name: kat) and a plain three-pane window appears where you will do your work and your searching. The first step is to create a catalog. To do this, click File on the menu bar and select New.

When creating a new catalog, a four-tabbed window appears. The first tab, labeled Catalog, is where you enter the starting directory, the name of the catalog and other identifying information. On the second tab, labeled Metadata, you'll see a list of the various metadata engines that are available to Kat for indexing. You can remove different formats, but most likely, this will stay as is (Figure 3). The same goes for the Fulltext tab. Under Thumbnails, you can select the size of the thumbnails created during the index process.

Figure 3. Using kfile hooks, Kat can index almost anything.

A status window keeps you abreast of the number of files and folders scanned, as well as the size of the collection (Figure 4).

Figure 4. As Kat creates the new catalog, the program reports statistics on the process.

This brings us to the one big drawback of a tool like this. If the folder for which you are creating a catalog is large, this can take an amazing amount of time. Be prepared or keep your catalogs confined to a reasonable collection of files. I tried to index my own home directory in its entirety at nearly 6.6 GB of data—suffice it to say, that was a mistake.

Once a catalog has been created, finding information is blazingly fast. Simply click on the search icon on the far right (the magnifying glass), enter your search term and Kat returns the results of the search almost instantly (Figure 5).

Figure 5. Although the initial indexing can take some time, Kat searches are blazingly fast.

According to the clock on the wall, it would appear, mes amis, that closing time has arrived. Before we leave this topic of desktop search engines, I'd like to mention another package with the friendly, puppy-dog name of Beagle. Beagle is built on Mono (the open-source .NET implementation) and requires an inotify-enabled kernel. Neither is uncommon in the more modern distributions. Beagle also shows promise in that it is very fast and works silently in the background, keeping an eye on what you tell it while automatically updating its catalog of information. Unfortunately, Beagle is very much alpha code and not quite ready for prime time, as they say (although it is included with the new SUSE Linux Professional 9.3). Nevertheless, Beagle is a tool to watch, and I've included the link in the on-line Resources.

Please raise your glasses, mes amis, and let us all drink to one another's health. A votre santé! Bon appétit!

Resources for this article: /article/8456.

Marcel Gagné is an award-winning writer living in Mississauga, Ontario. He is the author of Moving to the Linux Business Desktop (ISBN 0-131-42192-1), his third book from Addison-Wesley. He also makes regular television appearances as Call for Help's Linux guy. Marcel also is a pilot and a past Top-40 disc jockey. He writes science fiction and fantasy and folds a mean Origami T-Rex. He can be reached via e-mail at mggagne@salmar.com. You can discover a lot of other things (including great Wine links) from his Web site at www.marcelgagne.com.
