LJ Archive

Work the Shell

The find|xargs Sequence

Dave Taylor

Issue #249, January 2015

find|xargs: the magic of smart pipes versus filenames with spaces.

In my last article, I dug into the weird but powerful find command, a tool that I find to be an essential part of working with the command line on a Linux system and a key tool for shell scripts too. Although it's super powerful, find has some odd quirks and does a really poor job with filenames that have spaces.

Indeed, in the good-old days, UNIX was developed with a standard rule of “no spaces in filenames”, so it's only recently with the addition of far longer filename options that spaces have shown up to plague us Linux users. The problem, of course, is that the standard field separator in the shell is, you guessed it, the space. So if you have a file called “My Latest Story”, just about every command is going to hiccup.

Try this, and it'll fail:

cat My Latest Story

saying that file “My”, file “Latest” and file “Story” are not found.

Savvy command-line users have long since learned that filename completion is the easiest solution to this, typing in the fragment cat My then pressing <Tab> to have it completed:

cat My\ Latest\ Story

Aesthetically yechy, but it's functional. You also can quote filenames, of course, so this also would work:

cat "My Latest Story"

But, again, it's a hassle. The real solution simply is never to use spaces in Linux filenames, but as a shell script writer, you can't guarantee that your script users meet the same criteria, so you've got to cope. And, that's where find tends to fall down.
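Inside a script, the same problem shows up the moment a filename lands in a variable. Here's a minimal sketch, assuming a scratch directory from mktemp whose path has no spaces of its own (the variable names are just for illustration), that shows the unquoted expansion failing and the quoted one working:

```shell
#!/bin/sh
# Scratch directory holding one file with spaces in its name.
demo=$(mktemp -d)
printf 'once upon a time\n' > "$demo/My Latest Story"
story="$demo/My Latest Story"

# Unquoted: the shell splits the value on spaces, so cat is handed
# three names, none of which exist, and it exits with a non-zero status.
unquoted_status=0
cat $story > /dev/null 2>&1 || unquoted_status=$?

# Quoted: the value stays together as a single argument.
quoted=$(cat "$story")

echo "unquoted exit status: $unquoted_status"
echo "quoted output: $quoted"

rm -rf "$demo"
```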

Mutual Incompatibility: find and Spaces

There's a rather kludgy solution that's now part of the complicated find language, fortunately, and it's just a simple variant on the basic -print predicate: -print0.

Run it by itself, however, and you'll get really odd output, because find ends each matching filename with an ASCII NUL (a zero byte) rather than the usual newline. Try it, and you'll see the output is a bit confusing!
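If you want to see those invisible terminators for yourself, one approach is to translate them into newlines with tr, or simply to count them. This is a rough sketch, assuming a throwaway directory created with mktemp and a couple of made-up filenames:

```shell
#!/bin/sh
# Scratch directory with one space-laden filename (names are illustrative).
demo=$(mktemp -d)
touch "$demo/black box 2.c" "$demo/sample.c"

# Translate each NUL terminator into a newline so the list is readable.
readable=$(cd "$demo" && find . -name "*.c" -print0 | tr '\0' '\n' | sort)
printf '%s\n' "$readable"

# Keep only the NUL bytes and count them: one per matching file.
nul_count=$(cd "$demo" && find . -name "*.c" -print0 | tr -cd '\0' | wc -c | tr -d ' ')
echo "NUL terminators: $nul_count"

rm -rf "$demo"
```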

To get this all to work with find, the most common solution is to pipe the output of find into the xargs command and specify the -0 flag:

find . -name "*.c" -print0 | xargs -0 ls -l

The above snippet would work for source files with names like “black box 2.c” and “chapter 3 problem 8.c”.

Let's start with just a simple find:

$ find . -name "*.c"
./black box 2.c
./chapter 3 problem 8.c
./helloworld.c
./sample.c

Add the -print0, and the output is a bit wonky, as expected:

$ find . -name "*.c" -print0
./black box 2.c./chapter 3 problem 8.c./helloworld.c./sample.c$ 

Messy. Worse, what if you use the find command and forget to compensate for those pesky space-filled filenames? Oh, it's not pretty:

$ find . -name "*.c" | xargs ls -l
ls: ./black: No such file or directory
ls: ./chapter: No such file or directory
ls: 2.c: No such file or directory
ls: 3: No such file or directory
ls: 8.c: No such file or directory
ls: box: No such file or directory
ls: problem: No such file or directory
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./helloworld.c
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./sample.c

I warned you up front that spaces in filenames cause trouble, and here's that trouble come to roost.

Add the -print0 instead of the assumed default of -print, pipe that directly to xargs, and now it all makes sense:

$ find . -name "*.c" -print0 | xargs -0 ls -l
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./black box 2.c
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./chapter 3 problem 8.c
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./helloworld.c
-rw-r--r--  1 taylor  staff  0 Nov  5 14:39 ./sample.c

I've written about dealing with spaces in filenames within shell scripts in the past. It's a pain. At least with find, you now know how to work in a space-friendly way.

A Bit More about xargs

Before moving on from the dynamic duo of find and xargs, however, let's spend a bit more time on xargs itself. The xargs command is designed to let you invoke another command with arguments received from a pipe, that is, from its standard input.

Commonly, you'll see find|xargs, but it turns out you can do other things with it too, as you'll see.

More important, remember that the first non-option argument given to xargs itself is the command you want to run. A common usage might be something like this:

xargs grep -i "pattern"

as part of a pipeline.
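To make that concrete, here's a small sketch where the filenames come from printf rather than find. The two text files are invented for the demo, and I've added grep's -l flag so only the matching filenames are printed:

```shell
#!/bin/sh
# Two sample files; only a.txt contains the word we're looking for.
demo=$(mktemp -d)
printf 'Hello World\n' > "$demo/a.txt"
printf 'nothing here\n' > "$demo/b.txt"

# The names arrive on standard input; xargs appends them to the
# command it was given, so grep effectively runs as:
#   grep -il "hello" a.txt b.txt
matches=$(cd "$demo" && printf '%s\n' a.txt b.txt | xargs grep -il "hello")
echo "$matches"

rm -rf "$demo"
```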

Where xargs really shines though is with its many command-line options. One of the most useful of those is -n, which lets you specify the maximum number of entries it should accumulate before running the specified command. If you've ever seen an “Argument list too long” error on the command line, you'll appreciate the -n flag. Here's a simple example:

$ echo this is a demo of the xargs -n flag | xargs -n3
this is a
demo of the
xargs -n flag

As you can see, the -n flag causes xargs to push out its buffer every n items—darn useful with really big directories!
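Another way to convince yourself that -n really does trigger separate invocations is to count the output lines, since each run of echo prints exactly one. A quick sketch with seven made-up words:

```shell
#!/bin/sh
# Seven words, at most three per invocation: echo runs three times (3 + 3 + 1).
batched=$(printf '%s\n' one two three four five six seven | xargs -n 3 echo)
printf '%s\n' "$batched"

# Each invocation of echo produced one line, so counting lines counts runs.
invocations=$(printf '%s\n' "$batched" | wc -l | tr -d ' ')
echo "echo was invoked $invocations times"
```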

Even more useful is the -p option that has xargs prompt you to proceed with the given command. Want to remove some files, but not others? Try this (carefully):

$ find . -print0 | xargs -0 -n1 -p rm -rf
rm -rf .?...n
rm -rf ./black box 2.c?...n
rm -rf ./chapter 3 problem 8.c?...y
rm -rf ./helloworld.c?...n
rm -rf ./sample.c?...n

In this sequence, xargs prompts with the ?... sequence (confusing though it is). Look carefully, and you'll see that “chapter 3 problem 8.c” is the only file I opted to delete. I also used -n1 to ensure that I could decide on a file-by-file basis which to delete.

Note that any of this works from within a shell script too, so if you had one that, say, rotated log files and deleted the oldest of them, using find|xargs with the -p flag would result in users being prompted, log file by log file, whether they want to delete the oldest or save them for historical research.
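As a rough sketch of what such a script might look like, here's a hypothetical prune_logs function. The function name, its arguments and the "ask" switch are all my own invention for illustration, and the 30-day cutoff is arbitrary. With "ask", xargs -p prompts on the terminal before each rm; without it, the old logs are removed silently, which is what the demo at the bottom does:

```shell
#!/bin/sh
# prune_logs DIR DAYS [ask]
# Hypothetical helper: delete *.log files in DIR last modified more than
# DAYS days ago. Passing "ask" as the third argument makes xargs -p
# prompt before each individual rm.
prune_logs() {
    dir=$1
    days=$2
    mode=${3:-}
    if [ "$mode" = "ask" ]; then
        find "$dir" -type f -name "*.log" -mtime +"$days" -print0 |
            xargs -0 -n1 -p rm -f
    else
        find "$dir" -type f -name "*.log" -mtime +"$days" -print0 |
            xargs -0 -n1 rm -f
    fi
}

# Demo: one log backdated to the year 2000, one created just now.
logs=$(mktemp -d)
touch -t 200001010000 "$logs/old app.log"
touch "$logs/current.log"

prune_logs "$logs" 30        # non-interactive; add "ask" to be prompted
remaining=$(ls "$logs")
echo "$remaining"

rm -rf "$logs"
```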

One of the coolest things you can do with find|xargs is to tie grep into it. Here's a way to search all your *.php files for preg_replace() invocations:

find / -name "*.php" -print0 | xargs -0 grep "preg_replace"

Most programmers aren't going to be using filenames with spaces in them, so you might think the -print0 is unnecessary, but remember that parent directories might well have spaces anyway. So it's just smart to anticipate!
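Here's a small sketch of exactly that situation: the PHP filenames are tidy, but one of them lives in a directory with a space in its name. The directory layout and file contents are made up for the demo, and I've used grep -l to print just the names of the matching files:

```shell
#!/bin/sh
# A tiny fake site: the space is in a parent directory, not the filename.
site=$(mktemp -d)
mkdir "$site/old site"
printf '<?php echo preg_replace("/a/", "b", $s); ?>\n' > "$site/old site/filter.php"
printf '<?php echo "hello"; ?>\n' > "$site/index.php"

# NUL-separated names survive the space in "old site" intact.
hits=$(cd "$site" && find . -name "*.php" -print0 | xargs -0 grep -l "preg_replace")
echo "$hits"

rm -rf "$site"
```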

That's it for my tour of find and xargs. In my next article, I'll be back to shell script programming and will explore how to write an acey-deucey game. Yes, back to card games. See you then!

Dave Taylor has been hacking shell scripts for more than 30 years. Really. He's the author of the popular Wicked Cool Shell Scripts (and just completed a 10th anniversary revision to the book, coming very soon from O'Reilly and NoStarch Press). He can be found on Twitter as @DaveTaylor and more generally at his tech site: www.AskDaveTaylor.com.
