Work the Shell

Scripting Common File Rename Operations

Dave Taylor

Issue #199, November 2010

If you find yourself always typing the same set of commands, it's time to write a script. This month, it's a script to rename and renumber files.

I'm guessing that each of us uses the command line differently and seeks to accomplish different tasks. Mine are sometimes very specialized, like the script I wrote that lets me easily transform the unique filenames from the Mac OS X built-in screen-capture utility into a Web-friendly format.

In the past few weeks, I realized I needed another fairly specialized script for file renaming, but this time, I wanted to write something as generally useful as possible.

There's already a utility included in some flavors of Linux called rename, but, alas, I couldn't find it on my Linux/NetBSD systems. If you have it, it probably duplicates the functionality I create this month. Still, read on. Hopefully, this'll be useful and interesting!

Rename/Pattern/Newpattern

It's surprising how often I find myself on the command line typing in something like:

for name in xx*
do
    new="$(echo "$name" | sed 's/xx/yy/')"
    mv "$name" "$new"
done

So, that's the first part of the script I want to create, one that lets me just specify the OLD and NEW filename patterns, then simply renames all files matching “OLD” with the “NEW” pattern substituted.

For example, say I have test-file-1.txt and test-file-2.jpg and want to replace “test-file” with “demo”. The goal is to have an invocation like:

rename test-file demo

and have it do all the work for me. Sound good?

How Many Matching Files?

The first step is actually the most difficult: matching an arbitrary pattern and catching any possible error conditions gracefully. The loop is going to end up looking like this:

for name in $1*

If there aren't any matches, however, you get an ugly error message and the script looks amateurish. So, the goal is actually to ascertain before the for loop how many matches there are to that given pattern.

Ah, okay, so ls $1* | wc -l does the trick, right? Nope, that'll still generate the same ugly error message.

Fortunately, there's a way in Bash that you can redirect stderr to go to stdout (that is, to have your error messages appear as standard messages, able to be rerouted, piped and so on).

The test for the number of matches, thus, can be done like this:


matches="$(ls -1 $1* 2>&1 | wc -l)"

I know, it's complicated. Worse, a quick test reveals that when there are zero matches, ls -1 still generates an error message (ls: No such file or directory), and because stderr is now merged into stdout, wc -l counts that message as one line. That's not good. The solution? Add a grep to the sequence:


matches="$(ls -1 $1* 2>&1 | grep -v "No such file" | wc -l)"

That's even more complicated, but it works exactly as we'd like. “matches” is zero in the situation where there aren't any matches; otherwise, it has the number of matching files and folders for the given pattern.
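Parsing the output of ls works here, but it isn't the only route. As a rough sketch, assuming the goal is simply to count the names that begin with the pattern, a glob loop with an existence test never triggers the error message in the first place. The function name count_matches is made up for illustration and is not part of the script in this column:

```shell
# count_matches PREFIX: print how many files or directories start with PREFIX.
# (count_matches is a hypothetical helper, not taken from the original script.)
count_matches() {
    local prefix=$1 count=0 name
    for name in "$prefix"*
    do
        # With no matches, the glob is left as literal text, so test existence.
        if [ -e "$name" ] || [ -L "$name" ] ; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}
```

One difference to keep in mind: ls lists the contents of any matching directory, whereas this sketch counts a matching directory as a single entry.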

A test now lets us produce a meaningful and informative error message:

if [ $matches -eq 0 ] ; then
    echo "Error: no files match pattern $1*"
    exit 0
fi

Because we're looking at stderr versus stdout, we also could more properly route that error message to stderr with >&2, and to be totally correct, we should exit with a nonzero error code to indicate that the script failed to execute properly. I'll leave those tweaks as an exercise for the reader.
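For readers who'd like a starting point on that exercise, here is one possible shape, wrapped in a function so it can be tried in isolation. The name check_matches and the exit code of 1 are my own choices for this sketch, not something taken from the finished script:

```shell
# check_matches COUNT PATTERN: complain on stderr and fail if COUNT is zero.
# (check_matches is a hypothetical helper; any nonzero status would do.)
check_matches() {
    local matches=$1 pattern=$2
    if [ "$matches" -eq 0 ] ; then
        echo "Error: no files match pattern $pattern*" >&2   # message goes to stderr
        return 1                                             # nonzero signals failure
    fi
    return 0
}
```

In a script, this could be called as check_matches "$matches" "$1" || exit 1, so that a caller or a surrounding pipeline can tell the rename did not happen.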

Now that we know we'll never hit the for loop without at least one match, the core code is straightforward:

for name in $1*
do
    new="$(echo "$name" | sed "s/$1/$2/")"
    mv "$name" "$new"
done

Notice in this instance that you can't use single quotes around the sed expression, as in the earlier loop; if you do, the shell won't expand $1 and $2, and sed will see those characters literally.
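To make the quoting concrete, here is a hedged sketch of the same loop packaged as a function, with every variable quoted so that filenames containing spaces survive the trip. The function name rename_pattern is hypothetical, and the sketch assumes neither pattern contains a slash or characters that sed treats specially:

```shell
# rename_pattern OLD NEW: rename every file starting with OLD, replacing OLD with NEW.
# (rename_pattern is a hypothetical name used only for this illustration.)
rename_pattern() {
    local old=$1 newpat=$2 name new
    for name in "$old"*
    do
        [ -e "$name" ] || continue   # skip the unexpanded glob when nothing matches
        # Double quotes let the shell expand $old and $newpat before sed runs;
        # single quotes would pass the variable names to sed untouched.
        new="$(printf '%s\n' "$name" | sed "s/$old/$newpat/")"
        mv -- "$name" "$new"
    done
}
```

With test-file-1.txt and test-file-2.jpg in the current directory, rename_pattern test-file demo should leave demo-1.txt and demo-2.jpg behind.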

We certainly could just stop here and have a useful little script, but I'm into wicked cool scripts, so let's push on, shall we?

Sequential File Numbering

The other feature I constantly find myself needing is the ability to number a series of files sequentially. For example, a final set of photos from a photo shoot might be DSC1017, DSC1019, DSC1023 and DSC1047. It would be more useful to be able to renumber those before sending them to a client, so that they're DSC-1, DSC-2, DSC-3 and so on.

This is pretty easily accomplished too, now that we have a script that renames a sequence of files. Here's how I accomplish it in the script itself:

if [ $renumber -eq 1 ] ; then
    suffix="$(echo "$name" | cut -d. -f2- | tr '[A-Z]' '[a-z]')"
    new="$2$count.$suffix"
    count=$(( $count + 1 ))
    mv "$name" "$new"
    chmod a+r "$new"
fi

Here I am expecting to replace the entire filename, so I strip out and save the filename suffix (for example, DSC1015.JPG becomes JPG), so I can re-attach it later. While I'm at it, filename suffixes also are normalized to all lowercase using the handy tr command.

The count variable keeps track of what number we're on, and notice the built-in shell notation of $(( )) for mathematical calculations.

Finally, the new filename is built from the new pattern ($2), plus the count ($count), plus the filename suffix ($suffix) in this line:

new="$2$count.$suffix"
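To see how the pieces fit, here is a small sketch that builds the new name without touching any files. The helper names build_name and last_suffix are invented for this example. One thing worth knowing is that cut -d. -f2- keeps everything after the first dot, so a name with two dots keeps both trailing parts; the second helper shows one way to keep only the last part, if that is what you'd prefer:

```shell
# build_name BASE COUNT FILENAME: print BASE + COUNT + "." + lowercase suffix,
# using the same cut and tr pipeline as the column. (Hypothetical helper.)
build_name() {
    local base=$1 count=$2 name=$3 suffix
    suffix="$(printf '%s\n' "$name" | cut -d. -f2- | tr '[A-Z]' '[a-z]')"
    echo "$base$count.$suffix"
}

# last_suffix FILENAME: print only the text after the final dot, lowercased.
# (Hypothetical alternative; ${1##*.} strips the longest prefix ending in a dot.)
last_suffix() {
    printf '%s\n' "${1##*.}" | tr '[A-Z]' '[a-z]'
}
```

For example, build_name DSC- 1 DSC1017.JPG produces DSC-1.jpg, while a name such as archive.tar.GZ yields a tar.gz suffix from build_name and just gz from last_suffix.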

The two conditions need to be merged, however, so the final script ends up with an if-then-else-fi structure.
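The merged loop might look roughly like the sketch below. This is not the finished script, just one plausible arrangement: the function name rename_all is hypothetical, starting the count at 1 and the defaults of renumber=0 and doit=1 are assumptions, the “mv old new” line printed in test mode is my own choice of format, and the chmod step is left out to keep it short:

```shell
renumber=${renumber:-0}   # assumed default: plain pattern substitution
doit=${doit:-1}           # assumed default: actually rename the files

# rename_all OLD NEW: rename or renumber every file starting with OLD.
rename_all() {
    local old=$1 newpat=$2 count=1 name new suffix
    for name in "$old"*
    do
        [ -e "$name" ] || continue   # nothing matched the pattern
        if [ "$renumber" -eq 1 ] ; then
            # Renumber: discard the old name, keep the lowercased suffix.
            suffix="$(printf '%s\n' "$name" | cut -d. -f2- | tr '[A-Z]' '[a-z]')"
            new="$newpat$count.$suffix"
            count=$(( count + 1 ))
        else
            # Otherwise substitute NEW for OLD within the name.
            new="$(printf '%s\n' "$name" | sed "s/$old/$newpat/")"
        fi
        if [ "$doit" -eq 1 ] ; then
            mv -- "$name" "$new"
        else
            echo "mv $name $new"     # test mode: show, don't do
        fi
    done
}
```

Keeping the decision about the new name separate from the decision about whether to act on it means the test mode needs only one extra branch.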

I can't leave well enough alone, so I continued to tweak the script by adding a few starting flags too. To parse it all, our friend getopt is utilized:

args=$(getopt npt $*)

if [ $? != 0 -o $# -lt 2 ] ; then
    echo "Usage: $(basename $0) {-p} {-n} {-t} PATTERN NEWPATTERN"
    echo ""
    echo " -p  rewrites PNG to png"
    echo " -n  sequentially numbers matching files with"
    echo "     NEWPATTERN as base filename"
    echo " -t  test mode: show what you'll do, don't do it."
    exit 0
fi

set -- $args
for i
do
    case "$i" in
    -n ) renumber=1 ; shift ;;
    -p ) fixpng=1   ; shift ;;
    -t ) doit=0     ; shift ;;
    -- ) shift      ; break ;;
    esac
done

I've written about getopt and its complicated usage in shell scripts before if you want to read up on it [see “Parsing Command-Line Options with getopt” in the July 2009 issue of LJ, www.linuxjournal.com/article/10495]. Note that three flags are available to the script user: -n invokes the renumbering capability (which means the filenames are discarded, remember); -p is a special case where .PNG also is rewritten as .png; and -t is a sort of “echo-only” mode where the rename doesn't actually happen, the script just shows what it would do based on the patterns given.
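If getopt isn't available or behaves differently on your system, the shell's built-in getopts is another way to handle the same three flags. To be clear, this swaps in a different tool from the one the script uses, and it is only a sketch: the function name parse_flags, the variables pattern and newpattern, and the default values are assumptions of mine:

```shell
# parse_flags ARGS...: set renumber, fixpng, doit, pattern and newpattern.
# (Hypothetical helper using the getopts builtin rather than getopt.)
parse_flags() {
    renumber=0 ; fixpng=0 ; doit=1   # assumed defaults
    OPTIND=1                         # reset so the function can be called again
    while getopts "npt" opt
    do
        case "$opt" in
            n ) renumber=1 ;;
            p ) fixpng=1   ;;
            t ) doit=0     ;;
            * ) return 1   ;;        # unknown flag: getopts has already complained
        esac
    done
    shift $(( OPTIND - 1 ))          # drop the flags, leaving the two patterns
    pattern=${1-}
    newpattern=${2-}
}
```

Calling parse_flags "$@" at the top of a script would then leave the old and new patterns in pattern and newpattern, with the flags recorded in the three variables.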

How am I using it now? Like this:

rename -n IMG_ iphone-copy-paste-

Every matching .PNG file (IMG_*) is renamed to “iphone-copy-paste-” followed by a sequence number, and as it proceeds, the “PNG” suffix is also rewritten as “png”.

The entire rename script can be found on the Linux Journal FTP server at ftp.linuxjournal.com/pub/lj/listings/issue199/10885.tgz.