By Bernhard Bablok and Nils Magnus
Despite recent competition from powerful alternatives such as Zsh [1], the Bourne-again shell (Bash) [2] is still the king of the hill on the Linux console. Users can use Bash interactively, and it also serves as a simple yet practical scripting language. Bash is part of the backbone of any working Linux system - all the more reason to investigate the benefits of upgrading to the new Bash 4 release, which appeared February 2009.
On production systems, you might want to consider whether it is really necessary to upgrade to Bash 4. The major distributions will eventually spread the new version through their own updates, so the new Bash will reach you someday whether you download and install it or not. Programmers and many power users, on the other hand, like to embrace the goodies a new version offers as quickly as possible.
If you want to get a head start on new Bash features that are making their way to the next generation of Linux systems, you'll enjoy spending some quality time with Bash 4.
Table 1 provides a summary of some important new features; for a complete list, check out the NEWS file in the Bash documentation. Here, we highlight some of the most important changes.
Command-line users will appreciate a few inconspicuous, but very useful, extensions debuting with the latest Bash. For example, the string ** expands to a list of files and paths below the current working directory in a fashion similar to the external find command. However, users need to enable the feature by issuing the command shopt -s globstar.
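A minimal sketch of recursive globbing follows. The temporary directory and the file names are made up for the demonstration; only shopt -s globstar and the ** pattern come from Bash itself.

```shell
#!/bin/bash
# Sketch: recursive globbing with globstar (needs Bash 4 or later).
shopt -s globstar

# Build a small throwaway directory tree (paths are illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/src/lib"
touch "$demo/a.txt" "$demo/src/b.txt" "$demo/src/lib/c.txt"

# ** matches zero or more directory levels, so this finds all three files.
txt_files=( "$demo"/**/*.txt )
printf '%s\n' "${txt_files[@]}"

rm -rf "$demo"
```

Without globstar enabled, ** behaves like a single *, and the same pattern would only match files exactly one directory level down.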
The developers have now adopted a more user-friendly approach to what used to be one of the greatest mysteries of the Bourne shell: the way standard error output is redirected. Instead of the >> file 2>&1 mantra, users can now use &>> file to append both error and standard output to a file. The |& shortcut, which sends a command's standard error into a pipe along with its standard output, is another useful addition.
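The two shortcuts can be tried out with a few lines. In this sketch, noisy is a hypothetical helper function that writes one line to each stream; the log file is a temporary file created just for the test.

```shell
#!/bin/bash
# Sketch: the Bash 4 redirection shortcuts &>> and |&.
log=$(mktemp)

# Hypothetical helper that writes one line to each output stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

noisy &>> "$log"    # same effect as: noisy >> "$log" 2>&1
noisy &>> "$log"    # the second call appends instead of overwriting

lines_in_log=$(wc -l < "$log")

# |& is shorthand for 2>&1 | and feeds both streams to grep.
piped=$(noisy |& grep -c "to std")

echo "log lines: $((lines_in_log)), piped lines: $piped"
rm -f "$log"
```

Both calls contribute two lines to the log, and grep sees both streams through the pipe.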
One popular myth is that Bash scripts create too many processes, which ultimately affects performance. But many of the simpler external programs once called from Bash scripts, including sed, grep, basename, and dirname, are no longer necessary; Bash handles these tasks with on-board tools just as quickly as any other scripting language. Despite these improvements, Bash programmers cast many an envious glance at Perl and Python, both of which have more versatile data structures.
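As a rough illustration of those on-board tools, the following sketch replaces typical calls to dirname, basename, sed, and grep with parameter expansion and the [[ =~ ]] operator. The sample path is invented, and the built-in forms do not cover every edge case of the external programs (for example, a path without any slash).

```shell
#!/bin/bash
# Sketch: built-in replacements for common external helper programs.
path="/var/log/apache2/access.log"    # sample value, purely illustrative

dir=${path%/*}                  # like dirname:  /var/log/apache2
file=${path##*/}                # like basename: access.log
stem=${file%.*}                 # strip the extension: access
renamed=${path/access/error}    # like a simple sed substitution

# Like a simple grep: test the name against a regular expression.
if [[ $file =~ \.log$ ]]; then
    is_log=yes
else
    is_log=no
fi

echo "$dir | $file | $stem | $renamed | $is_log"
```

None of these lines starts a new process, which is exactly what takes the sting out of the "too many processes" argument.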
In Version 4, Bash finally adds associative arrays to its existing single-dimensional arrays. For many coders, this change in itself is reason enough to move to the new version because it provides a more elegant approach to many problems. For example, developers can use arbitrary strings as indices in associative arrays, rather than just integers. Listing 1 gives an example.
Listing 1: Programming with Associative Arrays
#!/bin/bash

declare -A name

name["Linus"]="Torvalds"
name["Bill"]="Gates"
name["Steve"]="Jobs"
name["George W."]="Bush"

# output all values:
echo "Values: ${name[@]}"

# output all keys:
echo "Key: ${!name[@]}"

# Access individual values
for v in "${!name[@]}"; do
  echo "$v ${name[$v]}"
done
Listing 2 is a script that sorts files into directories according to their properties - for example, the creation date. The script has two fundamental problems: First, it does not work with directory names that contain blanks, and second, it might process individual directories multiple times.
Listing 2: Erroneous Strings in Lists
#!/bin/bash

dirs=""

for f in "$@"; do
  d=`getDir "$f"`
  mkdir -p "$d"
  mv -f "$f" "$d"
  dirs="$dirs $d"
done

for d in $dirs; do
  createIndex "$d"
done
Previous Bash versions have taken different approaches to solving this issue. In Bash 3.2, programmers can store quoted directory names in strings or an array. A script could prevent double processing by searching in the string - or by a slow linear search in the array. Neither of these approaches is particularly elegant.
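The array variant with a linear search might look roughly like the following sketch. The helper name seen_dir and the sample directory names are assumptions made for illustration; they do not come from the original script.

```shell
#!/bin/bash
# Sketch: avoiding duplicates with a plain array and a linear search,
# the kind of workaround needed before associative arrays existed.
dirs=()

# Hypothetical helper: succeeds if the directory is already in the array.
seen_dir() {
    local candidate=$1 d
    for d in "${dirs[@]}"; do
        [ "$d" = "$candidate" ] && return 0
    done
    return 1
}

# Sample input with a blank in one name and one duplicate.
for d in "My Photos" "Music" "My Photos"; do
    seen_dir "$d" || dirs+=("$d")
done

printf '%s\n' "${dirs[@]}"
```

The quoting keeps names with blanks intact, but every insertion walks the whole array, which is the inelegant part.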
Bash 4 handles this task far more simply (see Listing 3). The directory name serves as a key; the value itself is of no interest. Starting in line 12, the loop iterates over all the keys. The special construction with the at sign (@) in double quotes does what it does anywhere in Bash: It tells the shell to process the values as individual tokens. The solution thus works with blanks in the target directories.
Listing 3: Associative Arrays
01 #!/bin/bash
02
03 declare -A dirs
04
05 for f in "$@"; do
06   d=`getDir "$f"`
07   mkdir -p "$d"
08   mv -f "$f" "$d"
09   dirs[$d]=1
10 done
11
12 for d in "${!dirs[@]}"; do
13   createIndex "$d"
14 done
Listing 4 contains a typical programming pattern. The while loop in lines 5 through 8 parses the lines from an input file one after another, storing the results in an array. This construction occurs frequently, but unfortunately, it is imperfect. If the last line does not end with a newline character, the loop will not store the line. The built-in read command still assigns the partial line to the variable, but it returns a non-zero status when it reaches the end of the file, so the loop body is skipped for that line.
Listing 4: Parsing Files into an Array - Legacy Approach
01 #!/bin/bash
02
03 inputFile="$1"
04 i=0
05 while read line; do
06   lines[$i]="$line"
07   let i++
08 done < "$inputFile"
09
10 # Ongoing processing of the array lines ...
The new Bash version not only saves programmers some typing, it provides a cleaner implementation. Instead of the while loop, a single line is all it takes (see line 4 in Listing 5). The mapfile command is also available under the name readarray, which describes the purpose more aptly.
Listing 5: Bash 4 Parses Files into an Array
01 #!/bin/bash
02
03 inputFile="$1"
04 mapfile -n 0 lines < "$inputFile"
05
06 # Ongoing processing of the array lines ...
The mapfile command can do even more. If you are interested in more detail, type help mapfile or visit the man page. The command can parse multiple lines in a single pass and process these lines one by one using a callback function. Unfortunately, the developers did not provide an especially elegant implementation of this function. Bash 4 calls the callback before parsing and not after. This approach gives the script a callback before the shell has actually read anything, but loses a callback after the last line. In spite of this complication, the function is still useful for parsing complete files.
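The difference between the legacy loop and mapfile is easy to check with a file whose last line lacks a newline. The following sketch uses a temporary file with made-up content. It also adds the -t option, which strips the trailing newline from each element; without -t, mapfile keeps the newline as part of every array entry.

```shell
#!/bin/bash
# Sketch: a missing final newline with the legacy loop versus mapfile.
input=$(mktemp)
printf 'one\ntwo\nthree' > "$input"    # no newline after "three"

# Legacy loop: read returns non-zero at end of file, so "three" is skipped.
legacy=()
while read -r line; do
    legacy+=("$line")
done < "$input"

# mapfile keeps the final line; -t drops the newline from each element.
mapfile -t lines < "$input"

echo "legacy: ${#legacy[@]} lines, mapfile: ${#lines[@]} lines"
rm -f "$input"
```

The legacy loop ends up with two elements, whereas mapfile stores all three.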
Dependencies and Compatibility
Fortunately, the only dependency the new, lean Bash package needs to resolve is that of Version 6 of the Readline library. The installation is particularly easy on openSUSE; working as the system administrator, you simply replace the existing bash and bash-doc packages with the new versions and install libreadline6 parallel to libreadline5. The packages are available from the Build Service [3]. Debian offers a package in its experimental branch [4]. Users of other distributions might prefer to wait or build the package from the source code.
The first place to look for compatibility information is the COMPAT file in the documentation directory. It lists eight items; none of them is fundamental, and they relate to Posix compatibility. To discover whether or not this is an issue, you will need to run your own tests.
Another important compatibility test relates to the init system with its many start and stop scripts. For example, some issues with openSUSE's /etc/init.d/network script cropped up. The check_firewall() function returns negative values, which doesn't make sense for two reasons: The range of valid values is between 0 and 255, and the script does not actually evaluate the return value that precisely. This appears to be an openSUSE error. It does not faze the Bash 3.2 parser, but the negative return codes cause errors in Bash 4. As a consequence, openSUSE still starts the firewall, even if the system configuration tells it to do otherwise. An error like this could lead to problems for systems that the administrator has not actually changed.
If you have ever had to implement a robust approach to processing configuration files or user input, you will be aware of the following issue: Is the value in the configuration file or the result of a read operation YES, yes, Yes, true, TRUE, or True? Up to and including Bash 3.2, script authors used a call to tr to convert the value unambiguously to upper or lower case.
Bash 4 takes a far easier approach. The declare -u Varname instruction automatically converts all assignments to the Varname variable to upper case. The similar declare -l instruction converts to lower case. These conversion tools save the programmer the trouble of calling an external program and improve the script's performance. If you just want to convert once, rather than globally, you can use the new parameter expansion:
foo=yes
echo ${foo^}
echo ${foo^^}
The first command converts the first letter to uppercase, thus outputting Yes. The second command converts all the letters and thus returns a result of YES.
If you need to convert to lowercase letters, you can use:
foo=YES
echo ${foo,}
echo ${foo,,}
The ${Var^Pattern} syntax supports even more complex replacements.
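Put together, the case conversion features might be used as in the following sketch. The function name is_enabled and the sample values are assumptions for illustration. The last line shows the pattern form, which converts only the characters that match the pattern.

```shell
#!/bin/bash
# Sketch: normalizing yes/no style input with declare -l (Bash 4 or later).
declare -l answer    # every value assigned to answer is lowercased

# Hypothetical helper: succeeds for any spelling of "yes" or "true".
is_enabled() {
    answer=$1
    [[ $answer == yes || $answer == true ]]
}

for value in YES Yes true TRUE no; do
    if is_enabled "$value"; then
        echo "$value -> on"
    else
        echo "$value -> off"
    fi
done

# Pattern form: uppercase only the letters l and m.
name="linux magazine"
echo "${name^^[lm]}"    # prints: Linux Magazine
```

Because the conversion happens at assignment time, the comparison itself only has to deal with lowercase values.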
Most hardware manufacturers today offer multiple core CPUs, and techniques for exploiting multi-core functionality are starting to appear. Bash 4 allows programmers to launch what are referred to as co-processes:
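coproc pipes { command; }
This command stores the file descriptors connected to the command's standard output and standard input in pipes[0] and pipes[1], respectively. The braces are needed whenever a name is given; a simple command without a name uses the default array COPROC. The main Bash process uses the descriptors to communicate with the co-process, which is particularly useful for shell scripts in parallel processing [5].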
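A minimal sketch of a round trip through a co-process follows. The names upper and reply are chosen freely for this example; the co-process is simply a shell loop that returns every line it receives in uppercase.

```shell
#!/bin/bash
# Sketch: talking to a co-process through its two file descriptors.
# The co-process converts each line it reads to uppercase.
coproc upper {
    while IFS= read -r line; do
        printf '%s\n' "${line^^}"
    done
}

pid=$upper_PID    # remember the process ID; Bash unsets it when the job ends

# upper[1] is connected to the co-process's standard input ...
printf '%s\n' "hello bash" >&"${upper[1]}"

# ... and upper[0] to its standard output.
IFS= read -r reply <&"${upper[0]}"
echo "$reply"

# Closing the write end sends end of file, so the loop terminates.
eval "exec ${upper[1]}>&-"
wait "$pid"
```

A shell loop is used here instead of an external filter because many filters buffer their output when writing to a pipe, which would leave the read call waiting.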
Automatic value lists, or brace expansions, have been around for a while, but most users don't know about them: The call to echo {5..15} counts from the first to the last number. Many Bash programmers still use a call to seq for this:
echo $(seq -s " " 5 15)
The seq variant is not just slower, it is also more difficult to read. But if you need to sort this kind of output as text, you have to contend with the drawback of missing leading zeros. Bash 4 shell users can now write
echo {05..15}
and, if needed, receive two-digit values with leading zeros as the return value.
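The following sketch compares the two forms and shows one typical use. The file name pattern backup-NN.tar is made up for the example.

```shell
#!/bin/bash
# Sketch: brace expansion with and without zero padding (Bash 4 or later).
plain=( {8..11} )      # 8 9 10 11
padded=( {08..11} )    # 08 09 10 11

echo "${plain[*]}"
echo "${padded[*]}"

# Padded numbers keep generated names in order when sorted as text.
names=( backup-{08..11}.tar )
printf '%s\n' "${names[@]}"
```

With the padded variant, backup-08.tar sorts before backup-10.tar in a plain directory listing.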
The new version of the standard Linux shell offers some useful new features for programmers and command-line fans. Despite minor incompatibilities, the Bash maintainer encourages users to upgrade (see the "Interview with Bash 4 Developer Chet Ramey" box).
The new version of Bash is not exactly lean and mean - the binary now weighs in at 730KB compared with 590KB in the previous version. It is measurably, but not noticeably, slower; the slower execution times will hardly be a factor on today's hardware.
If you want to use the new functions today, you need full control of your target environment to avoid having to replace the whole Bash environment. In some scripts, you might want to query the BASH_VERSION or BASH_VERSINFO shell variables just to be on the safe side.
Let the script terminate gracefully and output an error message if the conditions for Bash 4 are not fulfilled.
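Such a guard could look like the following sketch. The function name require_bash and the wording of the message are assumptions; BASH_VERSINFO[0] holds the major version number.

```shell
#!/bin/bash
# Sketch: refuse to run on a Bash that is older than the script needs.

# Hypothetical helper: succeeds if the major version is at least $1.
require_bash() {
    local needed=$1
    if [ -z "${BASH_VERSINFO[0]}" ] || [ "${BASH_VERSINFO[0]}" -lt "$needed" ]; then
        echo "This script needs Bash $needed or later (found ${BASH_VERSION:-unknown})." >&2
        return 1
    fi
}

require_bash 4 || exit 1

# From here on, Bash 4 features are safe to use.
declare -A dirs
```

The check has to come before the first Bash 4 construct; otherwise an older shell stops with a syntax or option error before it ever reaches the friendly message.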
Interview with Bash 4 Developer Chet Ramey
Chet Ramey is manager of the Network Engineering and Security Group in the IT services division of Case Western Reserve University. He has served as the Bash maintainer since 1990. Linux Magazine's Nils Magnus asked him for some thoughts on the latest Bash release.
Q: The free software community has not seen a new release of Bash in some time. Some say the 3.2 release was suitable for virtually all necessary tasks. So why something new?
A: I'm glad people thought so highly of bash-3.2. I don't like to release new versions too often, since the shell is so basic to many vendor distributions, but it was time. Bash-4.0 offers a number of new features (I tend not to put major new features into minor "point" releases), lots of bug fixes that did not make it as patches to 3.2, and additional functionality for existing features.
Q: Please name the three improvements or new features you are most excited about.
A: Let's see. That's a tough one, since there are a number of good ones.
1. Associative arrays.
2. The fix for the last long-standing piece of Posix non-compliance. The shell no longer requires that parentheses balance when parsing $(...)-style command substitutions (e.g., when parsing a case statement inside a command substitution). I'm excited about it because it was easily the most complicated thing to implement. It was not easy to do with a yacc-generated parser.
3. The improvements possible with bind -x. When you execute a command bound to a key sequence with bind -x, that command has access to the readline line buffer and current cursor position and can change them. A shell function could call an external program to rearrange the words on the command line, for instance, and have that reflected back into the editing buffer. I don't think this has gotten wide use yet, but there are a lot of possibilities.
Q: Bash 4 added a number of new features that ease programming. Do you think it can compete with scripting languages like Perl or Python?
A: I think that Perl and Python are richer languages, in that they have much more built-in functionality. Shells in general are designed to tie together external programs or shell functions and provide an environment in which this is easy. However, you can write very complex programs using the shell language: Look at the bash debugger, for example.
Q: Bash 4 added a number of new features for command-line users. Do you think it can compete with shells specializing in this use case, like Zsh?
A: I think so. Bash may not provide as much functionality built-in, but I think it provides enough tools to make it as rich an interactive environment as a shell like Zsh.
Q: Some discussion has taken place about programming style and the use of features in shell scripts. Some traditionalists insist on plain Bourne shell compatibility, others make use of many features of Bash. What is your point of view? What about backward compatibility to Bash 3.2? Do you recommend moving on immediately or running 3.2 and 4.0 in parallel?
A: I suppose it depends on your goals. It's definitely the case that when people advocate for "plain Bourne shell compatibility" they mean the version of sh running on the machine they use most frequently. There are different versions of the Bourne shell: v7, SVR2, SVR3, SVR4, SVR4.2, for starters, and different vendors have amalgams of features from different Bourne shell versions. It's hard to decide exactly what people mean when they say "vanilla Bourne shell." For folks interested in writing portable scripts, I would say code to the Posix standard. That can be considered a lowest common denominator that most of the popular shells implement. Many, if not all, vendors ship a shell that conforms to Posix. For those that don't, shells like Bash run on just about every platform out there. As for backwards compatibility with bash-3.2, I've tried to keep as much backwards compatibility as possible. There are places where I felt that the bash-3.2 behavior was wrong and corrected it, sacrificing some backwards compatibility in the process. There is also the notion of the shell's "compatibility level," which explicitly preserves some old behavior when set (look at the compat31 and compat32 shell options). I think the level of compatibility with bash-3.2 is quite high and should not affect portability of scripts. I think the compatibility is sufficient that users can upgrade to bash-4.0 right away and gradually get accustomed to the new features. Thanks for the opportunity to contribute to the magazine.
INFO
[1] Zsh: http://www.zsh.org
[2] Bash: http://tiswww.case.edu/php/chet/bash/bashtop.html
[3] Bash 4 on openSUSE Build Service: http://software.opensuse.org/search/search?baseproject=ALL&q=bash
[4] Bash 4 in Debian's Experimental repository: http://packages.debian.org/experimental/bash-builtins
[5] "Parallel Bash" by Bernhard Bablok, Linux Pro Magazine, March 2009, pg. 56
THE AUTHOR
Bernhard Bablok manages a data warehouse for Allianz Shared Infrastructure Services with technical performance metrics from mainframes to servers. When he is not listening to music, cycling, or walking, Bernhard enjoys working with Linux and object-oriented software.