Bring the power of the Linux command line into your application development process.
As a novice software developer, the one thing I look for when choosing a programming language is this: is there a library that allows me to interface with the system to accomplish a task? If Python didn't have Flask, I might choose a different language to write a web application. For this same reason, I've begun to develop many, admittedly small, applications with Bash. Python, for example, has many modules that can be imported to extend its functionality, but Bash has thousands of commands that provide a variety of features, including string manipulation, mathematical computation, encryption and database operations. In this article, I take a look at these features and how to use them easily within a Bash application.
Bash provides three features that I've found particularly useful when creating reusable functions: aliases, functions and command substitution. An alias is a command-line shortcut for a long command. Here's an example:
alias getloadavg='cat /proc/loadavg'
The alias for this example is getloadavg. Once defined, it can be executed like any other Linux command. In this instance, the alias dumps the contents of the /proc/loadavg file. Keep in mind that this is a static alias: no matter how many times it is executed, it always dumps the contents of the same file. If you need to vary the way a command is executed (by passing arguments, for instance), you can create a function. A function in Bash works much the same way as a function in any other language: it receives arguments, and the commands within it are executed. Here's an example function:
getfilecontent() {
  if [ -f "$1" ]; then
    cat "$1"
  else
    echo "usage: getfilecontent <filename>"
  fi
}
This function declaration defines the function name as getfilecontent. The if/else statement checks whether the file specified as the first function argument ($1) exists and is a regular file. If it does, the contents of the file are printed. If not, usage text is displayed. Because the function takes an argument, its output varies with the argument provided.
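Here is a sketch of a hypothetical countlines function, which is not part of the original article, showing a few more function features than getfilecontent uses. It checks the argument count with $#, writes errors to stderr, signals failure with a non-zero return status and keeps its working variable local to the function:

```shell
#!/bin/bash
# countlines is a hypothetical example function.
countlines() {
  # "$#" is the number of arguments; require exactly one existing file.
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "usage: countlines <filename>" >&2   # errors go to stderr
    return 1                                   # non-zero status signals failure
  fi
  local lines                                  # visible only inside this function
  lines="$(wc -l < "$1")"                      # reading stdin keeps the filename out of wc's output
  echo "$1: $(( lines )) lines"                # $(( )) also drops any padding wc adds
}
```

Because of the return status, callers can react to failure, for example with countlines notes.txt || echo "could not count lines".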
The final feature I want to cover is command substitution, a mechanism for capturing the output of a command so that it can be reused. This feature is versatile, but its most common use is assigning a command's output to a variable:
LOADAVG="$(cat /proc/loadavg)"
The syntax for command substitution is $(command) where “command” is the command to be executed. In this example, the LOADAVG variable will have the contents of the /proc/loadavg file stored in it. At this point, the variable can be evaluated, manipulated or simply echoed to the console.
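A minimal sketch of the two common forms of command substitution: capturing output in a variable and embedding it directly inside another command's argument. It parses a hypothetical sample line in /proc/loadavg format instead of the live file, so the captured value is predictable:

```shell
#!/bin/bash
# Hypothetical sample in /proc/loadavg format; "cat /proc/loadavg" works on Linux.
SAMPLE="0.42 0.35 0.30 1/123 4567"

# 1) Capture a pipeline's output in a variable.
ONE_MIN="$(echo "$SAMPLE" | cut -d ' ' -f 1)"
echo "1-minute load: $ONE_MIN"

# 2) Use a command's output directly as part of another command's argument.
echo "Report generated at $(date '+%H:%M')"
```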
If there is one feature that sets scripting on UNIX apart from other environments, it is the robust ability to process text. Although many text processing mechanisms are available when scripting in Linux, here I'm looking at grep, awk, sed and variable-based operations. The grep command allows for searching through text whether in a file or piped from another command. Here's a grep example:
alias searchdate='grep "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"'
The alias created here searches through data for a date in the YYYY-MM-DD format. As with grep itself, text can be provided either as piped data or as a file path following the command. As the example shows, grep's search syntax uses regular expressions (or regex).
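One caveat applies when using this alias in a script rather than at an interactive prompt. Bash does not expand aliases in non-interactive shells unless the expand_aliases option is set, and the alias must be defined on a line before the one that uses it. A sketch with made-up input lines:

```shell
#!/bin/bash
# Scripts are non-interactive, so alias expansion must be switched on explicitly.
shopt -s expand_aliases
alias searchdate='grep "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"'

# Piped input: only the line containing a YYYY-MM-DD date is printed.
printf '%s\n' "backup finished 2024-03-15" "no date on this line" | searchdate
```

Inside scripts, a function is often the sturdier choice, because functions need no special shell option.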
When processing lines of text for the purpose of pulling out delimited fields, awk is the easiest tool for the job. You can use awk to create verbose output of the /proc/loadavg file:
awk '{ printf("1-minute: %s\n5-minute: %s\n15-minute: %s\n",$1,$2,$3); }' /proc/loadavg
For the purpose of this example, let's examine the structure of the /proc/loadavg file. It is a single-line file with five space-delimited fields, although this example uses only the first three. Much like Bash function arguments, fields in awk are referenced as variables named by their position in the line ($1 is the first field and so on). In this example, the first three fields are passed as arguments to the printf statement, which displays three lines, each containing a description of the data and the data itself. Each %s is replaced by the corresponding argument to printf.
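By default, awk splits fields on whitespace, but the -F option sets a different separator. The following sketch uses a hypothetical uid_report function and made-up records in the colon-delimited /etc/passwd format to print the first and third fields:

```shell
#!/bin/bash
# uid_report is a hypothetical helper. -F: splits fields on colons, so $1 is the
# user name and $3 the UID. With no file argument, awk reads standard input.
uid_report() {
  awk -F: '{ printf("%s has UID %s\n", $1, $3) }' "$@"
}

# Made-up passwd-style records; "uid_report /etc/passwd" reads the real file.
printf '%s\n' "alice:x:1000:1000" "bob:x:1001:1001" | uid_report
```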
Of all the commands available for text processing on Linux, sed may be considered the Swiss army knife. Like grep, sed uses regex. The specific operation I'm looking at here is regex substitution. For a direct comparison, let's re-create the previous awk example using sed:
sed 's/^\([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\).*$/1-minute: \1\n5-minute: \2\n15-minute: \3/g' /proc/loadavg
Since this is a long example, I'm going to break it into smaller parts. As I mentioned, this example uses regex substitution, which follows this syntax: s/search/replace/g. The “s” begins the substitution statement. The “search” value defines the text pattern you want to search for, and the “replace” value defines what you want to replace it with. The “g” at the end is a flag that makes the substitution global within each line, replacing every match on the line rather than only the first; it is one of several flags available with the substitute command. The search pattern in this example is:
^\([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\).*$
The caret (^) at the beginning of the string denotes the beginning of a line of text being processed, and the dollar sign ($) at the end of the string denotes the end of a line of text. Four things are being searched for within this example. The first three items are:
\([0-9]\+\.[0-9]\+\)
This entire string is enclosed in escaped parentheses, which capture the matched text so that it can be used in the replace value. Just like in the grep example, [0-9] matches a single numeric character. When followed by an escaped plus sign, it matches one or more numeric characters. The escaped period matches a literal period. Put together, the expression matches a decimal number such as 0.42.
The fourth item in the search value is simply a period followed by an asterisk. The period will match any character, and the asterisk will match zero or more of whatever preceded it. The replace value of the example is:
1-minute: \1\n5-minute: \2\n15-minute: \3
This is largely composed of plain text; however, it contains four special items. The first is the newline character, represented by a backslash followed by “n” (\n). The other three are backslashes followed by a number, which refer back to the patterns in the search value that are surrounded by escaped parentheses: \1 is the first parenthesized pattern, \2 is the second and so on. The output of this sed command is exactly the same as the output of the awk command from earlier. Note that \+ in the search pattern and \n in the replacement are GNU sed extensions that other sed implementations may not support.
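For comparison, the same substitution can be written with extended regular expressions (sed -E), which removes most of the backslashes. This sketch assumes GNU sed for the \n in the replacement. The loadavg_sed name and the sample line are hypothetical, and the g flag is dropped because each line contains only one match:

```shell
#!/bin/bash
# In ERE mode, ( ) and + are special without backslashes; \. is still a literal period.
loadavg_sed() {
  sed -E 's/^([0-9]+\.[0-9]+) ([0-9]+\.[0-9]+) ([0-9]+\.[0-9]+).*$/1-minute: \1\n5-minute: \2\n15-minute: \3/' "$@"
}

# Hypothetical sample line; "loadavg_sed /proc/loadavg" reads the real file.
echo "0.42 0.35 0.30 1/123 4567" | loadavg_sed
```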
The final mechanism for string manipulation that I want to discuss involves using Bash variables to manipulate strings. Although this is much less powerful than traditional regex, it provides a number of ways to manipulate text. Here are a few examples using Bash variables:
MYTEXT="my example string"
echo "String Length: ${#MYTEXT}"
echo "First 5 Characters: ${MYTEXT:0:5}"
echo "Remove \"example\": ${MYTEXT/ example/}"
The variable named MYTEXT is the sample string this example works with. The first echo command shows how to determine the length of a string variable. The second echo command will return the first five characters of the string. This substring syntax involves the beginning character index (in this case, zero) and the length of the substring (in this case, five). The third echo command removes the word “example” along with a leading space.
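Bash offers several more parameter expansions in the same spirit. This sketch uses a made-up file name to show prefix and suffix removal (the form the bf-dec function relies on), along with global replacement and case conversion:

```shell
#!/bin/bash
FILENAME="backup.2024.tar.gz"   # hypothetical file name

echo "${FILENAME%.*}"    # remove shortest ".*" match from the end: backup.2024.tar
echo "${FILENAME%%.*}"   # remove longest ".*" match from the end: backup
echo "${FILENAME#*.}"    # remove shortest "*." match from the start: 2024.tar.gz
echo "${FILENAME##*.}"   # remove longest "*." match from the start: gz

MYTEXT="my example string"
echo "${MYTEXT//e/E}"    # "//" replaces every match, not just the first: my ExamplE string
echo "${MYTEXT^^}"       # uppercase (Bash 4.0 or later): MY EXAMPLE STRING
```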
Although text processing might be what makes Bash scripting great, the need to do mathematics still exists. Basic math problems can be evaluated using bc, awk or Bash arithmetic expansion. The bc command can evaluate math problems both through an interactive console interface and from piped input. For the purpose of this article, let's look at evaluating piped data. Consider the following:
pow() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: pow <base> <exponent>"
  else
    echo "$1^$2" | bc
  fi
}
This example implements the pow() function from the C math library. The function requires two arguments, and its result is the first number raised to the power of the second. The math statement "$1^$2" is piped into the bc command for calculation. Note that bc's ^ operator expects an integer exponent.
Although awk does provide the ability to do basic math calculation, the ability for awk to iterate through lines of text makes it especially useful for creating summary data. For instance, if you want to calculate the total size of all files within a folder, you might use something like this:
foldersize() {
  if [ -d "$1" ]; then
    ls -alRF "$1"/ | grep '^-' | awk 'BEGIN {tot=0} { tot=tot+$5 } END { print tot }'
  else
    echo "$1: folder does not exist"
  fi
}
This function does a recursive long listing of all entries underneath the folder supplied as an argument. It then selects all lines beginning with a dash, which correspond to regular files. The final step is to use awk to iterate through the output and calculate the combined size of those files.
Here is how the awk statement breaks down. Before processing of the piped data begins, the BEGIN block sets a variable named tot to zero. Then for each line, the next block is executed. This block will add to tot the value of the fifth field in each line, which is the file size. Finally, after the piped data has been processed, the END block then will print the value of tot.
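Because awk keeps state across lines, the same pattern extends easily to other summary statistics. The sketch below is a hypothetical sizestats function that counts files and averages their sizes, reading made-up lines in ls -l format so the result is predictable. awk variables start out as zero, so the BEGIN block from foldersize is optional here:

```shell
#!/bin/bash
# sizestats is a hypothetical helper; field 5 of "ls -l" output is the size in bytes.
sizestats() {
  awk '{ tot += $5; n++ }
       END { if (n > 0) printf("%d files, %d bytes, average %.1f\n", n, tot, tot / n); else print "no files" }'
}

# Made-up lines in "ls -l" format.
printf '%s\n' \
  "-rw-r--r-- 1 user group 100 Jan  1 00:00 a.txt" \
  "-rw-r--r-- 1 user group 200 Jan  1 00:00 b.txt" \
  "-rw-r--r-- 1 user group 300 Jan  1 00:00 c.txt" | sizestats
# 3 files, 600 bytes, average 200.0
```

In foldersize, sizestats could stand in for the final awk stage of the pipeline.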
The other way to perform basic math is through arithmetic expansion, whose syntax looks similar to command substitution. Let's rewrite the previous pow example using arithmetic expansion:

pow() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: pow <base> <exponent>"
  else
    echo "$(( $1 ** $2 ))"
  fi
}

The syntax for arithmetic expansion is $((expression)), where expression is a mathematical expression. (Older scripts sometimes use $[expression], a deprecated form that Bash still accepts.) Notice that instead of the caret operator, this example uses a double asterisk for exponents; inside Bash arithmetic, ^ means bitwise XOR. The main limitation of this method is that Bash arithmetic handles only integers, so fractional results require bc or awk. For integer math, though, the syntax can be more intuitive than piping data to the bc command.
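Integer-only arithmetic is still handy for many everyday jobs. Here is a sketch of a hypothetical fmt_duration function that splits a number of seconds into hours, minutes and seconds using integer division (/) and remainder (%):

```shell
#!/bin/bash
# fmt_duration is a hypothetical helper; "/" truncates and "%" gives the remainder.
fmt_duration() {
  local secs=$1
  echo "$(( secs / 3600 ))h $(( secs % 3600 / 60 ))m $(( secs % 60 ))s"
}

fmt_duration 3725   # 1h 2m 5s
```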
The ability to perform cryptographic operations on data may be necessary depending on the needs of an application. If a string needs to be hashed, a file needs to be encrypted, or data needs to be base64-encoded, this all can be accomplished using the openssl command. Although openssl provides a large set of ciphers, hashing algorithms and other functions, I cover only a few here.
The first example shows encrypting a file using the blowfish cipher:
bf-enc() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    cat "$1" | openssl enc -blowfish -pass pass:"$2" > "$1.enc"
  else
    echo "usage: bf-enc <file> <password>"
  fi
}
This function requires two arguments: a file to encrypt and the password to encrypt it with. After running, it produces a file with the same name as the original plus an .enc extension (for example, notes.txt becomes notes.txt.enc).
Once you have the data encrypted, you need a function to decrypt it. Here's the decryption function:
bf-dec() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    cat "$1" | openssl enc -d -blowfish -pass pass:"$2" > "${1%%.enc}"
  else
    echo "usage: bf-dec <file> <password>"
  fi
}
The decryption function is almost identical to the encryption function, with two differences: the -d option tells openssl to decrypt the piped data, and the ${1%%.enc} parameter expansion removes “.enc” from the end of the filename to name the decrypted output.
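Blowfish is an older cipher, and OpenSSL 3.0 moved it into the legacy provider, so openssl enc -blowfish may fail on newer systems unless that provider is loaded. Here is a sketch of AES-256 counterparts. The aes-enc and aes-dec names are hypothetical, -pbkdf2 requires OpenSSL 1.1.1 or later, and -pass env:AES_PASS reads the password from an environment variable so that it does not appear on openssl's command line:

```shell
#!/bin/bash
# Hypothetical AES-256 versions of bf-enc/bf-dec (assumes OpenSSL 1.1.1+).
aes-enc() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    # AES_PASS is set only in openssl's environment; -pbkdf2 derives the key from it.
    AES_PASS="$2" openssl enc -aes-256-cbc -pbkdf2 -salt \
      -pass env:AES_PASS -in "$1" -out "$1.enc"
  else
    echo "usage: aes-enc <file> <password>"
  fi
}

aes-dec() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    # Decryption must use the same cipher and key-derivation options.
    AES_PASS="$2" openssl enc -d -aes-256-cbc -pbkdf2 \
      -pass env:AES_PASS -in "$1" -out "${1%.enc}"
  else
    echo "usage: aes-dec <file> <password>"
  fi
}
```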
Another piece of functionality provided by openssl is the ability to create hashes. Although files may be hashed using openssl, I'm going to focus on hashing strings here. Let's make a function to create an MD5 hash of a string:
md5hash() {
  if [ -z "$1" ]; then
    echo "usage: md5hash <string>"
  else
    # printf, unlike echo, does not append a newline that would change the hash
    printf '%s' "$1" | openssl dgst -md5 | sed 's/^.*= //g'
  fi
}
This function takes the string argument and generates an MD5 hash of it. The sed statement at the end of the pipeline strips the label that openssl prints before the digest (such as “MD5(stdin)= ”), so the only text returned by the function is the hash itself.
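A hash can't be decrypted. To validate a string against a stored hash, you create a new hash of the string and compare it with the old one. If the hashes match, the original strings almost certainly match. MD5 is no longer considered collision-resistant, however, so prefer a stronger algorithm such as SHA-256 where security matters.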
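Here is a sketch of that comparison using SHA-256 instead of MD5. The sha256hash and verifyhash names are hypothetical. verifyhash simply returns an exit status, so it can be used directly in if statements or with && and ||:

```shell
#!/bin/bash
# Hypothetical SHA-256 counterpart to md5hash; printf '%s' hashes the exact string.
sha256hash() {
  printf '%s' "$1" | openssl dgst -sha256 | sed 's/^.*= //'
}

# Hypothetical helper: exit status 0 if the string hashes to the stored value.
verifyhash() {
  [ "$(sha256hash "$1")" = "$2" ]
}

stored="$(sha256hash "abc")"
verifyhash "abc" "$stored" && echo "match"
verifyhash "abd" "$stored" || echo "no match"
```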
I also want to discuss the ability to create a base64-encoded string of data. One particular application that I have found this useful for is creating an HTTP basic authentication header string (this contains username:password). Here is a function that accomplishes this:
basicauth() {
  if [ -z "$1" ]; then
    echo "usage: basicauth <username>"
  else
    # printf (not echo) keeps a trailing newline out of the encoded string
    printf '%s:%s' "$1" "$(read -r -s -p "Enter password: " pass; echo "$pass")" \
      | openssl enc -base64
  fi
}
This function takes the user name provided as the first function argument and the password entered by the user (captured through command substitution) and uses openssl to base64-encode the resulting username:password string. The result can then be used as the value of an HTTP Authorization header, in the form Authorization: Basic <encoded string>.
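For scripts and testing, a non-interactive variant can be handy. This hypothetical basicauth_header function takes the password as its second argument and prints a complete header line. Because the password is passed on the command line, it can end up in shell history, so use it with care. The example input is the sample credential from RFC 7617:

```shell
#!/bin/bash
# Hypothetical non-interactive variant: basicauth_header <username> <password>
basicauth_header() {
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | openssl enc -base64)"
}

basicauth_header Aladdin "open sesame"
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The output line can be passed as-is to curl's -H option.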
An application is only as useful as the data behind it. Although there are command-line tools for interacting with database server software, here I focus on the file-based SQLite database. One difficulty when moving an application from one computer to another is that, depending on the version of SQLite, the executable may be named differently (typically either sqlite or sqlite3). Using command substitution, you can create a more portable way of calling SQLite:
$(ls /usr/bin/sqlite* | grep 'sqlite[0-9]*$' | head -n1)
This returns the full path of the first sqlite executable found in /usr/bin.
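Scanning /usr/bin misses binaries installed elsewhere, such as /usr/local/bin. Here is a sketch of an alternative that uses the shell builtin command -v, which searches every directory in $PATH. The find_sqlite name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the path of the first SQLite CLI on $PATH,
# preferring sqlite3; the exit status is non-zero if neither is found.
find_sqlite() {
  command -v sqlite3 || command -v sqlite
}

SQLITE="$(find_sqlite)" || echo "no sqlite binary found in PATH" >&2
```

"$SQLITE" can then be used wherever the ls-based command substitution appears.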
Consider an application that, upon first execution, creates an empty database. If this syntax is used to invoke the sqlite binary, the empty database always will be created using the correct version of sqlite on that system.
Here's an example of how to create a new database with a table for personal information:
$(ls /usr/bin/sqlite* | grep 'sqlite[0-9]*$' | head -n1) test.db "CREATE TABLE people(fname text, lname text, age int)"
This will create a database file named test.db and will create the people table as described. This same syntax could be used to perform any SQL operations that SQLite provides, including SELECT, INSERT, DELETE, DROP and many more.
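Here is a sketch of a short session against a throwaway database, assuming the sqlite3 command-line tool is installed. To use a different binary, set SQLITE first, for example to the path returned by the ls-based lookup. The values are hard-coded; building SQL strings from untrusted input this way invites SQL injection:

```shell
#!/bin/bash
# Assumes sqlite3 is on $PATH unless SQLITE already names another binary.
SQLITE="${SQLITE:-sqlite3}"
DB="$(mktemp -d)/test.db"   # throwaway database file

"$SQLITE" "$DB" "CREATE TABLE people(fname text, lname text, age int)"
"$SQLITE" "$DB" "INSERT INTO people VALUES ('Ada', 'Lovelace', 36)"
# In the default list output mode, columns are separated by "|".
"$SQLITE" "$DB" "SELECT fname, age FROM people WHERE age > 30"
```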
This article barely scratches the surface of the commands available for developing console applications on Linux. There are a number of great resources for learning more in-depth scripting techniques, whether in Bash, awk, sed or any other console-based toolset. See the Resources section for links to more helpful information.