POSIX-compatible shells

We attempt to concisely describe the design and use of the shell command language, including relevant standards, practical extensions, and common pitfalls. We also briefly explore the applicability of shell to control utilities outside of the standards’ scope. This document is in early stages of development.

Introduction

The emergence by the late s of a number of competing Unix-like systems made it necessary to establish some common ground of principles, terminology and interfaces that an application writer can rely upon. Such a basis was described in the IEEE Std 1003.1 ‘POSIX.1’ standard (also known as ISO/IEC 9945), originally published in , and subsequently revised several times, leading to the edition that is the latest as of this writing.

A separate volume of the standard is dedicated to a facility common to Unix-like systems since the earliest versions: a command line interpreter, or shell. The variant described in the standard is a derivative of the Bourne shell (first released in ), itself developed to overcome limitations inherent in the original Thompson shell (), and incompatible with the somewhat earlier and also popular C shell (.)

As described, shell can be understood as a general-purpose language, although with an emphasis on the processing of plain-text content (which is to say, content encoded in an ASCII-compatible manner.) Alternatively, shell can be viewed as a meta-programming language: a language meant to tie programs written in a diversity of languages into a desired solution. We will hold both views equally valid.

In this document we will mainly focus on the POSIX version of the shell language and associated utilities, as described in the XCU volume, but will also consider specific implementations, such as those provided by the BusyBox, GNU and NetBSD projects.

We will also dedicate a fraction of this document to the discussion of the use of shell alongside software outside of the scope of the standard. However, given the sheer variety of real-life problems that can be, and are, solved in practice with shell, we understand that such examples will most likely be both insufficient and superfluous to the reader at the same time.

Basic command lines

One of the most basic commands available in the shell language is a colon, :. Consider the following example.

Example: colon (:) shell language command.
$ : 
$ 

As can be seen (note that the $ command prompt is emitted by the shell to indicate that it is ready to take the user’s command), the : command takes no input and produces no output. We will also note for later that it returns a successful exit status to the caller, which makes it suitable as an always-true condition in flow control constructs, as well as a do-nothing substitute where a command is required by the shell syntax but not needed for the task at hand.
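
Both uses can be sketched as follows. This is only a minimal illustration: it relies on constructs (while, if, break, variables and the [ utility) that have not been introduced yet, and the variable name i is an arbitrary choice made for the example.

```shell
# An always-true loop condition: the loop runs until break is reached.
i=0
while :; do
    i=$((i + 1))
    printf %s\\n "$i"
    if [ "$i" -ge 3 ]; then
        break
    fi
done

# A do-nothing placeholder: the syntax requires a command after then,
# but only the else branch has any work to do.
if [ -e / ]; then
    :
else
    printf %s\\n "the root directory is missing"
fi
```

The loop prints the numbers 1 to 3, one per line; the second construct prints nothing as long as the root directory exists.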

Another command to consider is printf (print formatted.)

Example: printf utility and shell language quoting constructs.
$ printf %s\\n hello "Hello, world!" \" 
hello
Hello, world!
"
$ 

The example demonstrates an important feature of the language known as quoting, as well as a trait of shell commands to provide domain specific languages (also known as little languages) of their own.

Namely, in the example above, printf is the name of the utility; the command word portion of the command line. The utility is implemented in such a way as to interpret its first argument, in this case %s\n (note the single backslash character), as an instruction on how to format the subsequent arguments. Specifically, %s means that the corresponding argument will be output as-is, without any formatting whatsoever; while the immediately following \n tells printf to output an ASCII line feed (LF, decimal value 10, also often referred to as newline) control code. The code will in turn instruct the software (such as a terminal emulator or a text editor) rendering this data to the user to end the respective text line where it appears, meaning that every argument to printf after the first will end up on its own line, as seen in the example.

The remaining arguments specify the data to be formatted; such as the word hello, printed, as specified, on the first line of the output alone.
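
The format is not limited to a single %s. When more data arguments are given than the format consumes, printf reuses the format as often as necessary. A short sketch, with arbitrary placeholder words as the data:

```shell
# The format consumes two arguments at a time, so four data arguments
# produce two output lines: name=shell and version=1.
printf %s=%s\\n name shell version 1
```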

Note that so far a single blank (space) character separated the command word from the arguments and the arguments from one another. When it becomes necessary to make a blank a part of an argument (or of the command word, though this is rarely, if ever, done in practice), we can suppress its interpretation as a separator character by using double quotes, which fulfill this role but are not passed to the command itself. As such, the third argument is specified in the code as "Hello, world!", but the argument printf itself sees will be Hello, world!, with the quotes removed (and the blank character preserved.) As specified by the format instructions, this argument ends up on the second line of the printf command output.

Double quotes suppress the special treatment of enclosed blank characters by the shell. Sometimes, however, we may want to suppress the special meaning of the double quote character itself, which can be done by preceding it with a backslash (\), as used for the fourth argument in the example above. In fact, a backslash forces the literal interpretation of any single character immediately after it, including another backslash (but excluding newline); this is how a literal backslash was included in the first argument to printf.
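
One way to observe how quoting affects the splitting of a command line into arguments is to have the shell report how many arguments a command received. The sketch below assumes a helper function, count_args; the name is made up for this example, and shell functions and the $# parameter are outside the scope of this section.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
    printf %s\\n "$#"
}

count_args Hello, world!        # 2: the blank separates two arguments
count_args "Hello, world!"      # 1: double quotes keep the blank
count_args Hello,\ world!       # 1: the backslash quotes the blank
count_args \"Hello, world\"     # 2: the quotes are literal characters here
```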

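Two other quoting constructs to consider are single quotes and $'text'; the latter is not specified by POSIX.1-2017, but is available in GNU Bash (used in the example below) and some other shells.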

Example: shell language single-quote constructs.
bash$ printf %s\\n 'Hello\n=====' $'Hello\n=====' 
Hello\n=====
Hello
=====
bash$ 

As the example shows, single quotes suppress the special interpretation by the shell of all characters besides the single quote itself. The $'text' construct is similar, but within it a backslash acquires a special meaning, similar to the one it has for printf; in particular, the \n within is replaced by the shell with the newline control code.
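
Where $'text' is not available, a similar result can be had portably, because a newline is among the characters that single quotes preserve literally. A minimal sketch:

```shell
# The quoted argument spans two lines; the newline inside the single
# quotes is passed to printf as part of the argument.
printf %s\\n 'Hello
====='
```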

One possible way to include a single ' character within an otherwise ''-quoted text is by replacing it with '\'': the first single quote here will close the ''-construct, then \' will be interpreted as a literal ', and the final ' will reopen the quoting construct. We will need this at a later point, where we’ll consider embedding code and calls to other interpreters within shell programs.
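
The replacement can be tried directly; the phrase used here is arbitrary.

```shell
# 'It' is the first quoted part, \' is a literal single quote, and
# 's a shell' is the second quoted part; printf receives them joined
# into the single argument It's a shell.
printf %s\\n 'It'\''s a shell'
```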

Pipelines

Another important shell feature to consider is the ability to specify pipelines, where a sequence of commands is given so that the output of one command provides input to the next. Pipelines are specified using the | operator between the commands.

Example: a pipeline.
$ printf %s\\n One.  Two.  ""  Three. | nl 
     1  One.
     2  Two.
       
     3  Three.
$ 

Note that the actual output of the example pipeline contains ASCII horizontal tabulation (HT, decimal value 9, also often referred to as tab) control codes. To ensure proper rendering, these were replaced in this and the following examples by an appropriate number of blanks.

Here, we’ve used the printf command from before to output four lines of text, including one empty one, and passed them to the nl (number lines) command to number the three non-empty ones.

We must, however, note that the exact behavior of nl is somewhat more complicated than just numbering non-empty lines. Namely, nl considers its input to be separated into logical pages, each with its own header, body and footer. By default, portions other than body are not numbered; numbering is started anew for each page; and the specially-formatted lines that separate the three are not printed at all. This can potentially lead to security issues if arbitrary user input is processed with nl and the code receiving the output does not expect the numbering to reset at times, or parts of the input to disappear.

This problem is not specific to nl, either; failure to consider finer points of operation of such utilities as echo, ls and xargs seems like a common cause of shell code behaving unexpectedly, especially when facing arbitrary data.
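
With echo, for instance, POSIX leaves the result implementation-defined when the first operand is -n or when any operand contains a backslash, so the same line of code may print different things under different shells. A more predictable way to print arbitrary data, sketched below with made-up sample arguments, is to pass the data to printf as arguments for %s rather than as the format.

```shell
# Both arguments are printed verbatim, one per line: -n is treated as
# data rather than an option, and %s leaves the backslash alone.
printf %s\\n -n 'a\tb'
```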

Example: the nl command treating its input as a sequence of two logical pages, one with a (non-numbered) header.
$ printf %s\\n \\: One. \\:\\: Two.  "" \\:\\:  Three. | nl 
       One.
     1  Two.
       
     1  Three.
$ 

A cleaner solution would be to use the -n option to the cat command, which is a non-standard, yet common, extension. Otherwise, a portable solution can be crafted using a simple awk program (that can be embedded into the shell code), which we’ll introduce in a later section of this document.

Example: the cat command instructed to number lines.
$ printf %s\\n \\: One. \\:\\: Two.  "" | cat -n 
     1  \:
     2  One.
     3  \:\:
     4  Two.
     5  
$ 
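
For comparison, a portable counterpart based on awk might look as follows. This is only a sketch: the %6d and the tab in the format are chosen here merely to resemble the layout produced by cat -n and nl.

```shell
# NR is the number of the current input line and $0 is its text;
# every line is numbered, including empty ones and the \: lines.
printf %s\\n \\: One. \\:\\: Two. "" |
    awk '{ printf "%6d\t%s\n", NR, $0 }'
```

Since awk gives no special meaning to the \: lines, all five input lines appear in the output, numbered 1 to 5.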

Perhaps a more practical command to consider would, however, be sort, which orders the lines passed through not unlike how words in a dictionary would be sorted.

Example: the sort command ordering its input lines according to the order of characters in the locale currently in effect (T after O, and w after h.)
$ printf %s\\n One.  Two.  Three. | sort 
One.
Three.
Two.
$ 
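
A pipeline is not limited to two commands. As a sketch, the sorted lines can be passed on to nl, used in the earlier examples:

```shell
# printf produces the lines, sort orders them, nl numbers the result.
printf %s\\n One. Two. Three. | sort | nl
```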

Alternatively, the -n option to sort will instruct it to order its input according to the longest initial part of each line that can be interpreted as a (decimal) integer. Moreover, the -r option will reverse the order from the default ascending to descending.
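
The difference from the default ordering shows as soon as the numbers differ in length. In the sketch below (the numbers are arbitrary), the default sort compares the lines character by character, while sort -n compares them by value.

```shell
printf %s\\n 9 10 | sort        # 10 first: the character 1 sorts before 9
printf %s\\n 9 10 | sort -n     # 9 first: the lines are compared as numbers
```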

Example: the sort command instructed to order its input in numerical descending order.
$ printf %s\\n \
      "78306 reassembled" "102376 zucchini" \
      "36383 crumbles"    "67751 noncompliance" \
      | sort -rn 
102376 zucchini
78306 reassembled
67751 noncompliance
36383 crumbles
$ 

There are several new things to consider. First, we’ve supplied sort with two separate options: -n to order according to the numerical values, and -r to reverse the default ascending order. By convention, two such options can be passed either as two arguments (-n -r) or as a single combined argument (-nr or, as in the example, -rn.)
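
The equivalence is easy to check. In this sketch, which reuses two of the lines from the example, all three commands produce the same output, with the zucchini line first:

```shell
# Three equivalent ways of requesting a descending numerical sort.
printf %s\\n "36383 crumbles" "102376 zucchini" | sort -n -r
printf %s\\n "36383 crumbles" "102376 zucchini" | sort -nr
printf %s\\n "36383 crumbles" "102376 zucchini" | sort -rn
```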

Also, we’ve already mentioned that the backslash character does not cause a subsequent newline to be interpreted literally. In fact, the backslash and the newline are removed altogether, which allowed us to split the single long command in the example into four lines for better readability. Note also that arbitrarily long sequences of blank characters separate arguments just as the single blanks we’ve used so far do.

References

awk
awk // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
Bash, GNU
GNU Bash. — URI: http://gnu.org/s/bash/
BusyBox
BusyBox. — URI: http://busybox.net/
cat
cat // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cat.html
Colon
:
Colon // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_16
echo
echo // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html
expr
expr // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/expr.html
find
find // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html
GNU
The GNU Operating System and the Free Software Movement. — URI: http://gnu.org/
grep
grep // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
ls
ls // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html
NetBSD
The NetBSD Project. — URI: http://netbsd.org/
nl
nl // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/nl.html
POSIX
IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/mindex.html
printf
printf // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
sed
sed // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
Shell
Shell Command Language // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
sort
sort // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html
xargs
xargs // IEEE Std 1003.1-2017 (POSIX.1-2017.) — URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html