POSIX-compatible shells
We attempt to concisely describe the design and use of the shell command language, including relevant standards, practical extensions, and common pitfalls. We also briefly explore the applicability of shell to control utilities outside of the standards’ scope. This document is in early stages of development.
Introduction
The emergence by the late 1980s of a number of competing Unix-like systems made it necessary to establish some common ground of principles, terminology and interfaces that an application writer can rely upon. Such a basis was described in the IEEE Std 1003.1 ‘POSIX.1’ standard (also known as ISO/IEC 9945), originally published in 1988 and revised several times since, with the 2017 edition being the latest as of this writing.
A separate volume of the standard is dedicated to a facility common to Unix-like systems since the earliest versions: a command line interpreter, or shell. The variant described in the standard derives from the Bourne shell (first released in 1979), itself developed to overcome limitations inherent in the original Thompson shell (1971), and is incompatible with the somewhat earlier and also popular C shell (1978).
As described, shell can be understood as a general-purpose language, although with emphasis on the processing of plain text (which is to say, encoded in an ASCII-compatible manner) content. Alternatively, shell can be viewed as a meta-programming language: a language meant to tie programs written in a diversity of languages into a desired solution. We will hold both views equally valid.
In this document we will mainly focus on the POSIX version of the shell language and associated utilities, as described in the XCU volume, but will also consider specific implementations, such as those provided by the BusyBox, GNU and NetBSD projects.
We will also dedicate a fraction of this document to the discussion of the use of shell alongside software outside of the scope of the standard. However, given the sheer variety of real-life problems that can be, and are, solved in practice with shell, we understand that such examples will most likely be both insufficient and superfluous to the reader at the same time.
Basic command lines
One of the most basic commands available in the shell language is the colon, `:`. Typed at the ‘`$ `’ command prompt (which the shell emits to indicate that it is ready to take the user’s command), the `:` command takes no input and produces no output. We will also note for later that it returns a successful exit status to the caller, which makes it suitable as an always-true condition in flow control constructs, as well as a do-nothing substitute where a command is required by the shell syntax but not needed for the task at hand.
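The properties just described can be checked with a short script. This is a minimal sketch rather than a transcript of an interactive session, so no prompt appears; the variable names `out`, `status` and `count` are chosen here purely for illustration.

```shell
out=$(:)        # capture anything ':' writes to standard output
status=$?       # exit status of ':' (as reported by the command substitution)
printf 'output: [%s], exit status: %s\n' "$out" "$status"
# prints: output: [], exit status: 0

count=0
while :; do     # ':' always succeeds, so the loop runs until 'break'
    count=$((count + 1))
    if [ "$count" -lt 3 ]; then
        :       # nothing to do, but 'then' must be followed by a command
    else
        break
    fi
done
printf 'iterations: %s\n' "$count"
# prints: iterations: 3
```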
Another command to consider is `printf` (print formatted).
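A command line consistent with the discussion that follows could look like this. It is a reconstruction under stated assumptions: the first three arguments are taken from the description, while the fourth (`\"quoted\"`) is a placeholder chosen here to show a backslash-escaped double quote.

```shell
# The shell consumes one backslash of %s\\n, so printf receives %s\n.
printf %s\\n hello "Hello, world!" \"quoted\"
# prints:
#   hello
#   Hello, world!
#   "quoted"
```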
The example demonstrates an important feature of the language known as quoting, as well as the tendency of shell commands to provide domain-specific languages (also known as little languages) of their own.
Namely, in the example above, `printf` is the name of the utility: the command word portion of the command line. The utility is implemented so as to interpret its first argument, in this case `%s\n` (note the single backslash character), as an instruction on how to format the subsequent arguments. Specifically, `%s` means that the corresponding argument will be output as-is, without any formatting whatsoever, while the immediately following `\n` tells `printf` to output an ASCII line feed (LF, decimal value 10, also often referred to as newline) control code. The code will in turn instruct the software rendering this data to the user (such as a terminal emulator or a text editor) to end the respective text line where it appears, meaning that every argument to `printf` after the first will end up on its own line, as seen in the example.
The remaining arguments specify the data to be formatted, such as the word `hello`, printed, as specified, alone on the first line of the output.
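As a further sketch of the format argument acting as a small language of its own, the format below contains two `%s` conversions, so `printf` consumes the data arguments two at a time and reuses the format until none are left. The names and values are placeholders.

```shell
# One format with two conversions, applied once per pair of arguments.
printf %s=%s\\n shell sh editor vi
# prints:
#   shell=sh
#   editor=vi
```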
Note that so far a single blank (space) character has separated the command word from the arguments and the arguments from one another. When it becomes necessary to make a blank part of an argument (or of the command word, although that is very rarely, if ever, done in practice), we can suppress its interpretation as a separator character by using double quotes, which fulfill this role but are not passed to the command itself. As such, the third argument is specified in the code as `"Hello, world!"`, but the argument `printf` itself sees will be `Hello, world!`, with the quotes removed (and the blank character preserved). As specified by the format instructions, this argument ends up on the second line of the `printf` command output.
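One way to observe how the shell splits a command line into arguments is to count them. The sketch below uses a hypothetical helper function, `count_args`, defined here only for this purpose; it prints the number of arguments it was called with.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    printf %s\\n "$#"
}

count_args Hello, world!      # prints 2: the blank separates two arguments
count_args "Hello, world!"    # prints 1: the blank is part of the argument
```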
Double quotes suppress the special treatment of enclosed blank characters by the shell. Sometimes, however, we may want to suppress the special meaning of the double quote character itself, which can be done by preceding it with a backslash (`\`), as used for the fourth argument in the example above. In fact, a backslash forces the literal interpretation of any single character immediately after it, including itself (but excluding newline), which is how one was included in the first argument to `printf`.
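The following sketch applies the backslash to three different characters: a double quote, a blank and the backslash itself. The data arguments are placeholders.

```shell
printf %s\\n \"quoted\"        # prints: "quoted"
printf %s\\n Hello,\ world!    # prints: Hello, world!  (a single argument)
printf %s\\n back\\slash       # prints: back\slash
```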
Two other quoting constructs to consider are single quotes and `$'text'`, the latter an extension that is not part of POSIX.1-2017 but is provided by GNU Bash and some other shells.
Single quotes suppress the special interpretation by the shell of all characters other than the single quote itself. The `$'text'` construct is similar, but within it a backslash acquires a special meaning, similar to the one it has for `printf`; in particular, `\n` within it is replaced by the shell with the newline control code.
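A minimal sketch of the single-quote behavior follows. The `$'text'` form is deliberately left out so that the block sticks to constructs any POSIX shell accepts; the sample strings are placeholders.

```shell
# Inside single quotes, blanks, double quotes and backslashes are all literal.
printf %s\\n 'a  "b"  \c\\'
# prints: a  "b"  \c\\

# Two spellings that hand the same format, %s\n, to printf.
printf %s\\n  hello
printf '%s\n' hello
# each prints: hello
```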
One possible way to include a single `'` character within otherwise single-quoted text is to replace it with `'\''`: the first single quote closes the quoted text, `\'` is then interpreted as a literal `'`, and the final `'` reopens the quoting construct. We will need this at a later point, where we’ll consider embedding code and calls to other interpreters within shell programs.
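The replacement can be sketched as follows. The second command is only an illustration of the embedding use mentioned above, with a second `sh` standing in for the other interpreter; the strings are placeholders.

```shell
# '\'' closes the quoted text, adds a literal ', and reopens the quotes.
printf %s\\n 'It'\''s here'
# prints: It's here

# The same technique lets a single-quoted script passed to another
# interpreter (here a second sh) contain single quotes of its own.
sh -c 'printf %s\\n '\''inner text'\'
# prints: inner text
```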
Pipelines
Another important shell feature to consider is the ability to specify pipelines, where a sequence of commands is given so that the output of one command provides input to the next. Pipelines are specified using the `|` operator between the commands.
Note that the actual output of the example pipeline contains ASCII horizontal tabulation (HT, decimal value 9, also often referred to as tab) control codes. To ensure proper rendering, these were replaced in this and the following examples by an appropriate number of blanks.
Here, we’ve used the `printf` command from before to output four lines of text, one of them empty, and passed them to the `nl` (number lines) command to number the three non-empty ones.
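The pipeline described above can be sketched as follows. The three words are placeholders, and the empty line is produced by an empty argument, spelled `""`.

```shell
# printf writes four lines (the second one empty); nl reads them from the pipe.
printf %s\\n alpha "" beta gamma | nl
# nl numbers alpha, beta and gamma as 1, 2 and 3; the empty line gets no number.
```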
We must, however, note that the exact function of `nl` is somewhat more complicated than just numbering non-empty lines. Namely, `nl` considers its input to be separated into logical pages, each with its own header, body and footer. By default, portions other than the body are not numbered; numbering starts anew for each page; and the specially formatted lines that separate the three are not reproduced in the output. This can potentially lead to security issues if arbitrary user input is processed with `nl` and the code receiving the output does not expect the numbering to reset at times, or parts of the input to disappear.
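The effect can be sketched with input that happens to contain the default delimiter lines of `nl`: `\:\:\:` starts a page header and `\:\:` starts a page body. The words `first` and `second` are placeholders, and the result is described for an `nl` that follows the defaults outlined above.

```shell
# The single quotes keep the backslashes literal, and %s prints them as-is,
# so nl receives the two delimiter lines unchanged.
printf %s\\n first '\:\:\:' '\:\:' second | nl
# Neither delimiter line is reproduced in the output, and 'second' is
# numbered 1 again, because a new logical page has begun.
```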
This problem is not specific to `nl`, either; failure to consider the finer points of operation of such utilities as `echo`, `ls` and `xargs` seems to be a common cause of shell code behaving unexpectedly, especially when facing arbitrary data.
A cleaner solution would be to use the `-n` option to the `cat` command, which is a non-standard, yet common, extension. Otherwise, a portable solution can be crafted using a simple `awk` program (which can be embedded into the shell code); we’ll introduce `awk` in a later section of this document.
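As a rough sketch of what such an embedded `awk` program might look like (not necessarily the one introduced later), the one-liner below numbers every input line, empty ones included, and gives no special treatment to lines that `nl` would take for page delimiters. The six-character number width and the tab separator are assumptions chosen here to resemble the `nl` output.

```shell
# NR is awk's running record (line) number; $0 is the whole input line.
printf %s\\n alpha "" '\:\:\:' beta |
    awk '{ printf "%6d\t%s\n", NR, $0 }'
# Every line is numbered, from 1 to 4, and the '\:\:\:' line is kept as-is.
```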
Perhaps a more practical command to consider would, however, be `sort`, which orders the lines passed through it much as words would be ordered in a dictionary.
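A minimal sketch of such a pipeline, with three placeholder words as input:

```shell
printf %s\\n pear apple fig | sort
# prints:
#   apple
#   fig
#   pear
```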
Alternatively, the `-n` option to `sort` will instruct it to order its input according to the longest initial part of each line that can be interpreted as a (decimal) integer. Moreover, the `-r` option will reverse the order from the default ascending to descending.
There are several new things to consider. First, we’ve supplied `sort` with two separate options: `-n` to order according to the numerical values, and `-r` to reverse the default ascending order. By convention, two such options can be passed either as two arguments (`-n -r`) or as a single combined `-nr` argument.
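The equivalence of the two spellings can be sketched as follows; the input lines are placeholders, each starting with a number so that `-n` has something to compare.

```shell
printf %s\\n "9 pears" "100 figs" "10 apples" | sort -n -r
printf %s\\n "9 pears" "100 figs" "10 apples" | sort -nr
# Both commands print:
#   100 figs
#   10 apples
#   9 pears
```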
Also, we’ve already mentioned that the backslash character does not cause a subsequent newline to be interpreted literally. In fact, it suppresses the newline altogether, which allowed us to split the single long command in the example into four lines for better readability. Note also that arbitrarily long sequences of blank characters serve as argument separators just as the single blanks we’ve used so far do.
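A command split over four lines in this manner might look like the sketch below. The data lines are placeholders; the runs of blanks between arguments are there only to align them.

```shell
# Each trailing backslash removes the newline after it, so the shell
# reads the four physical lines as one command line.
printf %s\\n \
    "3 three"      "20 twenty" \
    "100 hundred"  "1 one" \
    | sort -nr
# prints:
#   100 hundred
#   20 twenty
#   3 three
#   1 one
```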
References
- awk: `awk` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
- Bash, GNU: GNU Bash. URI: http://gnu.org/s/bash/
- BusyBox: BusyBox. URI: http://busybox.net/
- cat: `cat` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cat.html
- Colon (`:`): Colon // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_16
- echo: `echo` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html
- expr: `expr` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/expr.html
- find: `find` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html
- GNU: The GNU Operating System and the Free Software Movement. URI: http://gnu.org/
- grep: `grep` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
- ls: `ls` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html
- NetBSD: The NetBSD Project. URI: http://netbsd.org/
- nl: `nl` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/nl.html
- POSIX: IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/mindex.html
- printf: `printf` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
- sed: `sed` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
- Shell: Shell Command Language // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
- sort: `sort` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html
- xargs: `xargs` // IEEE Std 1003.1-2017 (POSIX.1-2017). URI: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html