LJ Archive

What's GNU: Bash—The GNU Shell

Chet Ramey

Issue #4, August 1994

Conclusion of an article started last month. While originally written by Brian Fox of the Free Software Foundation, bash is now maintained by Chet Ramey. In this article, Chet explains the history of shells and then goes on to explore features specific to bash.

History

Access to the list of commands previously entered (the command history) is provided jointly by bash and the readline library. bash provides variables ($HISTFILE, $HISTSIZE, and $HISTCONTROL) and the history and fc builtins to manipulate the history list. The value of $HISTFILE specifies the file where bash writes the command history on exit and reads it on startup. $HISTSIZE is used to limit the number of commands saved in the history. $HISTCONTROL provides a crude form of control over which commands are saved on the history list: a value of ignorespace means to not save commands which begin with a space; a value of ignoredups means to not save commands identical to the last command saved. $HISTCONTROL was named $history_control in earlier versions of bash; the old name is still accepted for backward compatibility. The history command can read or write files containing the history list and display the current list contents. The fc builtin, adopted from POSIX.2 and the Korn Shell, allows display and re-execution, with optional editing, of commands from the history list. The readline library offers a set of commands to search the history list for a portion of the current input line or a string typed by the user. Finally, the history library, generally incorporated directly into the readline library, implements a facility for history recall, expansion, and re-execution of previous commands very similar to csh (“bang history”, so called because the exclamation point introduces a history substitution):

$ echo a b c d e
a b c d e
$ !! f g h i
echo a b c d e f g h i
a b c d e f g h i
$ !-2
echo a b c d e
a b c d e
$ echo !-2:1-4
echo a b c d
a b c d

The command history is only saved when the shell is interactive, so it is not available for use by shell scripts.

New Shell Variables

There are a number of convenience variables that bash interprets to make life easier. These include FIGNORE, which is a set of filename suffixes identifying files to exclude when completing filenames; HOSTTYPE, which is automatically set to a string describing the type of hardware on which bash is currently executing; command_oriented_history, which directs bash to save all lines of a multiple-line command such as a while or for loop in a single history entry, allowing easy re-editing; and IGNOREEOF, whose value indicates the number of consecutive EOF characters that an interactive shell will read before exiting—an easy way to keep yourself from being logged out accidentally. The auto_resume variable alters the way the shell treats simple command names: if job control is active, and this variable is set, single-word simple commands without redirections cause the shell to first look for and restart a suspended job with that name before starting a new process.

Brace Expansion

Since sh offers no convenient way to generate arbitrary strings that share a common prefix or suffix (pathname expansion requires that the filenames exist), bash implements brace expansion, a capability picked up from csh. Brace expansion is similar to pathname expansion, but the strings generated need not correspond to existing files. A brace expression consists of an optional preamble, followed by a pair of braces enclosing a series of comma-separated strings, and an optional postamble. The preamble is prepended to each string within the braces, and the postamble is then appended to each resulting string:

$ echo a{d,c,b}e
ade ace abe

Process Substitution

On systems that can support it, bash provides a facility known as process substitution. Process substitution is similar to command substitution in that its specification includes a command to execute, but the shell does not collect the command's output and insert it into the command line. Rather, bash opens a pipe to the command, which is run in the background. The shell uses named pipes (FIFOs) or the /dev/fd method of naming open files to expand the process substitution to a filename which connects to the pipe when opened. This filename becomes the result of the expansion. Process substitution can be used to compare the outputs of two different versions of an application as part of a regression test:

$ cmp <\>(old_prog) <(new_prog)

Prompt Customization

One of the more popular interactive features that bash provides is the ability to customize the prompt. Both $PS1 and $PS2, the primary and secondary prompts, are expanded before being displayed. Parameter and variable expansion is performed when the prompt string is expanded, so any shell variable can be put into the prompt (e.g., $SHLVL, which indicates how deeply the current shell is nested). bash specially interprets characters in the prompt string preceded by a backslash. Some of these backslash escapes are replaced with the current time, the date, the current working directory, the username, and the command number or history number of the command being entered. There is even a backslash escape to cause the shell to change its prompt when running as root by using the su command. Before printing each primary prompt, bash expands the variable $PROMPT_COMMAND and, if it has a value, executes the expanded value as a command, allowing additional prompt customization. For example, this assignment causes the current user, the current host, the time, the last component of the current working directory, the level of shell nesting, and the history number of the current command to be embedded into the primary prompt:

$ PS1='\u@\h [     ] \W($SHLVL:\!)\$ `
chet@odin [21:03:44] documentation(2:636)$ cd ..
chet@odin [21:03:54] src(2:637)$

The string being assigned is surrounded by single quotes so that if it is exported, the value of $SHLVL will be updated by a child shell:

chet@odin [21:17:35] src(2:638)$ export PS1
chet@odin [21:17:40] src(2:639)$ bash
chet@odin [21:17:46] src(3:696)$

The \$ escape is displayed as “$” when running as a normal user, but as “#” when running as root.

File System Views

Since Berkeley introduced symbolic links in 4.2 BSD, one of their most annoying properties has been the “warping” to a completely different area of the file system when using cd, and the resultant non-intuitive behavior of “cd ..”. The Unix kernel treats symbolic links physically. When the kernel is translating a pathname in which one component is a symbolic link, it replaces all or part of the pathname while processing the link. If the contents of the symbolic link begin with a slash, the kernel replaces the pathname entirely; if not, the link contents replace the current component. In either case, the symbolic link is visible. If the link value is an absolute pathname, the user finds himself in a completely different part of the file system.

bash provides a logical view of the file system. In this default mode, command and filename completion and builtin commands such as cd and pushd which change the current working directory transparently follow symbolic links as if they were directories. The $PWD variable, which holds the shell's idea of the current working directory, depends on the path used to reach the directory rather than its physical location in the local file system hierarchy. For example:

$ cd /usr/local/bin
$ echo $PWD
/usr/local/bin
$ pwd
/usr/local/bin
$ /bin/pwd
/net/share/sun4/local/bin
$ cd ..
$ pwd
/usr/local
$ /bin/pwd
/net/share/sun4/local

One problem with this, of course, arises when programs that do not understand the shell's logical notion of the file system interpret “..” differently. This generally happens when bash completes filenames containing “..” according to a logical hierarchy which does not correspond to their physical location. For users who find this troublesome, a corresponding physical view of the file system is available:

$ cd /usr/local/bin
$ pwd
/usr/local/bin
$ set -o physical
$ pwd
/net/share/sun4/local/bin

Internationalization

One of the most significant improvements in version 1.13 of bash was the change to “eight-bit cleanliness”. Previous versions used the eighth bit of characters to mark whether or not they were quoted when performing word expansions. While this did not affect the majority of users, most of whom used only seven-bit ASCII characters, some found it confining. Beginning with version 1.13, bash implemented a different quoting mechanism that did not alter the eighth bit of characters. This allowed bash to manipulate files with “odd” characters in their names, but did nothing to help users enter those names, so version 1.13 introduced changes to readline that made it eight-bit clean as well. Options exist that force readline to attach no special significance to characters with the eighth bit set (the default behavior is to convert these characters to meta-prefixed key sequences) and to output these characters without conversion to meta-prefixed sequences. These changes, along with the expansion of keymaps to a full eight bits, enable readline to work with most of the ISO-8859 family of character sets, used by many European countries.

POSIX Mode

Although bash is intended to be POSIX.2 conformant, there are areas in which the default behavior is not compatible with the standard. For users who wish to operate in a strict POSIX.2 environment, bash implements a POSIX mode. When this mode is active, bash modifies its default operation where it differs from POSIX.2 to match the standard. POSIX mode is entered when bash is started with the “-o posix” option or when set -o posix is executed. For compatibility with other GNU software that attempts to be POSIX.2 compliant, bash also enters POSIX mode if either of the variables $POSIX_PEDANTIC or $POSIXLY_CORRECT is set when bash is started or assigned a value during execution. When bash is started in POSIX mode, for example, it sources the file named by the value of $ENV rather than the “normal” startup files.

Future Plans

There are several features that will be introduced in the next version of bash, version 1.14, and a number under consideration for future releases. This section will briefly detail the new features planned for version 1.14 and describe features that may appear in later versions.

The new features available in bash-1.14 answer several of the most common requests for enhancements. Most notably, there is a mechanism for including non-visible character sequences in prompts, such as those which cause a terminal to print characters in different colors or in standout mode. There was nothing preventing the use of these sequences in earlier versions, but the readline redisplay algorithm assumed each character occupied physical screen space and would wrap lines prematurely.

Readline has a few new variables, several new bindable commands, and some additional emacs mode default key bindings. A new history search mode has been implemented: in this mode, readline searches the history for lines beginning with the characters between the beginning of the current line and the cursor. The existing readline incremental search commands no longer match identical lines more than once. Filename completion now expands variables in directory names. The history expansion facilities are now nearly completely csh-compatible: missing modifiers have been added and history substitution has been extended.

Several of the features described earlier as appearing in future releases, such as set -o posix and $POSIX_PEDANTIC, are present in version 1.14. There is a new shell variable, OSTYPE, to which bash assigns a value that identifies the version of Unix it's running on (great for putting architecture-specific binary directories into the $PATH). Two variables have been renamed: $HISTCONTROL replaces $history_control , and $HOSTFILE replaces $hostname_completion_file. In both cases, the old names are accepted for backward compatibility. The ksh select construct, which allows the generation of simple menus, has been implemented. New capabilities have been added to existing variables: $auto_resume can now take values of exact or substring, and $HISTCONTROL understands the value ignoreboth, which combines the two previously acceptable values. The dirs builtin has acquired options to print out specific members of the directory stack. The $nolinks variable, which forces a physical view of the file system, has been superseded by the -P option to the set builtin (equivalent to set -o physical); the variable is retained for backward compatibility. The version string contained in $BASH_VERSION now includes an indication of the patch level as well as the “build version”. Some little-used features have been removed: the bye synonym for exit and the $NO_PROMPT_VARS variable are gone. There is now an organized test suite that can be run as a regression test when building a new version of bash.

The documentation has been thoroughly overhauled: there is a new manual page on the readline library and the info file has been updated to reflect the current version. As always, as many bugs as possible have been fixed, although some surely remain.

There are a few features that I hope to include in later bash releases. Some are based on work already done in other shells.

In addition to simple variables, a future release of bash will include one-dimensional arrays, using the ksh implementation of arrays as a model. Additions to the ksh syntax, such as varname=( ... ) to assign a list of words directly to an array and a mechanism to allow the read builtin to read a list of values directly into an array, would be desirable. Given those extensions, the ksh

“set -A” syntax may not be worth supporting (the -A option assigns a list of values to an array, but is a rather peculiar special case).

Some shells include a means of programmable word completion, where the user specifies on a per-command basis how the arguments of the command are to be treated when completion is attempted: as filenames, hostnames, executable files, and so on. The other aspects of the current bash implementation could remain as-is; the existing heuristics would still be valid. Only when completing the arguments to a simple command would the programmable completion be in effect.

It would also be nice to give the user finer-grained control over which commands are saved onto the history list. One proposal is for a variable, tentatively named HISTIGNORE, which would contain a colon-separated list of commands. Lines beginning with these commands, after the restrictions of $HISTCONTROL have been applied, would not be placed onto the history list. The shell pattern-matching capabilities could also be available when specifying the contents of $HISTIGNORE.

One thing that newer shells such as wksh (also known as dtksh) provide is a command to dynamically load code implementing additional builtin commands into a running shell. This new builtin would take an object file or shared library implementing the “body” of the builtin (xxx_builtin() for those familiar with bash internals) and a structure containing the name of the new command, the function to call when the new builtin is invoked (presumably defined in the shared object specified as an argument), and the documentation to be printed by the help command (possibly present in the shared object as well). It would manage the details of extending the internal table of builtins.

A few other builtins would also be desirable: two are the POSIX.2 getconf command, which prints the values of system configuration variables defined by POSIX.2, and a disown builtin, which causes a shell running with job control active to “forget about” one or more background jobs in its internal jobs table. Using getconf, for example, a user could retrieve a value for $PATH guaranteed to find all of the POSIX standard utilities, or find out how long filenames may be in the file system containing a specified directory.

There are no implementation timetables for any of these features, nor are there concrete plans to include them. If anyone has comments on these proposals, feel free to send me electronic mail.

Reflections and Lessons Learned

The lesson that has been repeated most often during bash development is that there are dark corners in the Bourne Shell, and people use all of them. In the original description of the Bourne shell, quoting and the shell grammar are both poorly specified and incomplete; subsequent descriptions have not helped much. The grammar presented in Bourne's paper describing the shell distributed with the Seventh Edition of Unix is so far off that it does not allow the command who|wc. In fact, as Tom Duff states:

“Nobody really knows what the Bourne shell's grammar is. Even examination of the source code is little help.”1

The POSIX.2 standard includes a yacc grammar that comes close to capturing the Bourne shell's behavior, but it disallows some constructs which sh accepts without complaint-and there are scripts out there that use them. It took a few versions and several bug reports before bash implemented sh-compatible quoting, and there are still some “legal” sh constructs which bash flags as syntax errors. Complete sh compatibility is a tough nut.

The shell is bigger and slower than I would like, though the current version is faster than previously.

The readline library could stand a substantial rewrite.

A hand-written parser to replace the current yacc-gener-ated one would probably result in a speedup, and would solve one glaring problem: the shell could parse commands in “$(...)” constructs as they are entered, rather than reporting errors when the construct is expanded.

As always, there is some chaff to go with the wheat. Areas of duplicated functionality need to be cleaned up. There are several cases where bash treats a variable specially to enable functionality available another way ($notify vs. set -o notify and $nolinks vs.

set -o physical, for instance); the special treatment of the variable name should probably be removed. A few more things could stand removal; the $allow_null_ glob_expansion and $glob_dot_filenames variables are of particularly questionable value. The $[...] arithmetic evaluation syntax is redundant now that the POSIX-mandated $((...)) construct has been implemented, and could be deleted. It would be nice if the text output by the help builtin were external to the shell rather than compiled into it. The behavior enabled by $command_oriented_history, which causes the shell to attempt to save all lines of a multi-line command in a single history entry, should be made the default and the variable removed.

Availability

As with all other GNU software, bash is available for anonymous FTP from prep.ai.mit.edu:/pub/gnu and from other GNU software mirror sites. The current version is in bash-1.13.5.tar.gz in that directory. Use archie to find the nearest archive site. The latest version is always available for FTP from bash.CWRU.Edu:/pub/ dist. bash documentation is available for FTP from bash.CWRU.Edu:/pub/bash.

The Free Software Foundation sells tapes and CD-ROMs containing bash; send electronic mail to gnu@prep.ai.mit.edu or call +1-617-876-3296 for more information. bash is also distributed with several versions of Unix-compatible systems. It is included as /bin/sh and /bin/bash on several Linux distributions and as contributed software in BSDI's BSD/386 and FreeBSD.

Conclusion

bash is a worthy successor to sh. It is sufficiently portable to run on nearly every version of Unix from 4.3 BSD to SVR4.2, and several Unix workalikes. It is robust enough to replace sh on most of those systems, and provides more functionality. It has several thousand regular users, and their feedback has helped to make it as good as it is today-a testament to the benefits of free software.

1 Tom Duff, “Rc-A Shell for Plan 9 and UNIX systems”, Proc. of the Summer 1990 EUUG Conf., London, July, 1990, pp. 21-33

2 BSD/386 is a trademark of Berkeley Software Design, Inc.

Chet Ramey (chet@po.cwru.edu) is a programmer at Case Western Reserve University and volunteer at the Free Software Foundation.

LJ Archive