Bourne is not bash (or: read, echo, and backslash)
Bourne is not bash
Anyone who has ever written a shell script is familiar with the incantation #!/bin/sh. This line tells the Unix program loader that the file should be interpreted by a Bourne shell. The Bourne shell is an appropriate lightweight interpreter for simple scripts that invoke a couple of other programs. When you are using it, though, it's important to remember that bash and Bourne are subtly different shells.
The conflation of the two shells is made worse by the fact that many Linux distributions simply make /bin/sh a symlink to /bin/bash. Debian-based distributions, including Ubuntu, depart from this practice and instead link /bin/sh to /bin/dash, a smaller, faster, POSIX-compliant Bourne-style shell. In the process, of course, many bash conveniences are lost, and unexpected problems can arise if you are expecting bash behavior.
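If you are not sure what your own /bin/sh is, a quick probe like the following sketch should tell you on most systems. It assumes nothing beyond ls and the fact that bash sets BASH_VERSION even when it is invoked as sh. The variable name sh_kind is purely illustrative.

```sh
#!/bin/sh
# Show where /bin/sh points on this machine (often a symlink to bash or dash).
ls -l /bin/sh

# Ask /bin/sh itself: bash sets BASH_VERSION even when invoked as sh,
# while dash leaves it unset.
sh_kind=$(/bin/sh -c 'if [ -n "${BASH_VERSION:-}" ]; then echo bash; else echo not-bash; fi')
echo "/bin/sh is: $sh_kind"
```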
A cat-like example
Let me give a simple illustration of the problem. Let's implement a Bourne version of cat, bourne-cat.sh:
#!/bin/sh
while read line; do
echo "$line"
done
If you've never used while read ... in a shell script, it's a very useful idiom. If you run man builtins, you'll learn that read name... reads a line from standard input and splits it into words (on whitespace, by default). It assigns the first word to the first name, the second word to the second name, and so on. When the last name is reached, the rest of the line is placed into that variable.
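For example, here is a minimal sketch of read splitting one line across three names. The names first, second, and rest are arbitrary and chosen only for illustration:

```sh
#!/bin/sh
# read gives one whitespace-separated word to each name, and whatever is
# left over (here "three four") goes to the last name.
# The braces keep read and printf in the same subshell of the pipeline.
result=$(echo 'one two three four' | {
    read first second rest
    printf 'first=%s second=%s rest=%s\n' "$first" "$second" "$rest"
})
printf '%s\n' "$result"   # first=one second=two rest=three four
```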
In my example above, the loop iterates once for each line of standard input, assigning the text to the line variable. The echo command then prints that line out. Simple enough, right?
Problems with backslash
Let's try out our new script:
$ echo 'abcdefghijklmnop
123456789' | /bin/dash bourne-cat.sh
abcdefghijklmnop
123456789
So, exactly what we expect. Now let's break it:
$ echo 'abcdefghijklm\nop
123456789'
abcdefghijklm\nop
123456789
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat.sh
abcdefghijklmnop
123456789
It swallowed the backslash, which is definitely not what we wanted. So what's going on?
read interprets backslash
Referring back to the builtins man page: "The backslash character (\) may be used to remove any special meaning for the next character read and for line continuations." That is, read treats backslash as a special character, so the line variable does not contain the backslash. If we do not want that behavior, we have to use read -r. Here's bourne-cat-read-fixed.sh:
#!/bin/sh
while read -r line; do
echo "$line"
done
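To isolate read from echo, the following sketch feeds the same input to read with and without -r. It prints each result with printf, which leaves backslashes in its %s arguments alone. The variable names are illustrative. The last case shows the line-continuation behavior that the man page mentions.

```sh
#!/bin/sh
# printf '%s\n' writes its argument verbatim, so the input line is: a\nb
input='a\nb'

plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })
raw=$(printf '%s\n' "$input" | { read -r line; printf '%s' "$line"; })

printf 'read:    %s\n' "$plain"   # read:    anb   (backslash removed)
printf 'read -r: %s\n' "$raw"     # read -r: a\nb  (backslash kept)

# Without -r, a backslash at the end of a line is a line continuation:
# the physical lines "one \" and "two" are read as the single line "one two".
joined=$(printf 'one \\\ntwo\n' | { read line; printf '%s' "$line"; })
printf 'joined:  %s\n' "$joined"  # joined:  one two
```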
Now our problem should be fixed:
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat-read-fixed.sh
abcdefghijklm
op
123456789
Hmmm... So what happened there?
Bourne's echo interprets backslash
Since we're reasonably sure that read is now behaving nicely, there must be a problem with echo. Reading the man page, we see that echo accepts a -E option that disables the interpretation of backslash escapes. According to the man page, however, this is already the default behavior. Oh well, let's throw the option in anyway to see if it makes a difference, in bourne-cat-E.sh:
#!/bin/sh
while read -r line; do
echo -E "$line"
done
Running it:
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat-E.sh
-E abcdefghijklm
op
-E 123456789
Seriously? Our echo command is ignoring the -E option entirely, happily printing both the option itself and the interpreted newline. This is getting a bit nutty.
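Because the same echo command behaves differently depending on which shell runs it, a script can at least probe its own echo before trusting it. This is a rough sketch, and echo_style is a hypothetical variable name. Under dash it should report interprets. Under bash with default settings it should report literal.

```sh
#!/bin/sh
# Probe the echo this shell actually runs (normally its builtin).
# If echo interprets backslash escapes, 'a\nb' comes back as two lines.
probe=$(echo 'a\nb')
if [ "$probe" = 'a\nb' ]; then
    echo_style=literal      # e.g. bash with default settings
else
    echo_style=interprets   # e.g. dash
fi
printf "this shell's echo: %s\n" "$echo_style"
```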
When echo(1) isn't echo
Returning to the echo man page, you will see an important caveat near the bottom: "NOTE: your shell may have its own version of echo, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports." Reading on, we see the author credit: Brian Fox and Chet Ramey. Head on over to man bash, where you will see the same authors. In other words, the echo man page we are reading describes the standalone GNU echo program. It was written by the bash authors and behaves like bash's builtin echo. Our script, however, is using dash's builtin echo, because we are running it with the dash interpreter. Let's try bourne-cat-read-fixed.sh again, this time using bash instead of dash:
$ echo 'abcdefghijklm\nop
123456789' | /bin/bash bourne-cat-read-fixed.sh
abcdefghijklm\nop
123456789
Using bash, it works exactly as expected. On Arch Linux or Gentoo, where /bin/sh is bash, we would not even have noticed a difference between putting #!/bin/sh and #!/bin/bash at the top of our script. There are differences, though, and it's important for us to remain aware of them.
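If you want to see for yourself which echo a shell runs, a sketch like this should work in both bash and dash. It relies on the type builtin, which both shells provide. It also relies on the fact that env(1) can only execute external programs. As a result, env echo reaches the standalone echo found on PATH (on GNU systems, the one the man page describes) instead of the shell's builtin.

```sh
#!/bin/sh
# Report how the shell resolves "echo"; bash and dash both say it is a builtin.
type echo

# env(1) executes programs and cannot see shell builtins, so this runs the
# standalone echo found on PATH rather than the shell's own version.
env echo 'a\nb'
```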
bash isn't always in /bin
The example above may tempt you to replace /bin/sh with /bin/bash and move on, since that works on most Linux systems. Unfortunately, BSD systems don't include bash by default. When it is installed from packages or ports, it usually lives somewhere like /usr/local/bin/bash. Fortunately, /usr/bin/env handles this nicely, because it looks bash up on the PATH:
#!/usr/bin/env bash
# Note line above. Many systems have bash in a location other than /bin
while read -r line; do
echo "$line"
done
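Keep in mind that the shebang only takes effect when the script is executed directly. Running it as /bin/dash script.sh, as we did with our earlier examples, ignores the first line entirely. A defensive check at the top of a bash script might look like the sketch below. It assumes only that bash sets BASH_VERSION, and the error message is illustrative.

```sh
#!/usr/bin/env bash
# The shebang is ignored when the script is passed to an interpreter
# explicitly (e.g. "/bin/dash script.sh"), so check before relying on bash.
# This test uses only POSIX syntax, so dash can run it and then exit.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script needs bash; run it directly or with bash." >&2
    exit 1
fi

echo "running under bash $BASH_VERSION"
```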
The printf way
Do we really have to pull in a bash dependency just to get the echo functionality we want? This is a case where echo isn't really the right tool for the job. It's conveniently concise for simple and interactive use, but POSIX leaves its handling of options and backslashes implementation-defined. printf, whose behavior is specified precisely, is much more robust when we want to output data.
When using printf, it's important that the string to be printed is not used as the FORMAT string. Instead, pass it as one of the ARGUMENT strings, interpolated with the %s sequence. Here's bourne-cat-printf.sh:
#!/bin/sh
while read -r line; do
printf '%s\n' "$line"
done
The printf version's output:
$ echo 'abcdefghijklm\nop
123456789' | ./bourne-cat-printf.sh
abcdefghijklm\nop
123456789
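To see why the data belongs in an ARGUMENT rather than in the FORMAT, consider a line that happens to contain printf syntax. The sketch below contrasts the two. The say function at the end is a hypothetical name for a portable echo replacement, not a standard utility.

```sh
#!/bin/sh
# A line of data that happens to contain printf syntax (%s and \n).
line='50%s off\nnow'

# Unsafe: as the FORMAT, %s finds no argument and prints an empty string,
# and \n becomes a real newline, so the data is mangled into "50 off" / "now".
unsafe=$(printf "$line")

# Safe: as an ARGUMENT to %s, the data is copied verbatim.
safe=$(printf '%s' "$line")

printf 'unsafe: %s\n' "$unsafe"
printf 'safe:   %s\n' "$safe"     # safe:   50%s off\nnow

# Hypothetical helper (the name "say" is ours): an echo that never
# interprets options or backslashes.
say() { printf '%s\n' "$*"; }
say -E 'a\nb'                     # -E a\nb
```

Because say passes everything through %s, it prints -E, -n, and backslashes exactly as given. That is the behavior we wanted from echo in the first place.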
Conclusion
In summary, there are a few things to keep in mind when writing a shell script:
- Although /bin/sh is often linked to /bin/bash, it is unsafe to assume that /bin/sh is a bash interpreter. On Debian it's /bin/dash, and many BSDs have a dedicated Bourne shell executable.
- If you prefer using #!/bin/sh (there may be some speed advantages), you must ensure that your script is fully POSIX-compliant. For example, you might need to use printf instead of echo.
- If you are writing and debugging your script using bash, be sure either to mark bash as the interpreter or to verify that the script also runs correctly in plain old sh.
- When you do want to use bash in a script that you plan to distribute, set the first line to #!/usr/bin/env bash so that bash can be found anywhere on the user's PATH. It's not always at /bin/bash.
- Most of the time that you are using read, you really want read -r. Without -r, read treats backslashes as escape characters: it strips them out and joins any line that ends in a backslash with the next one.
Happy scripting!