Bourne is not bash (or: read, echo, and backslash)
Anyone who has ever written a shell script is familiar with the incantation #!/bin/sh. This tells the Unix loader that the file should be interpreted by the Bourne shell. The Bourne shell is an appropriate lightweight interpreter for simple scripts that invoke a couple of other programs. When you are using it, though, it's important to remember that bash and Bourne are subtly different shells.
The conflation of the two shells is exacerbated by many Linux distributions making /bin/sh a symlink to /bin/bash. Debian-based distributions, including Ubuntu, depart from this practice and link /bin/sh to /bin/dash, a faster, simpler, POSIX-compliant implementation of the Bourne shell. In the process, of course, many bash conveniences are lost, and unexpected problems can arise if you are expecting bash behavior.
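If you're not sure which camp your own machine falls into, a quick check along these lines should tell you. This is only a sketch: it assumes your readlink accepts -f (GNU coreutils does, as do recent BSD and macOS versions), and the variable names sh_path and resolved are just ones I picked for illustration.

```shell
#!/bin/sh
# Sketch: report which binary /bin/sh really is on this machine.
# Assumption: readlink supports -f. If it does not, we fall back to
# printing the unresolved path.
sh_path=/bin/sh
resolved=$(readlink -f "$sh_path" 2>/dev/null) || resolved=$sh_path
printf '%s resolves to %s\n' "$sh_path" "$resolved"
```

On a Debian-style system I'd expect this to name dash, and on many other distributions bash, but the point is to look rather than assume.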
A catlike example
Let me give a simple illustration of the problem. Let's implement a Bourne version of cat, bourne-cat.sh:
#!/bin/sh
while read line; do
    echo "$line"
done
If you've never used while read ... in a shell script, it's a very useful construct. If you run man builtins, you'll learn that read name... reads a line from standard input and assigns each whitespace-delimited word to the corresponding variable name in turn. When the last name is reached, the rest of the line is placed into that variable.
In my example above, the loop iterates for each line of standard input, assigning the text to the line variable. The echo command then prints that line out. Simple enough, right?
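To see the word-assignment rule from the read description in isolation, here is a small sketch with three variable names instead of one. The names first, second, and rest and the sample input are made up for the illustration, and the input deliberately contains no backslashes.

```shell
#!/bin/sh
# Sketch: read splits one input line across several variable names.
# The here-document supplies a single line on standard input.
read first second rest <<'EOF'
alpha beta gamma delta epsilon
EOF

printf 'first=%s\n' "$first"    # first=alpha
printf 'second=%s\n' "$second"  # second=beta
printf 'rest=%s\n' "$rest"      # rest=gamma delta epsilon
```

With only one name, as in bourne-cat.sh, that single variable plays the role of "last name" and receives the whole line.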
Problems with backslash
Let's try out our new script:
$ echo 'abcdefghijklmnop
123456789' | /bin/dash bourne-cat.sh
abcdefghijklmnop
123456789
So, exactly what we expect. Now let's break it:
$ echo 'abcdefghijklm\nop
123456789'
abcdefghijklm\nop
123456789
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat.sh
abcdefghijklmnop
123456789
It swallowed the backslash--definitely not what we wanted. So what's going on?
read interprets backslash
Referring back to the builtins man page, "The backslash character (\) may be used to remove any special meaning for the next character read and for line continuations." That is, read treats backslash as an escape character: it drops the backslash and keeps the character after it, so the \n in our input reaches the line variable as a plain n. If we do not want that behavior, we have to use read -r. Here's bourne-cat-read-fixed.sh:
#!/bin/sh
while read -r line; do
    echo "$line"
done
Now our problem should be fixed:
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat-read-fixed.sh
abcdefghijklm
op
123456789
Hmmm... So what happened there?
Bourne's echo interprets backslash
Since we're reasonably sure that read is now behaving nicely, there must be a problem with echo. Reading the echo man page, we see that echo accepts a -E option that disables the interpretation of backslash escapes. According to the man page, however, this is the default behavior. Oh well, let's throw the option in there to see if it makes a difference in bourne-cat-E.sh:
#!/bin/sh
while read -r line; do
    echo -E "$line"
done
Running it:
$ echo 'abcdefghijklm\nop
123456789' | /bin/dash bourne-cat-E.sh
-E abcdefghijklm
op
-E 123456789
Seriously? Our echo command is ignoring the -E option entirely, happily outputting both it and the interpreted newline. This is getting a bit nutty.
When echo(1) isn't echo
Returning to the echo man page, you will see an important caveat near the bottom, saying "NOTE: your shell may have its own version of echo, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports." Reading on, we see the AUTHORS: Brian Fox and Chet Ramey. Head on over to man bash, where you will see that it has the same authors. In other words, the echo man page we are reading describes the standalone echo program, which behaves the way bash's builtin echo does, while our script is using dash's builtin echo (because we are using the dash interpreter). Let's try bourne-cat-read-fixed.sh again, this time using bash instead of dash:
$ echo 'abcdefghijklm\nop
123456789' | /bin/bash bourne-cat-read-fixed.sh
abcdefghijklm\nop
123456789
Using bash, it works exactly as expected. If we had been using Arch Linux or Gentoo, we would not have even realized the differences between putting #!/bin/sh and #!/bin/bash at the top of our script, but there are differences, and it's important for us to remain aware of them.
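If you'd rather ask the shell than guess, a probe along these lines reports how the running interpreter's echo treats a backslash escape. Treat it as a sketch: probe and echo_mode are names I invented, and the result can also depend on shell options (bash's xpg_echo, for one), so it describes the current session rather than the shell in general.

```shell
#!/bin/sh
# Sketch: detect whether this shell's echo interprets backslash escapes.
# The single quotes ensure echo receives a literal backslash followed by t.
probe=$(echo 'a\tb')

if [ "$probe" = 'a\tb' ]; then
    echo_mode=literal        # the backslash survived, as in bash above
else
    echo_mode=interpreted    # \t was turned into a tab, as in dash above
fi

printf 'this echo prints backslash escapes as: %s\n' "$echo_mode"

# type shows whether echo is a shell builtin or an external program.
type echo
```

Run it once as /bin/dash probe.sh and once as /bin/bash probe.sh (the file name is hypothetical) to see the two shells disagree.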
bash isn't always in /bin
The example above may tempt you to replace /bin/sh with /bin/bash and move on, since that works on most Linux systems. Unfortunately, BSD systems don't include bash by default, so when it is installed at all it typically ends up somewhere else, such as /usr/local/bin/bash. Fortunately, /usr/bin/env handles this nicely by searching the user's PATH:
#!/usr/bin/env bash
# Note line above. Many systems have bash in a location other than /bin
while read -r line; do
    echo "$line"
done
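To find out where bash lives on a given machine, or whether it's there at all, you can do the same PATH lookup yourself. A minimal sketch, with bash_path as a variable name chosen just for this example:

```shell
#!/bin/sh
# Sketch: look bash up on PATH, the same search that env performs.
# command -v prints the path and succeeds only if bash is found.
if bash_path=$(command -v bash); then
    printf 'bash found at %s\n' "$bash_path"
else
    printf 'bash is not on PATH; "#!/usr/bin/env bash" would fail here\n'
fi
```

Note that this script itself runs under plain sh, so it works even on a system with no bash installed.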
The printf way
Do we really have to pull in a bash dependency just to get the echo functionality we want? This is a case where echo isn't really the right tool for the job. While it's conveniently concise for simple and interactive use, printf is much more robust when we want to output data.
When using printf, it's important that the string to be printed is not used as the FORMAT string, but instead as one of the ARGUMENT strings, interpolated with the %s sequence. Here's bourne-cat-printf.sh:
#!/bin/sh
while read -r line; do
    printf '%s\n' "$line"
done
The printf output:
$ echo 'abcdefghijklm\nop
123456789' | ./bourne-cat-printf.sh
abcdefghijklm\nop
123456789
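The FORMAT versus ARGUMENT distinction is easy to get wrong, so here is a small sketch of what happens when the data is passed in each position. The sample string and the variable names as_argument and as_format are mine, chosen to contain both a backslash escape and a % sequence.

```shell
#!/bin/sh
# Sketch: the same data used as an ARGUMENT and (wrongly) as the FORMAT.
line='tab\there %s'

# Correct: the data is an argument, so printf copies it through untouched.
as_argument=$(printf '%s' "$line")

# Wrong: the data is the format, so \t becomes a tab and %s is replaced
# by a (missing, hence empty) argument.
as_format=$(printf "$line")

printf 'as argument: [%s]\n' "$as_argument"
printf 'as format:   [%s]\n' "$as_format"
```

The first line of output shows the string exactly as written. The second shows a real tab and no %s, which is the same kind of surprise we were trying to get away from with echo.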
Conclusion
In summary, there are a few things to keep in mind when writing a shell script:
- Although /bin/sh is often linked to /bin/bash, it is unsafe to assume that /bin/sh is a bash interpreter. On Debian it's /bin/dash, and many BSDs have a dedicated Bourne shell executable.
- If you prefer using #!/bin/sh (there may be some speed advantages), you must ensure that your script is fully POSIX-compliant. For example, you might need to use printf instead of echo.
- If you are writing and debugging your script using bash, be sure to mark bash as the interpreter or to ensure that it also runs correctly in plain old sh.
- When you do want to use bash in a script that you plan to distribute, you should set the first line to be #!/usr/bin/env bash so that it can be found anywhere on the user's PATH. It's not always at /bin/bash.
- Most of the time that you are using read, you really want to use read -r. Without -r, read will interpret (and/or strip out) backslashes.
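That last point covers more than the swallowed backslash we saw earlier: as the man page quote says, plain read also treats a trailing backslash as a line continuation. A minimal sketch, feeding identical input to both forms (the sample text and the names plain and raw are made up for the illustration):

```shell
#!/bin/sh
# Sketch: identical input given to read with and without -r.
# The quoted 'EOF' delimiter stops the shell from touching the
# backslashes before read sees them.
read plain <<'EOF'
one\two \
three
EOF

read -r raw <<'EOF'
one\two \
three
EOF

printf 'read:    [%s]\n' "$plain"   # read:    [onetwo three]
printf 'read -r: [%s]\n' "$raw"     # read -r: [one\two \]
```

Without -r, the backslash before the t disappears and the backslash at the end of the line glues the next line on. With -r, the first line comes through exactly as written.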
Happy scripting!