COLODEBUG: a simple way to improve bash script debugging

In this article, I will show you a simple, easy-to-use, and non-disruptive way to extend a GNU bash script by a few lines that make it a fair bit easier to make sense of its execution flow at runtime. Using this method will also allow you to effortlessly add a “verbose” execution mode to scripts you create or extend. It also improves the usefulness of set -x, aka xtrace mode. The method is compatible with zsh, and maybe also other advanced Bourne-style shells.

Plain POSIX sh is, unfortunately, only supported in a limited manner: implementing COLODEBUG involves defining a function named :, which the sh implementations I checked did not allow. Basic colon comments in xtrace mode, however, should work under full POSIX compliance, too.

If you do not feel like reading copious amounts of prose where I get to try to be funny, you may want to skip to the tl;dr-style section at the very end of this article to get to the gist of it all.

Debugging shell scripts - a few common approaches

Usually, when something in your unfortunately-by-now-much-too-elaborate bash script goes wrong, there are a few tricks to ease the burden of debugging it: Sprinkling a few echo (or, better yet, printf) calls all over, making it sleep, read or exit at neuralgic points, or enabling xtrace mode (via set -x) to get glimpses of what the interpreter is doing, piling up on stderr. The latter is often especially helpful, but has one characteristic that makes it harder than necessary to grok what is going on at times, especially with long scripts that you did not write yourself (or that you do not have any recollection of writing): It simply hides some tricky business, such as common redirection (<, >, etc.) operations, and it does not reproduce any comments the interpreter encounters in the script’s source.
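
To see that blind spot in isolation, here is a minimal sketch. The two-line snippet and the variable name trace are made up for illustration; the outer commands only run the snippet under bash -x and capture what lands on stderr.

```shell
# Run a tiny two-line script under xtrace and capture its stderr.
# PS4 is pinned to bash's default "+ " so the trace prefix is predictable.
trace=$(PS4='+ ' bash -x 2>&1 <<'EOF'
# this hash comment never reaches the trace
echo hello > /dev/null
EOF
)
printf '%s\n' "$trace"
# prints a single line: + echo hello
```

Neither the comment nor the > /dev/null redirection shows up in the trace; only the command word and its arguments do.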

Source code comments and how they (not) help us

I suppose software developers do not universally agree on many things, but most will like helpful comments in code they have to understand much better than their absence. I always lamented it as a peculiar kind of shame that, when dealing with Bourne-style shell scripts, one is limited to enjoying these “at rest” only, preferably displayed by your favorite standard editor, ed(1). So how about we try tricking the shell into also providing us with comment-infused context information at runtime, at least in the more tricky corners of the contraptions we and our fellow shell artisans dare create?

Introducing: :

Luckily, our wise forebears who graced us with UNIX, POSIX, and the Bourne-compatible shell also granted us one particular powerhouse of a command, namely :. Consulting bash’s excellent online help system, we can quickly grasp wherein lie its many virtues:

$ help :
:: :
    Null command.

    No effect; the command does nothing.

    Exit Status:
    Always succeeds.

Doing nothing, yet always succeeding - it reads like every IT practitioner’s dream finally come true. But how can something that does nothing… help us get something done? Let’s take a look at a short example code excerpt from an entirely fictional piece of shell opus magnum to clear this up:

# try to find potential SSDs in my GNU/Linux system and collect them in an array
declare -a ssds
for f in /dev/sd* /dev/nvme*
do
  # check if $f really is a block device
  [[ -b $f ]] && ssds+=("$f")
done
# write a list of all identified devices into /tmp/ssd_list_$$
printf '%s\n' "${ssds[@]}" > /tmp/ssd_list_$$

Nothing breathtaking here - we’re looping over a number of pathnames that fell out of a wildcard expansion, invoking some minor black bash magic that interacts with stat(2), and pushing each pathname that passes the check onto an array. Running this through bash -x yields (approximately, ymmv) this:

+ declare -a ssds
+ for f in /dev/sd* /dev/nvme*
+ [[ -b /dev/sd* ]]
+ for f in /dev/sd* /dev/nvme*
+ [[ -b /dev/nvme0 ]]
+ for f in /dev/sd* /dev/nvme*
+ [[ -b /dev/nvme0n1 ]]
+ ssds+=("$f")
+ for f in /dev/sd* /dev/nvme*
+ [[ -b /dev/nvme0n1p1 ]]
+ ssds+=("$f")
+ printf '%s\n' /dev/nvme0n1 /dev/nvme0n1p1

Now, let’s look at a variant of the script with “number sign” (aka hash) comments substituted with what I hereby christen “colon comments”, also run through bash -x. We get this instead:

+ : try to find potential SSDs in my GNU/Linux system and collect them in an array
+ declare -a ssds
+ for f in /dev/sd* /dev/nvme*
+ : check if '/dev/sd*' really is a block device
+ [[ -b /dev/sd* ]]
+ for f in /dev/sd* /dev/nvme*
+ : check if /dev/nvme0 really is a block device
+ [[ -b /dev/nvme0 ]]
+ for f in /dev/sd* /dev/nvme*
+ : check if /dev/nvme0n1 really is a block device
+ [[ -b /dev/nvme0n1 ]]
+ ssds+=("$f")
+ for f in /dev/sd* /dev/nvme*
+ : check if /dev/nvme0n1p1 really is a block device
+ [[ -b /dev/nvme0n1p1 ]]
+ ssds+=("$f")
+ : write a list of all identified devices into /tmp/ssd_list_276355
+ printf '%s\n' /dev/nvme0n1 /dev/nvme0n1p1

Ah. Much better, in a way - or at least it would be, whenever it’s not already painfully obvious what is going on, and/or where in our hypothetical grand scheme of a script, suffering from mild longishness and occasional DRY-violations, all this stuff actually happens. Note that, due to the colon comment in the penultimate line, we also get to know what the output file path for the stdout redirection was.
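
For readers who want to reproduce a trace like this without any SSDs at hand, the following is a rough, machine-independent sketch of a colon-comment variant. It is not the script from above: /dev/null and a bogus path stand in for the /dev/sd* and /dev/nvme* globs, the test is -c instead of -b, and all names (devs, trace) are invented for illustration.

```shell
# Hypothetical stand-in for the SSD script, run under xtrace with its
# stderr captured. PS4 is pinned to the default "+ " prefix.
trace=$(PS4='+ ' bash -x 2>&1 <<'EOF'
: collect character devices in an array
declare -a devs
for f in /dev/null /nonexistent
do
  : check if $f really is a character device
  [[ -c $f ]] && devs+=("$f")
done
: write the list of identified devices to /dev/null
printf '%s\n' "${devs[@]}" > /dev/null
EOF
)
printf '%s\n' "$trace"
```

Each colon comment appears in the trace with $f already expanded, right before the command it describes.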

A word of warning though, because # and : are not at all 100% interchangeable in your shell language’s grammar: The hash character works mid-line, or mid-command, to absolve the characters that follow it of their duty of serving as input to a production rule. The colon character, however, does not - it’s the start of an ordinary command, much like having your shell run ls, cp, or any other merry little tool, albeit with fewer side effects. To conjure up a contrived example illustrating this difference, you would not be able to treat a shell script line like this:

ls "$OLDPWD" # list the prev. cwd

the same way I did treat the lines in the example script above. But writing it like this would do the deed:

ls "$OLDPWD"; : list the prev. cwd

I think this postfix, if you will, notation style of a shell script comment is A-OK with #, but with : it would likely confuse casual shell script connoisseurs, and could also have unwanted side effects, such as destroying the value of ${?} (your shell’s errno) that the next command might want to inspect. Also, when using :, syntax errors after the start of your quasi-command can actually break your script, while # immunizes you against this kind of mistake. So if you use : for commenting purposes, make sure to use it with proper care.
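
The following sketch makes three of these pitfalls tangible. The snippet, the file names oops and list, and the variable report are all invented for illustration; the behavior shown is that of an ordinary command whose arguments and redirections the shell processes as usual.

```shell
# Feed a small snippet to bash and capture what it reports on stdout.
report=$(bash 2>&1 <<'EOF'
false
: note to self
echo "after colon comment: $?"

false  # note to self
echo "after hash comment: $?"

tmpdir=$(mktemp -d)
# Command substitutions inside a colon comment are executed ...
: remember to clean up "$(touch "$tmpdir/oops")" some day
# ... and so are redirections.
: the list goes to > "$tmpdir/list"
[[ -e $tmpdir/oops ]] && echo "command substitution ran"
[[ -e $tmpdir/list ]] && echo "redirection created a file"
rm -rf "$tmpdir"
EOF
)
printf '%s\n' "$report"
```

The colon comment resets ${?} to 0 where the hash comment leaves the 1 from false intact, and both the command substitution and the redirection take effect. Characters such as $, `, <, >, ;, |, & and parentheses therefore deserve quoting or avoidance in colon comments.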

If you see the merits of this approach, you may already add that little trick to your personal collection, and simply… bash on, happily ever after. Oh, and use this on any Bourne-style shell you will encounter - there are no bashisms at play here, yet.

Or you can choose to …

Meet : :: - aka “: on (mild) steroids”

So, we know what : does, and we can probably come up with an implementation of our own. Which means that we can also come up with a better version of the function (actually, we are dealing with a shell builtin here - but we’re going to replace it with a custom function) on our own. Let’s begin by defining a clone of :, written in beautiful POSIX-style function syntax (keeping in mind that, as mentioned in the beginning, a strictly POSIX-compliant sh may refuse : as a function name):

:() {
  return 0
}

What might look dangerously close to a fork bomb in the beginning, ends up as a harmless dud that does what : does best: nothing. Of course, actually defining our version of : like this makes no sense - yet. We need an additional escape hatch of sorts, so that the innocent nothing-generator we just produced might deviate from its builtin exemplar, given a certain condition:

:() {
  [[ ${1:--} != ::* ]] && return 0
  printf '%s\n' "${*}" >&2
}

Now what does that do? We modify our homebrew implementation of : to evaluate its first argument (if any - if there’s none, we substitute a dash for its absent value due to reasons), and, in case that happens to not be a sequence of bytes starting with two ASCII colon characters, return with success. If the first argument does, however, start with a two-colon sequence (followed by anything else, or by nothing at all), we print all arguments that : got passed, joined by spaces, to stderr, for communicating with our observer.

What are we doing that for? To be able to extend our effortless debugging improvement from above (that extended xtrace mode with hopefully valuable context information) to also give us a kind of “verbose mode” for nearly free: Just write your magic colon comments as : :: important context information, or as : ::::: like so, or, as a kind of tag, as : ::NOTICE:: like so, and reap what you sow. Note that you do not need to have xtrace enabled for this extra feature to work.
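
As a quick sanity check, this sketch feeds the function and a handful of calls to bash and captures stderr. The set -u line is an addition for illustration, not part of the method: a likely reason for the dash fallback in ${1:--} (an assumption, the text above leaves it at “reasons”) is that a bare ${1} would abort the script under nounset whenever : is called without arguments.

```shell
# Define the custom ':' and call it in several ways; only stderr matters.
out=$(bash 2>&1 <<'EOF'
set -u   # nounset: expanding an unset parameter is a fatal error

:() {
  [[ ${1:--} != ::* ]] && return 0
  printf '%s\n' "${*}" >&2
}

:                                # no argument: silent, survives set -u
: an ordinary colon comment      # first word lacks the :: prefix: silent
: :: important context information
: ::::: like so
: ::NOTICE:: like so
EOF
)
printf '%s\n' "$out"
```

Only the three calls whose first argument starts with :: produce output, each printed with its colon prefix intact.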

Fixing xtrace, implementing opt-in

But we’re not quite done yet, because having our “complex” replacement for the vanilla : at all times would needlessly garble xtrace mode with useless argument checking output. And we still really like xtrace mode. Because with it, everything might be a bit crowded on stderr, and some contextually useful information might be missing (unless you heroically dare to use colon comments, of course!) - but for it to do its job, we do not have to rely on the script author’s cooperation to allow us to see in glorious (or, at your option, gory) detail what is going on.

Furthermore, we would like our “verbose” message mode to be opt-in. UNIX philosophy considers it impolite to strain the user with too much output - at least in absence of errors - and I think that’s a very good tradition to keep up in this world of myriads of readily available distractions everywhere.

So, to achieve both, we wrap our very definition (ah, the joys of dynamic languages!) of our : function in a check for two more things:

if [[ -n ${COLODEBUG} && ${-} != *x* ]]
then
:() {
  [[ ${1:--} != ::* ]] && return 0
  printf '%s\n' "${*}" >&2
}
fi

This requires the stars to align in a particular manner for our variant of : to come into effect at all: The shell needs to have a variable named COLODEBUG (sic!) with a non-empty value in its environment (i.e., you could run export COLODEBUG=1 before invoking a script armed like that - beyond being non-empty, the particular value you assign to it doesn’t even matter in this practical exercise of postmodernism). And the shell must not currently operate in xtrace mode, which GNU bash and other Bourne-like shells reflect by having an x character in the special internal shell parameter named - (no, I am not making this name up).
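
To check which combinations actually arm the custom function, the sketch below runs a small probe in four settings and asks bash, via type -t, what : currently resolves to. The probe and the variable names are hypothetical helpers for this illustration only.

```shell
# The probe: the guarded definition, followed by a query of what ':' is.
probe=$(cat <<'EOF'
if [[ -n ${COLODEBUG} && ${-} != *x* ]]
then
:() {
  [[ ${1:--} != ::* ]] && return 0
  printf '%s\n' "${*}" >&2
}
fi
type -t :
EOF
)

# stderr is discarded so that xtrace noise does not end up in the results.
unset_case=$( (unset COLODEBUG; bash -c "$probe") 2>/dev/null )
empty_case=$(COLODEBUG= bash -c "$probe" 2>/dev/null)
armed_case=$(COLODEBUG=1 bash -c "$probe" 2>/dev/null)
xtrace_case=$(COLODEBUG=1 bash -xc "$probe" 2>/dev/null)

printf 'unset: %s\nempty: %s\nset: %s\nset, with -x: %s\n' \
  "$unset_case" "$empty_case" "$armed_case" "$xtrace_case"
```

Only the third case reports function. With COLODEBUG unset or empty, or with xtrace active, : remains the builtin.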

So, with the above block in our bash script, preferably right at the beginning, we can selectively control “verbose debug message mode” that will turn our magic colon comments into informal diagnostic messages. Or, we can enable good ol’ xtrace mode, and have that augmented with some useful context information by means of “vanilla” colon comments. We get this at the cost of a really tiny little bit of runtime overhead, as actively NOPing is a teensy bit more wasteful than simply discarding a line(-affix) after parsing, but hey: We’re writing a shell script. Relax. Optimal performance is not your first concern.

To give you an impression of how this makes our little example script from above tick, behold this transcript of a real bash session (with some whitespace added to improve clarity):

bash-5 $ # our example script in all its glory
bash-5 $ nl debugdemo.sh
     1  #!/bin/bash

     2  if [[ -n ${COLODEBUG} && ${-} != *x* ]]
     3  then
     4  :() {
     5    [[ ${1:--} != ::* ]] && return 0
     6    printf '%s\n' "${*}" >&2
     7  }
     8  fi


     9  : try to find potential SSDs in my GNU/Linux system and collect them in an array
    10  declare -a ssds
    11  for f in /dev/sd* /dev/nvme*
    12  do
    13    : ::: check if $f really is a block device
    14    [[ -b $f ]] && ssds+=("$f")
    15  done

    16  : ::: write a list of all identified devices into /tmp/ssd_list_$$
    17  printf '%s\n' "${ssds[@]}" > /tmp/ssd_list_$$

bash-5 $ # running it normally, with no trace of what's going on
bash-5 $ ./debugdemo.sh

bash-5 $ # using the COLODEBUG env var to turn on our own : for verbose magic colon comments
bash-5 $ COLODEBUG=yesplease ./debugdemo.sh
::: check if /dev/sd* really is a block device
::: check if /dev/nvme0 really is a block device
::: check if /dev/nvme0n1 really is a block device
::: check if /dev/nvme0n1p1 really is a block device
::: write a list of all identified devices into /tmp/ssd_list_279503

bash-5 $ # finally, using xtrace as we got used to with vanilla : comment goodness
bash-5 $ bash -x debugdemo.sh
+ [[ -n '' ]]
+ : try to find potential SSDs in my GNU/Linux system and collect them in an array
+ declare -a ssds
+ for f in /dev/sd* /dev/nvme*
+ : ::: check if '/dev/sd*' really is a block device
+ [[ -b /dev/sd* ]]
+ for f in /dev/sd* /dev/nvme*
+ : ::: check if /dev/nvme0 really is a block device
+ [[ -b /dev/nvme0 ]]
+ for f in /dev/sd* /dev/nvme*
+ : ::: check if /dev/nvme0n1 really is a block device
+ [[ -b /dev/nvme0n1 ]]
+ ssds+=("$f")
+ for f in /dev/sd* /dev/nvme*
+ : ::: check if /dev/nvme0n1p1 really is a block device
+ [[ -b /dev/nvme0n1p1 ]]
+ ssds+=("$f")
+ : ::: write a list of all identified devices into /tmp/ssd_list_279508
+ printf '%s\n' /dev/nvme0n1 /dev/nvme0n1p1

If you feel particularly daring and aren’t satisfied with this result yet, you could pack all kinds of neat stuff into your personal : function, and add it to the output - internal bash variables like ${BASH_LINENO} and friends come to mind. Just try not to create the unholy love child of log4shell and shellshock while you are at it, mmkay?
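
One possible shape of such an extension, purely as a sketch: the variant below prefixes each verbose message with the calling script’s file name and the line number of the call. The output format, the file name linedemo.sh and the temp-dir scaffolding are arbitrary choices made for this illustration.

```shell
# Write a hypothetical demo script to a temp dir and run it with COLODEBUG set.
demo_dir=$(mktemp -d)
cat > "$demo_dir/linedemo.sh" <<'EOF'
#!/bin/bash
if [[ -n ${COLODEBUG} && ${-} != *x* ]]
then
:() {
  [[ ${1:--} != ::* ]] && return 0
  # BASH_SOURCE[1] is the caller's file, BASH_LINENO[0] the line of the call
  printf '%s:%s: %s\n' "${BASH_SOURCE[1]##*/}" "${BASH_LINENO[0]}" "${*}" >&2
}
fi

: ::: starting up
: this plain colon comment stays silent
: ::: shutting down
EOF
out=$(COLODEBUG=1 bash "$demo_dir/linedemo.sh" 2>&1)
rm -rf "$demo_dir"
printf '%s\n' "$out"
```

Each of the two verbose messages comes out as linedemo.sh, the line number of the call, and the message text, separated by colons.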

Recap and tl;dr

Put this right at the start of your bash script:

if [[ -n ${COLODEBUG} && ${-} != *x* ]]
then
:() {
  [[ ${1:--} != ::* ]] && return 0
  printf '%s\n' "${*}" >&2
}
fi

Next, replace whole-line hash-style comments (# This is an example comment) with “colon comments” (: This is a colon comment). If you also want your script to support a cheap “verbose” mode that re-uses source code comments, pass a word that starts with :: as the first argument right after the : command (: :::: like this one, for example).

Now, observe the effects of your changes by having bash interpret your script with the -x flag active (bash -x yourbashscript.bash), or by having a variable named COLODEBUG (sic!) with any non-empty value assigned (e.g., COLODEBUG=1 bash yourbashscript.bash) in its environment.

Enjoy! :)

Copyright ©2021 Johannes Truschnigg

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.