LJ Archive

Work the Shell

Subshells and Command-Line Scripting

Dave Taylor

Issue #219, July 2012

No games to hack this time; instead, I go back to basics and talk about how to build sophisticated shell commands directly on the command line, along with various ways to use subshells to increase your scripting efficiency.

I've been so busy writing scripts the past few months that I've wandered away from more rudimentary tutorial content. Let me try to address that this month by talking about something I do quite frequently: turning command-line invocations into short scripts without ever actually saving them as separate files.

This methodology is consistent with how I create more complicated shell scripts too. I start by building up the key command interactively, then eventually do something like this:

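$ fc -ln -1 > new-script.sh

to get what I've built up as the starting point of my shell script. (fc -ln -1 lists the previous command without its history number. Typing !! > new-script.sh instead would rerun the command and save its output, not the command itself.)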

Renaming Files

Let's start with a simple example. I commonly need to apply a rename pattern to a set of files. Often it's a batch of images with the .JPEG suffix, and because I prefer lowercase, I'd like them changed to .jpg instead.

This is the perfect situation for a command-line for loop—something like:

for filename in *.JPEG
do
   commands
done

That'll easily match all the relevant files, and then I can rename them one by one.

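Linux doesn't have a standard rename utility (some distributions ship a rename command, but its syntax varies from one to the next), so I'll use mv instead. The wrinkle is this: how do you take an existing filename and change it as desired? For that, I use a subshell (technically, command substitution, which runs the command in a subshell and captures its output):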

newname=$(echo "$filename" | sed 's/\.JPEG$/.jpg/')

I've talked in previous columns about how sed can be your friend and how it's a command well worth exploring; now you can see I wasn't just filling space. If I just wanted to fill space, I'd turn in a column that read "all work and no play makes Jack a dull boy".
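Written loosely as .JPEG, the sed pattern can misfire: in sed, . matches any character, and an unanchored pattern matches wherever it first can. Here's a small sketch comparing it with the anchored form \.JPEG$ on a made-up filename that happens to contain JPEG in the middle. The two function names are hypothetical helpers for illustration.

```shell
#!/bin/bash
# In sed, '.' matches any character, and an unanchored pattern
# matches at the first place it can, anywhere in the name.
rename_unanchored() { printf '%s\n' "$1" | sed 's/.JPEG/.jpg/' ; }
# '\.' matches only a literal dot, and '$' ties the match to the end.
rename_anchored()   { printf '%s\n' "$1" | sed 's/\.JPEG$/.jpg/' ; }

rename_unanchored "my_JPEG_scan.JPEG"   # my.jpg_scan.JPEG
rename_anchored   "my_JPEG_scan.JPEG"   # my_JPEG_scan.jpg
```

The loose pattern matched "_JPEG" in the middle of the name; the anchored one touches only the real suffix.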

Now that the old name is in "filename" and the new name is in "newname", all that's left is to do the actual rename. This is easily accomplished:

mv $filename $newname

There's a gotcha if a filename contains a space, however: unquoted, $filename is split into separate words. So here's the entire script, properly quoted (with one useful line added so you can see what's going on), as I'd type it directly on the command line:

for filename in *.JPEG ; do
  newname="$(echo "$filename" | sed 's/\.JPEG$/.jpg/')"
  echo "Renaming $filename to $newname"
  mv "$filename" "$newname"
done

If you haven't tried entering a multi-line command directly at the shell prompt, you might be surprised by how gracefully the shell handles it:

$ for filename in *.JPEG
>

The > is the shell's secondary prompt (PS2), showing that you're in the middle of command entry. Handy. Just keep typing lines until you're done; as soon as the command block is complete and syntactically correct, the shell executes it immediately, then shows its output followed by a new top-level prompt.

More Sophisticated Filename Selection

Let's say you want to do something similar, but instead of changing filenames, you want to change the spelling of someone's name within a subset of files. It turns out that Priscilla actually goes by “Pris”. Who knew?

There are a couple of ways you can accomplish this task, including tapping the powerhouse find command with its -exec predicate, but because this is a shell scripting column, let's look at how to expand the for loop structure shown above.

The key difference is that in the "for name in pattern" sequence, pattern now has to reflect a search of the contents of a set of files, not just their names. That's done with grep, but this time, you don't want to see the matching lines; you just want the names of the matching files. That's what grep's -l flag is for, as its man page explains:

"-l     Only the names of files containing selected lines are written to standard output."

Sounds right. Here's how that might look as a command:

$ grep -l "Priscilla" *.txt

The output would be a list of filenames.

How to get that into the for loop? You could use a temporary output file, but that's a lot of work. Instead, use a subshell again, just as with the file rename (the "$( )" notation earlier). You'll sometimes also see this written with backticks, as `cmd`, although I prefer the $( ) notation myself.
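A quick sketch showing that the two notations do the same job, and why $( ) is easier to nest. The strings are arbitrary examples.

```shell
#!/bin/bash
# Command substitution: the shell runs the command in a subshell
# and substitutes its standard output, minus trailing newlines.
modern=$(echo "Priscilla" | tr '[:lower:]' '[:upper:]')
classic=`echo "Priscilla" | tr '[:lower:]' '[:upper:]'`
echo "$modern $classic"             # PRISCILLA PRISCILLA

# Nesting is where $( ) shines: the inner one needs no escaping.
nested=$(echo "Hello, $(echo Pris)")
# With backticks, the inner pair has to be escaped with backslashes.
nested_old=`echo Hello, \`echo Pris\``
echo "$nested / $nested_old"        # Hello, Pris / Hello, Pris
```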

Putting it together:

for filename in $(grep -l "Priscilla" *.txt) ; do

Fixing Priscilla's name in the files can be another job for sed, although this time I'd write the edited version to a temporary file and then swap it into place:

tempfile="/tmp/fixname.$$"    # hypothetical temp name; $$ is the shell's process ID
sed "s/Priscilla/Pris/g" "$filename" > "$tempfile"
mv "$tempfile" "$filename"
echo "Fixed Priscilla's name in $filename"

See how that works?
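Assembled into a single loop, the pieces look something like the following sketch. It makes one change from the for-loop form: it reads grep's output line by line with while read, so filenames containing spaces survive intact, which $(grep ...) in a for list doesn't guarantee. The function name fix_priscilla and the temp-file naming are hypothetical.

```shell
#!/bin/bash
# Sketch: replace "Priscilla" with "Pris" in every .txt file in the
# current directory that mentions her. Reading grep's output with
# while read keeps names with spaces intact (names containing
# newlines would still break it).
fix_priscilla() {
  local filename tempfile
  grep -l "Priscilla" *.txt 2>/dev/null |
  while IFS= read -r filename ; do
    tempfile="$filename.tmp.$$"     # hypothetical temp name beside the original
    if sed "s/Priscilla/Pris/g" "$filename" > "$tempfile" ; then
      mv "$tempfile" "$filename"
      echo "Fixed Priscilla's name in $filename"
    else
      rm -f "$tempfile"
      echo "Could not update $filename" >&2
    fi
  done
}
```

To use it, paste it into your shell, cd to the directory holding the files and type fix_priscilla.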

The classic gotcha in this situation is file permissions. An unexpected consequence of this rewrite is that the file not only has the pattern replaced, it also potentially gains a new owner (whoever ran the script) and the default permissions for new files. If that's a potential problem, you'll need to grab the owner and current permissions before the mv command, then use chown and chmod to restore the owner and permissions, respectively.
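Here's a minimal sketch of that save-and-restore step. It assumes GNU coreutils stat, whose -c option takes a format string (%a is the octal mode, %u and %g the numeric owner and group); BSD and macOS stat use -f with different codes. The function name is hypothetical, and giving a file to another user with chown generally requires root.

```shell
#!/bin/bash
# Sketch: edit a file through a temporary copy, then put back the
# original permissions and owner. Assumes GNU coreutils stat (-c).
replace_keep_perms() {
  local filename="$1" tempfile="$1.tmp.$$"        # hypothetical temp name
  local mode owner
  mode=$(stat -c '%a' "$filename") || return 1     # octal mode, e.g. 640
  owner=$(stat -c '%u:%g' "$filename") || return 1 # numeric user:group
  sed "s/Priscilla/Pris/g" "$filename" > "$tempfile" || return 1
  mv "$tempfile" "$filename"
  chmod "$mode" "$filename"
  # Giving a file to another user generally requires root.
  chown "$owner" "$filename" 2>/dev/null ||
    echo "Could not restore owner $owner on $filename" >&2
}
```

Another way to sidestep the problem is to replace the mv with cat "$tempfile" > "$filename" followed by rm "$tempfile". Writing into the existing file keeps its owner and permissions, at the cost of copying the data one more time.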

Performance Issues

In theory, launching lots of subshells could hurt performance, because the system has to do much more than just run individual commands: it forks additional shell processes, passes variables and so on. In practice, for quick command-line jobs like the ones shown here, I've found the penalty to be negligible and think it's safe to ignore. If a subshell or two is the right way to proceed, just go for it.
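When the per-file subshell does start to matter, bash can often do the work itself. This sketch computes the new name both ways: with the echo/sed subshell, and with bash's built-in ${var%suffix} expansion, which strips a trailing match and starts no extra process. The example filename is made up.

```shell
#!/bin/bash
# Two ways to build the new name for one (made-up) filename.
filename="vacation photo.JPEG"

# Subshell version: forks a subshell and runs the external sed.
newname_sed=$(echo "$filename" | sed 's/\.JPEG$/.jpg/')

# Built-in version: ${var%pattern} removes the shortest trailing
# match of pattern; no extra process is started.
newname_expand="${filename%.JPEG}.jpg"

echo "$newname_sed"       # vacation photo.jpg
echo "$newname_expand"    # vacation photo.jpg
```

One difference: if a name doesn't end in .JPEG, the expansion leaves it as is and still appends .jpg, whereas the sed version leaves the name untouched. Inside a for filename in *.JPEG loop that can't happen. To compare speed on your own system, wrap a loop of a few thousand iterations of each in time.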

That's not to say it's okay to be sloppy and write highly inefficient code. My mantra is that the more you're going to use a script, the smarter it is to spend the time to make it efficient and bomb-proof. Note that in the earlier scripts, I've skipped any tests for input validity, error conditions, meaningful output when there are no matches and so on.
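As an illustration of those missing checks, here's a sketch of the .JPEG rename loop with three of them added: a message if nothing matches (bash's nullglob option makes an unmatched *.JPEG expand to nothing rather than to the literal string), a refusal to overwrite an existing .jpg, and a report if mv fails. The function name rename_jpegs is hypothetical.

```shell
#!/bin/bash
# Sketch: the .JPEG rename loop with a few of the skipped checks
# added. Operates on the current directory.
rename_jpegs() {
  local filename newname
  local -a files
  shopt -s nullglob            # an unmatched glob expands to nothing...
  files=( *.JPEG )
  shopt -u nullglob            # ...then back to the default (assumes it was off)
  if [ "${#files[@]}" -eq 0 ] ; then
    echo "No .JPEG files found in $(pwd)" >&2
    return 1
  fi
  for filename in "${files[@]}" ; do
    newname="$(printf '%s\n' "$filename" | sed 's/\.JPEG$/.jpg/')"
    if [ -e "$newname" ] ; then
      echo "Skipping $filename: $newname already exists" >&2
      continue
    fi
    echo "Renaming $filename to $newname"
    # '--' keeps a name starting with '-' from being read as an option
    mv -- "$filename" "$newname" || echo "Could not rename $filename" >&2
  done
}
```

Paste it into your shell, cd to the image directory and run rename_jpegs.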

Those can be added easily, along with a usage section so that a month later you remember exactly how the script works and what command flags you've added over time. For example, I have a 250-line script I've been building during the past year or two that lets me do lots of manipulation with HTML image tags. Type just its name, and it prints a detailed usage summary:

$ scale
Usage: scale {args} factor [file or files]
  -b      add 1px solid black border around image
  -c      add tags for a caption
  -C xx   use specified caption
  -f      use URL values for DaveOnFilm.com site
  -g      use URL values for GoFatherhood site
  -i      use URL values for intuitive.com/blog site
  -k KW   add keywords KW to the ALT tags
  -r      use 'align=right' instead of <center>
  -s      produces succinct dimensional tags only
  -w xx   warn if any images are more than the specified width
  factor  0.X for X% scaling or max width in pixels.
          A scaling factor of '1' produces 100%

Because I often go months without needing the more obscure features, that usage summary is extremely helpful, and something similar is easy to add to even the simplest script.
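A minimal sketch of how such a usage section can be wired up with bash's getopts builtin. The -n and -v flags, their meanings and the main function are hypothetical examples, not the actual scale script.

```shell
#!/bin/bash
# Sketch of a usage section plus option parsing. The -n and -v
# flags and what they do are hypothetical examples.

usage() {
  cat << EOF
Usage: $(basename "$0") {args} [file or files]
  -n      dry run: report each file but change nothing
  -v      verbose: name each file as it's processed
EOF
}

main() {
  local OPTIND=1 opt filename     # reset getopts state for each call
  dryrun=0 ; verbose=0            # globals the rest of a script would read
  while getopts "nv" opt ; do
    case "$opt" in
      n) dryrun=1 ;;
      v) verbose=1 ;;
      *) usage >&2 ; return 1 ;;  # unknown flag: summary to stderr
    esac
  done
  shift $((OPTIND - 1))           # drop the parsed flags
  if [ $# -eq 0 ] ; then          # no files given: just show the summary
    usage
    return 0
  fi
  for filename in "$@" ; do
    if [ "$dryrun" -eq 1 ] ; then
      echo "Would process $filename"
      continue
    fi
    if [ "$verbose" -eq 1 ] ; then echo "Processing $filename" ; fi
    : # the script's real work on "$filename" would go here
  done
  return 0
}

main "$@"
```

Run with no arguments, it prints the usage summary; an unknown flag gets getopts' own error message followed by the summary on standard error.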

Conclusion

I've spent the past year writing shell scripts that address various games. I hope you've found it useful to have me step back and talk about some basic shell scripting methodology. If so, let me know!

Dave Taylor has been hacking shell scripts for more than 30 years. Really. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
