Work the Shell

When Is a Script Not a Script?

Dave Taylor

Issue #254, June 2015

Dave receives a half-written script from a reader and realizes it's easily replaced with find—or is it? The problem might be more subtle than it first appears.

I received a very interesting script from reader Jeremy Stent via e-mail, and our subsequent conversation is something other script writers should consider too.

First off, here's the script he sent in:

function recurse_dir()
{
        for f in * ; do
          if [ -d "${f}" ] ; then
            pushd "${f}" 
            recurse_dir 
            popd
          fi
        done
}

pushd ~/dir 
recurse_dir 
popd

It's an interesting little script, and in case you aren't sure what's going on, it recursively steps through a directory tree. It doesn't actually do anything along the way; the only output is the directory stack that pushd and popd print each time they're called.

Of course, it'd be easy to add output or commands, but I was a bit baffled about the script's purpose when I received it. It's also hard to understand why there are so many pushd/popd invocations.

The original e-mail message actually was about how to deal with tricky filenames that contain spaces or punctuation, but that's usually just managed by ensuring that every time you reference a filename, you include quotes. The “for” statement deserves a closer look, however, because Bash splits unquoted expansions on white space (space, tab and newline by default). Those characters are held in the IFS variable, short for “internal field separator”.

So if the current directory contains a file called “hello world”, a loop such as for f in $(ls) will offer up values of the “f” variable “hello”, then “world”, both of which are invalid filenames. The for f in * form used in the script above is safe, because the results of a filename glob are not split again, as long as every later reference to the variable is quoted. This is one of the many reasons Linux is really clumsy with modern filenames, whether they contain punctuation or white space.
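Which loops actually split a name is easy to check in a scratch directory: a glob such as * hands each name over intact, while unquoted command substitution such as $(ls) is split on the IFS characters. Here's a minimal sketch of that experiment. The function names count_glob and count_ls are made up for illustration and aren't part of the reader's script:

```shell
#!/bin/sh
# Sketch: compare a glob loop with a loop over command-substitution output.
# count_glob and count_ls are hypothetical helper names.

count_glob() {
    # Glob results are not word-split, so each filename arrives intact.
    n=0
    for f in "$1"/* ; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

count_ls() {
    # Unquoted $(ls) output is split on IFS, so "hello world" becomes two words.
    n=0
    for f in $(ls "$1") ; do
        n=$((n + 1))
    done
    echo "$n"
}

scratch=$(mktemp -d)
touch "$scratch/hello world"
echo "glob loop sees $(count_glob "$scratch") name(s)"   # sees 1
echo "ls loop sees $(count_ls "$scratch") name(s)"       # sees 2
rm -rf "$scratch"
```

The single file is counted once by the glob loop and twice by the $(ls) loop, which is the whole problem in miniature.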

Still, here's how I responded to the query e-mail:

That's an interesting script you're trying to build. I'm not clear why you're using pushd/popd as you traverse the directories, though. Why not just have cd "${f}" followed by cd .. to get back up a level and simplify things?
In terms of difficult filenames, yeah, Linux wasn't really written to deal with filenames that start with a dash, have a space or other punctuation. The best you can do is experiment to see if the commands you're using accept -- as a way to delineate that you're done with command arguments, and quote the directory names themselves, as you've done.
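For what it's worth, here's a rough sketch of the same traversal without pushd/popd. It's a variation on the cd idea rather than exactly what I suggested in the e-mail: the cd runs inside a subshell, so there's no need for a matching cd .. at all. The "./" prefix is an alternative to -- that works even with commands that don't understand --. The pwd call and the scratch directory names are only there for the demonstration:

```shell
#!/bin/sh
# Sketch only: recurse with cd inside a subshell rather than pushd/popd.
recurse_dir() {
    for f in * ; do
        # Only descend into real directories; skipping symlinks avoids loops.
        if [ -d "$f" ] && [ ! -L "$f" ] ; then
            # The "./" prefix stops a name such as "-e" from being parsed as
            # an option. The parentheses run cd in a subshell, so the parent
            # shell's working directory never changes.
            ( cd "./$f" && pwd && recurse_dir )
        fi
    done
}

# Demonstration in a throwaway directory (the names are made up).
scratch=$(mktemp -d)
mkdir -p "$scratch/a/b" "$scratch/c d" "$scratch/-e"
( cd "$scratch" && recurse_dir )
rm -rf "$scratch"
```

Each of the four directories is visited once, including the one with a space in its name and the one that starts with a dash.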

Where the entire dialog got interesting was with his response, when he finally explained what he was trying to do: “My end intent is to remove the execute bit from files in a directory tree. Running rsync from a Windows box sometimes sets execute on files that I do not want. I do not want to remove the execute bit from directories, so I write a script like this.”

Ah, I realized what he was trying to do, and the answer is actually quite straightforward: use find, not a shell script.

In fact, the find command is more than capable of traversing a filesystem, identifying non-directory files and changing their permissions to remove an execute bit that's presumably erroneously set.

(I say “presumably erroneously set”, because there are actually a number of situations where a non-directory should retain its execute permission, including any shell, Perl or Ruby script and any compiled program, whether written in C++, Pascal or Fortran. In fact, blindly removing execute permission is problematic across any large piece of the Linux filesystem.)

On the assumption that the writer does want to proceed by removing the executable permission on files in a subtree of the filesystem, it's easily done with:

find . -type f -exec chmod -x {} \;

To understand that, start with the benign alternative:

find . -type f -exec echo {} \;

This simple invocation of find will give you a quick list of every regular file (not directories, symbolic links or device files) in the current directory and any subdirectory below.
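If you'd like to convince yourself before running this against real data, a scratch directory makes a safe test bed. The following is a minimal sketch, not a finished tool: strip_exec_bits is a hypothetical function name, a-x is used instead of -x so the result doesn't depend on the umask, and the "{} +" form hands many filenames to each chmod call instead of running chmod once per file:

```shell
#!/bin/sh
# Sketch: remove execute bits from regular files only, leaving directories alone.
strip_exec_bits() {
    # -type f limits the match to regular files, so directories keep the
    # execute (search) bit they need in order to be entered.
    find "$1" -type f -exec chmod a-x {} +
}

# Demonstration in a throwaway tree (the names are made up).
scratch=$(mktemp -d)
mkdir -p "$scratch/docs/old stuff"
touch "$scratch/docs/old stuff/report.txt"
chmod 755 "$scratch/docs/old stuff/report.txt"

strip_exec_bits "$scratch"
ls -lR "$scratch"
rm -rf "$scratch"
```

In the listing, report.txt should no longer show any x in its permissions, while both directories still do.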

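If you do dig in to the find man page, don't be misled by one of the other predicates: -perm lets you test permissions, not change them. So if you wanted to limit things to only those files that were executable, adding a test such as -perm /111 (GNU find's syntax for “any of these permission bits is set”) would make sense. The older -perm +mode spelling is no longer accepted by current GNU find releases.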
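As a rough illustration of -perm used purely as a test, here's a sketch that lists only the regular files with at least one execute bit set. The function name list_executables is made up. I've spelled the test out as three "all of these bits" checks joined with -o because that form is portable; with GNU find, the shorter -perm /111 should behave the same way:

```shell
#!/bin/sh
# Sketch: list regular files that have any execute bit set (user, group or other).
list_executables() {
    # -perm -100 is true when the owner execute bit is set, -010 checks the
    # group bit and -001 the "other" bit; -o means logical OR.
    find "$1" -type f \( -perm -100 -o -perm -010 -o -perm -001 \) -print
}

# Demonstration in a throwaway directory (the names are made up).
scratch=$(mktemp -d)
touch "$scratch/plain.txt" "$scratch/tool"
chmod 644 "$scratch/plain.txt"
chmod 755 "$scratch/tool"
list_executables "$scratch"    # only the path ending in /tool is listed
rm -rf "$scratch"
```

Adding the same parenthesized test to the chmod invocation would limit the work to files that actually need changing.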

Sidetracks, We Have Sidetracks

This problem of trying to debug a complex shell script when a simple Linux command invocation will do the trick is not uncommon, and it's one of the challenges for all developers. Unless you're in a shell programming class, the goal is what should dictate the solution path, not the tools. In other words, just because you want to learn more about shell script programming doesn't mean that a script is always the best and smartest solution for a given challenge.

And in terms of the original script, here's an interesting variation: what if you used find to generate a list of all files, then probed to see if you could ascertain whether a given file is associated with program source code (for example, it finds “hello” and then tests to see if “hello.c” exists) or if it's a shell script (information obtainable through the file command)?

Here's my first stab at this:

# Note: the find output is split on white space, so names with spaces still break here.
for filename in $(find . -type f -print) ; do
  if [ -x "$filename" ] ; then
    echo "File $filename is executable:"
    if file "$filename" | grep -q "shell script" ; then
      echo "  It's okay, it appears to be a shell script."
    elif [ -f "${filename}.c" -o -f "${filename}.cxx" ] ; then
      echo "  It's okay, there's a corresponding source file."
    else
      echo "  >> might be erroneously marked executable."
    fi
  fi
done

You can see that I'm using the find command to generate a list of every file from the current spot in the filesystem downward, so if there are lots of directories, this might generate quite a list. The shell has to hold that entire list in memory before the loop starts, and because the list is split on white space, a filename containing a space will be mangled, but those are problems I'll sidestep in the interest of simplicity.
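If those limitations matter for your files, one common alternative is to stream the names from find into a while loop, reading one line at a time. This is only a sketch under a couple of assumptions: list_exec_files is a hypothetical name, and the approach still trips over filenames that contain a newline, which I'm assuming you don't have:

```shell
#!/bin/sh
# Sketch: walk the find output line by line so spaces in names survive.
list_exec_files() {
    # IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
    find "$1" -type f | while IFS= read -r filename ; do
        if [ -x "$filename" ] ; then
            printf '%s\n' "$filename"
        fi
    done
}

# Demonstration in a throwaway directory (the names are made up).
scratch=$(mktemp -d)
touch "$scratch/my report.pdf" "$scratch/notes.txt"
chmod 755 "$scratch/my report.pdf"
chmod 644 "$scratch/notes.txt"
list_exec_files "$scratch"    # prints only the path of "my report.pdf"
rm -rf "$scratch"
```

Because the names arrive through a pipe, the loop starts working as soon as find produces its first result rather than waiting for the full list.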

To test whether the executable file is a shell script, I rely on the file command, which has two basic output formats for scripts, as demonstrated here:

test.sh: POSIX shell script, ASCII text executable
test.sh: POSIX shell script text executable

In either case, a simple test for “shell script” does the trick, as you can see if you closely examine the second conditional statement.

To see if the executable file is associated with a C or C++ source file, the elif statement checks for a file of the same name with a “.c” or “.cxx” suffix. Keep in mind that -o is a logical OR, so the test is literally “if the .c file exists OR the .cxx file exists”.

A quick run of this script produces this output:

$ sh test.sh
File ./taylor-trust.pdf is executable:
  >> might be erroneously marked executable.
File ./hello is executable:
  It's okay, there's a corresponding source file.
File ./plus is executable:
  It's okay, there's a corresponding source file.
File ./test.sh is executable:
  It's okay, it appears to be a shell script.

You can see that the script has recognized correctly that test.sh is a shell script (the last file tested), that “hello” and “plus” are both associated with source files (one a C program and the other a C++ program), but that the file taylor-trust.pdf is probably erroneously marked as executable.

In fact, PDF files shouldn't be executable, so the output is exactly as desired. It's a simple matter to add a chmod -x where the error message about erroneous executable files is located in the script source.
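Here's one way that change might look, as a cautious sketch rather than a drop-in replacement. The names looks_legit, clear_suspect_bits and the DRY_RUN variable are all my own inventions for this example, and the dry-run default is there because blindly changing permissions deserves a rehearsal first:

```shell
#!/bin/sh
# Sketch: clear execute bits only on files that fail both "legitimacy" tests.

looks_legit() {
    # Succeeds if file(1) reports a shell script or a matching source file exists.
    if file "$1" | grep -q "shell script" ; then
        return 0
    fi
    [ -f "$1.c" ] || [ -f "$1.cxx" ]
}

clear_suspect_bits() {
    find "$1" -type f | while IFS= read -r filename ; do
        [ -x "$filename" ] || continue          # not executable: nothing to do
        looks_legit "$filename" && continue     # script or compiled program
        if [ "${DRY_RUN:-yes}" = "yes" ] ; then
            echo "would run: chmod a-x $filename"
        else
            chmod a-x "$filename"
        fi
    done
}

# Demonstration in a throwaway directory (the names are made up).
scratch=$(mktemp -d)
touch "$scratch/taylor-trust.pdf" "$scratch/hello" "$scratch/hello.c"
chmod 755 "$scratch/taylor-trust.pdf" "$scratch/hello"
DRY_RUN=yes clear_suspect_bits "$scratch"   # reports the PDF, changes nothing
rm -rf "$scratch"
```

Setting DRY_RUN=no makes the function actually call chmod, so it's worth reading the dry-run report carefully before flipping that switch.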

By focusing too closely on the script, you could have spent a lot of time debugging something unneeded. That initial problem was solved more easily with a single invocation of find. Thinking about it more, however, it's clear that blindly getting rid of the execute permission could be a problem, so a more sophisticated set of tests is required, and those are easily written.