Ralsina.Me: The website of Roberto Alsina

A different UNIX Part II: A better shell language

One of the things people study when they "learn unix" is shell scripting and usage. Because every system has a shell, and if you learn to use it interactively, you are halfway to automating system tasks!

Let's consider that for a moment... what are the odds that the same language can be good for interactive use and for programming? I say slim.

Not to mention that learning shell as a way to learn unix is like going to a school that teaches TV production, and studying the remote. While useful, it is not really the important tool (ok, that analogy doesn't work at all. But it sounds neat, doesn't it?).

The first thing is that today's Linux domination of the unixsphere has caused a serious monoculture in shell scripting: everyone uses bash. The more enlightened ones may check that their scripts work on some other Bourne-style shell.

There are no important distributions (or proprietary unixes) that use csh or anything like it. Debian has a policy that things should work without bashisms. That's about as good as it gets.

Writing a dozen pages on how shell sucks would be trivial. But uninteresting.

So, let's think it over, and start from the top.

What should a shell scripting language be like?

What doesn't matter?

Let's tackle these things. I invite anyone to add extra ideas in the comments section.

What should a shell scripting language be like?

  • Interpreted (obvious)

  • Dynamic typing (you will be switching ints to strs and vice versa all the time).

  • Easy incorporation of other programs as functions/methods/whatever.

    That pretty much is what makes it a shell. ls should be indistinguishable from something written using the shell itself.

  • Pipes. This is a must. Unix has a bazillion tools meant to be used in command pipelines. You can implement an RDBMS using that kind of thing (check out nosql). Leverage that.

    But even here, on its strength, the shell is not perfect. Why can't I easily pipe stderr and stdout to different processes? Why can't I pipe the same thing to two processes at the same time? (Yes, I know how to do it with a neat trick ;-)

  • Globbing. *.txt should give you a list of files. This is one of the obvious things where sh is broken. *.txt may be a string or a list, depending on context... and a list is just a series of strings with blanks. That is one of the bazillion things that make writing shell scripts (at least good ones) hard:

    [ralsina@monty ralsina]$ echo *out
    a.out
    [ralsina@monty ralsina]$ echo *outa
    *outa

  • A list data type. No, writing strings separated with spaces is not ok. Maybe a Python-style dictionary as well?

  • Functions (obvious)

  • Libraries (and ok, the shell source mechanism seems good enough)

  • Standalone. It shouldn't spawn sh for any reason ;-)
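Two of the complaints above can be worked around in today's bash, which shows how much ceremony they take. The following is only a minimal sketch, assuming bash; `emit`, `split_streams` and `list_matches` are names made up for this illustration. The first function sends stdout and stderr through different filters by routing stdout around the inner pipe on file descriptor 3. The second turns a glob into a real list (a bash array) and uses the `nullglob` option so that a pattern without matches expands to nothing instead of to itself.

```bash
#!/usr/bin/env bash

# emit is a hypothetical stand-in for any command that writes to both streams.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Run the given command with stdout and stderr filtered by different processes.
# fd 3 carries the command's stdout around the inner pipe, which only sees stderr.
split_streams() {
    { "$@" 2>&1 1>&3 | sed 's/^/ERR: /' >&2; } 3>&1 | sed 's/^/OUT: /'
}

# Print the names in directory "$1" that match glob pattern "$2", one per line.
# The body runs in a subshell, so neither shopt nor cd leaks into the caller.
list_matches() (
    shopt -s nullglob             # a pattern with no match expands to nothing
    cd -- "$1" || exit 1
    matches=( $2 )                # unquoted on purpose: let the shell glob it
    if (( ${#matches[@]} > 0 )); then
        printf '%s\n' "${matches[@]}"
    fi
)
```

Calling `split_streams emit` prints one line tagged `OUT:` on stdout and one tagged `ERR:` on stderr. With a file called `a.out` in a directory, `list_matches dir '*out'` prints `a.out`, while `list_matches dir '*outa'` prints nothing at all, and a name containing spaces stays a single list element.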

What doesn't matter?

  • Performance. Ok, it matters that a five-liner doesn't take 50 minutes unless it has to. But one second or two seconds? Not that important.

  • Object orientation. I don't see it being too useful. Shell scripts are old-fashioned :-)

  • Compatibility with current shells. Come on. Why be like something that sucks? ;-)

Now, the example

Let's consider a typical piece of shell script and a rewrite in a more reasonable syntax.

This is bash (no, it doesn't work on any other shell, I think):

DAEMONS=( syslog network cron )

# Start daemons
for daemon in "${DAEMONS[@]}"; do
      if [ "$daemon" = "${daemon#!}" ]; then
              if [ "$daemon" = "${daemon#@}" ]; then
                      /etc/rc.d/$daemon start
              else
                      stat_bkgd "Starting ${daemon:1}"
                      (/etc/rc.d/${daemon:1} start) &>/dev/null &
              fi
      fi
done

And since DAEMONS is something the admin writes, this script lets you shoot yourself in the foot in half a dozen ways, too.

How about this:

DAEMONS=["syslog","network","cron"]

# Start daemons
for daemon in DAEMONS {
      if ( daemon[0] != "!" ) {
              if ( daemon[0] == "@" ) {
                      stat_bkgd ("Starting "+daemon[1:])
                      /etc/rc.d/+daemon[1:] ("start") &> /dev/null &
              } else {
                      /etc/rc.d/+daemon ("start")
              }
      }
}

Of course the syntax is something I just made up as I was writing, but isn't it nicer already?
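For comparison, the prefix convention both versions rely on (a leading `!` skips the entry, a leading `@` starts it in the background) can also be spelled out in bash with a `case` statement instead of comparing each string with a stripped copy of itself. This is a sketch, not the original init script: `classify_daemon` is a hypothetical helper that only prints what would happen, and the sample entries are invented.

```bash
#!/usr/bin/env bash

# Classify one DAEMONS entry by its prefix and print "<mode> <name>".
classify_daemon() {
    case $1 in
        '!'*) printf 'skip %s\n' "${1#?}" ;;        # leading "!": entry is disabled
        '@'*) printf 'background %s\n' "${1#?}" ;;  # leading "@": start in background
        *)    printf 'foreground %s\n' "$1" ;;      # no prefix: start and wait
    esac
}

# Sample list, including an entry with a space to show that quoting holds up.
DAEMONS=( syslog '@network' '!cron' "odd name" )
for daemon in "${DAEMONS[@]}"; do
    classify_daemon "$daemon"
done
```

Run as a script, this prints `foreground syslog`, `background network`, `skip cron` and `foreground odd name`, one per line. The quoting around `"$daemon"` and `"${DAEMONS[@]}"` is what keeps the last entry in one piece.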

Michal / 2006-10-05 18:24:

Your syntax is actually similar to YCP, a scripting language used by SUSE YAST. The language is however not used anywhere else.

Roberto Alsina / 2006-10-05 18:52:

Didn't know about it.

And here's a classical program, in YCP:

http://www.99-bottles-of-be...

Henry Miller / 2006-10-05 23:05:

This is the wrong approach.

As a dedicated csh user (mostly because when I first started in unix we didn't have so many good choices, it was sh or csh, or find room in your tiny home directory to compile your own - and csh was default), I can state with confidence that ALL shell scripts should be written in bourne. Use your favorite shell for interactive use. Switch to your heart's content. You can even write some personal scripts for that shell if you feel like it.

HOWEVER IF IT IS A PUBLIC SHELL SCRIPT IT MUST BE BOURNE! There is no other choice.

If you do not wish to use bourne, then write in python (ruby, tcl, perl, scheme, add a dozen more programming languages and pick your favorite). Beware though that you have just created a requirement that users need to install the language of your choice.

We do not write in bourne because we like it. We write in it because it is a least common denominator that EVERYONE has. Therefore when you write in bourne you are writing something easy for everyone, when you write in anything else you are making it easy for yourself at the expense of forcing your users to install what you want. (This isn't a bad thing, if the script is at all complex you should use something better, but for 100 line scripts - which covers many scripts - bourne works well enough and you avoid the problems of installing other languages)

Face it, bourne shell will always be with us as the least common denominator. There are many better languages out there, but none are universal. (I don't have bash on my personal freebsd machine)

skierpage / 2006-10-06 00:41:

Agreed, stick with Bourne for distributed scripts.

Having first-class arrays instead of white-space separated strings is essential; I ran into so many script bugs with Windows and Dreamweaver users accidentally creating "Copy of webpage.html" files on a Web server that I wrote a script just to remove all files with spaces in them (which itself is really hard to get right with xargs quoting).

But the problem just resurfaces when you try to parse program output. ls -l and find dump the spaces in file names, so you're back to trying to figure out where elements in strings begin and end in order to turn them into arrays. At that point it's easier to find a Perl library that does what the command-line tool does but returns hashes.

Microsoft have an interesting idea in Monad (now "PowerShell") outputting objects with keys, so you can unambiguously access "Name" from directory listing output without having to guess word boundaries. You can magically pipe the objects output from one command into another (you can tell I've never used it :-) ). But that's a whole lot of utilities to rewrite.
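The cleanup script described above does not have to go through xargs quoting at all. The following is a minimal sketch, assuming bash and a `find` that supports `-print0` (GNU and BSD versions do); `remove_spaced_files`, `collect_spaced_files` and `SPACED_FILES` are names invented for this example. The first function lets `find` hand the names directly to `rm`, and the second reads NUL-delimited names into an array, which is the "first-class array" route for program output.

```bash
#!/usr/bin/env bash

# Delete regular files under directory "$1" whose names contain a space.
# find passes the names straight to rm as arguments, so nothing re-parses them.
remove_spaced_files() {
    find "$1" -type f -name '* *' -exec rm -- {} +
}

# Collect the same files into the global array SPACED_FILES instead.
# NUL-delimited output (-print0) keeps spaces and newlines in names intact.
collect_spaced_files() {
    local path
    SPACED_FILES=()
    while IFS= read -r -d '' path; do
        SPACED_FILES+=( "$path" )
    done < <(find "$1" -type f -name '* *' -print0)
}
```

After `collect_spaced_files /some/dir`, each element of `"${SPACED_FILES[@]}"` is one complete path, however many spaces it contains. This only works because `find` can emit NUL separators; for tools like `ls -l` the word-boundary guessing described above remains.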

Roberto Alsina / 2006-10-06 00:43:

No, bourne is not good enough for 100 line scripts. Bourne is not good enough for many 10-line scripts.

Bourne bites you in the ass when you least expect it :-(

This post was pretty much a thought experiment, and a motivation to write a parser (which is something I never did, and want to learn about).

Most traditional languages suck when you try to use them as shell script replacements, because they are not tailored for that use.

Shell is the granddaddy of all domain specific languages, and there is no reason why, say, a distro can't bring another interpreter to the table, and write its own scripts using it.

Axel Liljencrantz / 2006-10-06 13:05:

Hi. Nice article, some interesting ideas. A lot of the features you want exist in a commandline shell called fish, available at http://www.fishshell.org.

As in your article, globbings with no matches do not expand to the original argument.

As in your article, spaces are not used to create extremely fragile 'poor man's arrays'.

In fish, all variables are actually arrays of strings. So the integer data type you wanted does not exist, but the list datatype is really nice. You can use negative indexing to index from the back. You can specify multiple indexes inside a single set of brackets to perform slicing, for example, the first three elements of path are '$PATH[(seq 3)]'. The 'set' command, which is used for all types of variable assignment, allows you to assign to or remove slices of arrays. A cool subcase is that you can treat command substitutions as an array as well. So if you for example want to write the fourth and fifth line of the file foo.txt to standard out, simply use 'echo (cat foo.txt)[4 5]'.

Some important unique things fish has that you don't mention:

No implicit subshells. In other shells, if you write a function and use it inside a pipeline, that function will silently be executed in another process. That means that if your function alters a global variable, for example, you simply _can't_ use it in a pipeline. Fish never ever forks off subshells, not even command substitutions.

Universal variables: Universal variables are variables whose values are shared between all the user's shells, and the values are preserved across reboots and logouts. This is extremely handy for configuration options, for example. Just change a value, and it will be updated at once, for all shells, and in the future too.
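The silent subshell is easy to reproduce in bash. This is a minimal sketch, assuming bash with its default options (in particular `lastpipe` switched off); both function names are made up for the illustration. The first loop sits at the end of a pipeline and therefore runs in another process, so its counter never reaches the caller. The second loop reads the same data through a redirection and stays in the current shell.

```bash
#!/usr/bin/env bash

# Count lines in a loop fed by a pipe: bash runs the loop in a subshell,
# so the increments are lost when the pipeline ends.
count_in_pipeline() {
    local count=0 line
    printf 'a\nb\nc\n' | while IFS= read -r line; do
        count=$((count + 1))
    done
    echo "$count"
}

# Same loop fed by a redirection: it runs in the current shell,
# so the count survives.
count_with_redirect() {
    local count=0 line
    while IFS= read -r line; do
        count=$((count + 1))
    done < <(printf 'a\nb\nc\n')
    echo "$count"
}

count_in_pipeline     # prints 0
count_with_redirect   # prints 3
```

Nothing warns you about the first case, which is the point being made above.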

Sane scoping rules. In fish, when you create a new variable inside a function, it will by default be a local variable. The syntax for specifying variable scope in fish is simple, use the switches -U/--universal, -g/--global or -l/--local when using the set builtin to specify the scope you want the variable to live in. Local variables have precedence over global ones, and global ones have precedence over universal ones.

Code validation. The fish syntax is designed so that it can be validated before executing whenever possible. That means a lot more syntax errors are caught early.

Autoloaded functions. Fish has something much nicer than the source command for writing libraries. You can give fish a path list for directories containing definitions of functions, one function in each file, and the file is named after the function. Whenever you need a specific function, fish will automatically load the file, and if the file changes, it will be reloaded. This is needed by fish itself, since fish contains many thousand lines of shellscript, so fish would be a memory hog if it didn't autoload functions. But it also means that you can safely import a large number of libraries without worrying about slow startup or m

Roberto Alsina / 2006-10-06 14:20:

Great stuff about fish. Will have to investigate it :-)

Axel Liljencrantz / 2006-10-06 15:05:

Hi, again. I just noticed my comment got truncated. Sorry about writing so long a post.

The remainder of my post consisted of some more features of fish and an invitation to help in the development.

If you want to make a better shell, I hope you'll join the fish effort by subscribing to the fish mailing list at https://lists.sourceforge.n....

I'll be happy to discuss the design decisions that have been made in fish as well as how fish could be further improved.

Roberto Alsina / 2006-10-06 16:04:

I have been reading the Ars Technica article, and I really like it.

However, I have a hidden agenda here, which is learning to write parsers.

I don't expect fish is ready to be turned into a python-interpreted language, which is about all I have figured out so far :-)

Thanks for the comments, and while fish may not have won a developer (besides, I am not very good at that ;-) it sure has won a fan.

Kevin / 2006-10-06 19:13:

I second the approach that Microsoft is taking with PowerShell. There are some really nice things in there: passing objects via pipes rather than text as was already mentioned, standard command line argument parsing (yay!), interop with .NET. All good things. It looks a little funky for interactive use, but excellent as scripting glue.

Axel Liljencrantz / 2006-10-07 22:13:

Roberto, if you want to write a parser and make the commandline a better place, and do it in Python, I have _just_ the project for you.

Create a tool that parses man pages and produces command completions from them. You would need a parser, it would be very useful on the commandline, you can do it in a high level language, and there is even a preexisting program called doclifter, written in Python, which converts man pages to docbook format.


Contents © 2000-2024 Roberto Alsina