The Fantastic SUBST Function (part 1)

When John McCarthy in 1960 wrote his famous paper on the programming language LISP, he used a particular function to illustrate what you could do with the language. The language consisted of only nine “special forms”, primitive building blocks by which any computable function could be created. The function was called SUBST, and it included all nine of the special forms. It was a remarkable function. It could rip through any tree and replace all leaves that had one value with another value. And it consisted of a single list (here broken for space reasons):

(LABEL, SUBST, (LAMBDA, (X, Y, Z), (COND ((ATOM, Z), (COND, ((EQ, Y, Z), X),
  ((QUOTE, T), Z))) ((QUOTE, T), (CONS, (SUBST, X, Y, (CAR Z)),
  (SUBST, X, Y, (CDR, Z)))))))

We can lowercase, remove the commas, and format it to make it a little more readable:

(label subst
  (lambda (x y z)
    (cond ((atom? z) (cond ((eq? z y) x) ((quote t) z)))
          ((quote t) (cons (subst x y (car z))
                           (subst x y (cdr z)))))))

In order to understand this function, we need to understand the nine building blocks. We will go through them in this order:

  • quote
  • cond
  • atom?
  • eq?
  • cons
  • car
  • cdr
  • lambda
  • label

But before we dig into those, we must understand what the parenthesis are for. There is a hint in the name of the language: LISP. It stands for “List Processing”. It means that the program consists of lists of things, and that these lists are processed according to some rules. So what is a list? A list is some elements surrounded by parenthesis. Take this list of the plus symbol, the number one, the number two, and the number three:

(+ 1 2 3)
=> 6

I’ll use the little arrow, =>, to indicate what the evaluator prints out. When processing this, LISP will treat the first element of the list as something to call, while the remaining elements will be the arguments to that call. The above means “resolve the symbol + to the function plus, then call the function plus with the numbers 1, 2, and 3″. This will evaluate to 6.

Lists can contain other lists:

(* 2 (+ 6 5))
=> 22

Functions that call other functions. That’s pretty much all the semantics we need in order to write programs in LISP.

quote

LISP has a literal syntax for lists, ie elements surrounded by parenthesis, and a LISP program is written as one or more such lists. So, the programs are written in the actual data structures of the language. Makes your head spin when you first hear about it. This is called being “homoiconic” and this is what distinguishes LISP from most other programming languages. It has some astonishing consequences, but we’ll save those for later. However, being homoiconic does mean that if you want to pass an actual list of things as an argument to a function, you need a way of telling LISP not to treat it as a function call, ie not to try to call the first element in it. For example, this first attempt to count the elements in a list will fail because the number 1 can’t be called:

(count (1 2 3))
=> Error: `1´ is not a function

What do you do if I tell you to say your name? After the initial surprise, you’ll probably tell me what your name is. But if I tell you to say “your name”, with the double quotes, you would of course repeat “your name”, instead of actually telling me what you’re called. The same goes for LISP. If you quote a list, LISP will not try to evaluate it. In this case, it will send the unevaluated list to the function count:

(count (quote (1 2 3)))
=> 3

Using the quote form everywhere we need an actual list will quickly become cumbersome. So there exists a little macro called ' (a single quote) that translates what follows it into a call to quote. This is equivalent, and much easier on the eye:

(count '(1 2 3))
=> 3

cond

At the time, not only the homoiconicity of LISP was new, another thing was new as well: the conditional. There was already a primitive, sort-of-conditional in Fortran, where you could jump to various locations depending on whether a number was less than, greater than, or equal to zero. But that was nothing compared to the elegance of the COND. Consider the following:

(cond ((some logic) (some consequence))
      ((another logic) (another consequence))
      ('t (the fallback)))

Here I have simulated conditional expressions as calls to the fake function some with the argument logic and a call to another with the argument logic. The COND will check in turn each conditional expression. If it evaluates to true, the corresponding consequence expression will be evaluated and its result returned from the COND. If the conditional expression evaluates to false, the next conditional/consequence pair will be tried. It’s what we now recognize as a nested if-then-else. As a fallback, something that always evaluates to true is used. The quoted symbol t means true.

atom?

The atom? special form is used to detect whether something is a list or not. In LISP, all you have is either atoms, or lists containing atoms and/or other lists. For example, in order to detect whether an argument is a leaf of a tree, or a tree, the atom? form can be used.

(atom? 2)
=> true

(atom? '(1 2 3))
=> false

The question mark is a legal character in a symbol (even if it wasn’t in the original LISP). By convention, predicates (functions that return either true or false) are denoted with a trailing ?. In most other languages you would have to write some prefix, like isThis or hasThat.

eq?

Another useful predicate is eq?, which tells us whether two things are equal or not.

(eq? 2 5)
=> false

That was pretty obvious. Here is a cond with an eq? conditional:

(cond
	((eq? 2 5) 'match)
	('t 'not))
=> not

The number 2 is not equal to the number 5, so the second conditional expression, 't, is tried. It is of course true by definition, so 'not is evaluated. As we just learned, quoting means return the thing as it is, so the symbol not is returned from the cond.

cons

OK, we have quoting, conditionals, and a few useful predicates, but how do we construct things? The special form cons will prepend an atom to a list:

(cons 1 '(2 3 4))
=> (1 2 3 4)

The number 1 is placed at the beginning of the list (2 3 4), giving the list (1 2 3 4). Extremely useful, as we will see soon.

car

The car special form is the first of the two accessor forms. It’s very simple. It takes the first element of a list:

(car '(1 2 3))
=> 1

It also works if the first element is not an atom. Here we use a list that contains a list (1 2 3) and the number 4:

(car '((1 2 3) 4))
=> (1 2 3)

Note that this means we can extract sub-trees just as easily as leaves, from a list of leaves and sub-trees.

A better name would have been first. The name car actually stands for “Content of Address Register” and is an unfortunate implementation detail of the IBM machine used for the first implementation of LISP. You could say that the CAR is the first of a pair of registers, where the second part contains a link to the rest of the list.

cdr

The cdr special form simply takes the rest of a list, ie all but the first element:

(cdr '(1 2 3 4))
=> (2 3 4)

Granted, “Content of Decrement Register” is not a brilliant name. rest would have been better. But nevertheless, we now have a way of constructing and picking apart lists. If only we had some way of constructing functions…

lambda

The lambda special form is how anonymous functions are constructed. It takes a list of arguments and then zero or more lists that make up the function body. This function takes two arguments, x and y, and multiplies them:

(lambda (x y) (* x y))
=> #$eval577$fn__578

The function is never stored anywhere, so it just vanishes into the binary void. If we should get any use of it, we must place it as the first element in a list, and put the arguments after it:

((lambda (x y) (* x y)) 3 4)
=> 12

Now, if only we had some way of naming things. Then we could name the function and re-use it later…

label

The ninth and final special form of LISP is the label form. With that, we can assign a name to a form. This is useful for calling the form later in the program, but it’s also needed when writing a recursive function, since the function needs to declare that it calls itself somehow.

(label square (lambda (x y) (* x y)))
=> #square

(square 3 4)
=> 12

The SUBST function is a good example of why label is needed in a recursive function:

(label subst
  (lambda (x y z)
    (cond ((atom? z) (cond ((eq? z y) x) ((quote t) z)))
          ((quote t) (cons (subst x y (car z))
                           (subst x y (cdr z)))))))

This concludes the first part of the series. In the next part, we will dissect the SUBST function and test it on a little tree. We will also see how it could be re-implemented in a modern LISP such as Clojure.

This Post Has 7 Comments

  1. The comparison should be (eq? z y), but you have (eq? x y) instead.

    Looking forward to part 2.

  2. Thanks. I have updated the post.

    A funny thing is that in McCarthy’s paper, the second COND is missing a paren. I guess even he would have had use for a good editor at the time.

  3. Great article. Nice intro to Lisp and origins of Clojure. I’d glad we don’t have the commas everywhere today.

  4. Great intro to LISP, looking forward to part 2!

Leave a Reply

Close Menu