Scheme Pattern Matching

While writing an interpreter in Scheme that executes an Abstract Syntax Tree (AST), I sorely missed Haskell's pattern matching, which makes picking the code to run based on a node very simple.

Here follows the motivation for, explanation of and implementation of some simple Scheme code for basic pattern matching.

Source code with comments and examples (MPL-v2.0) or inline

I made a quick DuckDuckGo search for "scheme pattern matching" and mostly got results for a particular SICP exercise (that I cannot find anymore) and for a paper on pattern matching that used quite un-Scheme-like data constructors. (I somehow missed this lovely Racket page.)

Although decidedly Haskell-like, the pattern matching in the paper and (at a glance) in Racket's match – which uses data constructors – did not feel entirely like Scheme: I wished for a way of matching pure Scheme objects: lists, pairs and atoms. The code described and presented below is my effort at a basic implementation of that.

Although it is not quite as powerful as Haskell or Racket's pattern matching, nor perhaps is it perfect to use, it is simple and hopefully understandable to the average user. Having used it so far only for the toy interpreter that I mentioned above, I found it incredibly useful, and the only feature that I really miss is binding a destructured pattern. In Haskell, I can do this: func (Just x@[_]) = x to ensure that the list has exactly one element, but to actually bind the variable x to the list element itself and not to its single value.

(2022-05-26) The final part of that paragraph is no longer correct; I have added @-binding functionality to the pattern-matching code. The updated code and instructions are in the new post, while this post still has the old code and instructions (which apply to most of the functionality anyway).

A Motivating Example

The following example uses pattern matching on an AST node to print and return the variable name used in an expression, and prints an error message with some diagnostic information if it fails:

(define (pattern-match/motivating-example)
  (let ((e '(E (NAME (identifier . "x")))))
    (pattern-match
      e get
      (('E ('NAME ('identifier . id)))
       (printf "identifier: ~A~%" (get 'id))
       (get 'id))
      (x
       (printf "Unexpected expression: ~A~%" (get 'x))
       #f)))

Had the code used to match the nested AST node and to act on it been written "by hand", then it would either use some complex nesting that makes the code hard to read and a chore to write (example A), or it would use recursion and lose the context provided by a pattern match, thus also matching structures that are not necessarily intended (example B).

The main advantage of the pattern match is that it enables me as the developer to declare what shape I expect the data to have, and to decide upon which parts of it I wish to act (by binding them to names) in a minimal way. As the reader, I can clearly see what data the code receives as input and what it does with it just below. And finally as the maintainer, I get to easily extend the pattern match to new shapes of data without modifying the existing ones.

Example A: complex nesting; works for '(E (NAME (identifier . "x"))), but is hard to extend to '(E (lparen) (E (NAME (identifier . "x"))) (rparen)) for example:

(define (pattern-match/example-A e)
  (if (and (list? e)          (not (null? e))          (eqv? (car     e) 'E)
           (list? (cadr   e)) (not (null? (cadr   e))) (eqv? (caadr   e) 'NAME)
           (pair? (cadadr e)) (not (null? (cadadr e))) (eqv? (caadadr e) 'identifier))
    (begin
      (printf "identifier: ~A~%" (cdadadr e))
      (cdadadr e))
    (begin
      (printf "Unexpected expression: ~A~%" e)
      #f)))

Example B: missing context; matches '(E (NAME (identifier . "x"))), but also incorrectly matches '(NAME (E (identifier . "x"))):

(define (pattern-match/example-B e)
  (if (and (pair? e) (not (null? e)))
    (case (car e)
      ((E NAME)
       (pattern-match/example-B (cadr e)))
      ((identifier)
       (printf "identifier: ~A~%" (cdr e))
       (cdr e))
      (else
        (printf "Unexpected expression: ~A~%" e)
        #f))
    (begin
      (printf "Unexpected expression: ~A~%" e)
      #f)))

Usage

Only the pattern-match macro is intended to be used directly. The pattern matching logic itself however resides in the pattern-match/bind function, which takes in a single value and a pattern, and returns a (possibly empty) list of bindings if the match was successful, or #f if the value did not match the pattern.

The pattern of a match should represent the code as it would be displayed, with the exception of symbols:

if you intend to match a symbol literally, it should be quoted in the pattern;
if you intend to bind a symbol to a value, it should not be quoted in the pattern.

For example, this pattern matches the list that contains the symbols A and B: ('A 'B), while this pattern matches any list that contains any two values, and binds them to 'A and 'B: (A B).

The pattern-match macro looks a lot like a case expression: it takes a value to match, and then a list of patterns, each followed by code to run. Structurally, the main difference is that it actually takes two parameters before the list of expressions: the second is the name to give a getter function used for retrieving bindings.

The name of the getter function is used by the macro to scope a function of that name to the body of the successful pattern match. That function provides a way of accessing the bindings made within the pattern match; it takes a symbol, and returns the value associated with that (unquoted) symbol in the pattern. If the pattern were to contain ('identifier . x), and the getter function were called get, then the value of x (which could be any Scheme value) is retrieved with (get 'x) (note the quote). Nested pattern matching works, but the inner getter functions should not shadow the outer getter functions, as that will hide all outer bindings.

Implementation

To show how simple the pattern matching code is, the code in its entirety (including inline comments) is reproduced below. All code on this web page is made available under the Mozilla Public License Version 2.0.

;; Pattern match on a value and execute the expressions of the first match.
(define-syntax pattern-match
  (syntax-rules (else)
    ((pattern-match value getter-name
                    (pattern body ...)
                    ...
                    (else otherwise ...))
     (cond
       ((pattern-match/bind value (quote pattern))
        => (lambda (bindings)
             (let ((getter-name (lambda (id)
                                  (let ((found (assv id bindings)))
                                    (if found
                                        (cadr found)
                                        #f)))))
               body
               ...)))
       ...
       (else
         otherwise
         ...)))))

;; Create a list of bindings from a pattern and a value to match it to.
(define (pattern-match/bind value pattern)
  (cond
    ; If the pattern is a single-quoted symbol, it's a binding!
    ((and (symbol? pattern)
          (not (pattern-match/quoted-symbol? pattern)))
     (list (list pattern value)))
    ; If they're both empty, it's a match!
    ((and (null? value)
          (null? pattern))
     '())
    ; If one but not the other is empty, it's not a match... :(
    ((or (null? value)
         (null? pattern))
     #f)
    ; If they're both the same quoted symbols, it's a match!
    ; This is a special case, because symbols in the pattern must be
    ; double-quoted, otherwise they are identifiers (handled above).
    ((and (symbol? value)
          (pattern-match/quoted-symbol? pattern)
          (eqv? value (pattern-match/quoted-symbol->symbol pattern)))
     '())
    ; If they're both the same atoms, it's a match!
    ((and (atom? value)
          (atom? pattern)
          (equal? value pattern))
     '())
    ; If they're both pairs, then try recursively matching them...
    ((and (pair? value) (pair? pattern))
     (let ((lhs (pattern-match/bind (car value) (car pattern)))
           (rhs (pattern-match/bind (cdr value) (cdr pattern))))
       (pattern-match/unify lhs rhs)))
    ; We've run out of options; it's not a match... :(
    (else #f)))

;; Is the given value a quoted symbol?
(define (pattern-match/quoted-symbol? value)
  (and (list? value)
       (not (null? value))
       (not (null? (cdr value)))
       (null? (cddr value))
       (eqv? (car value) 'quote)
       (symbol? (cadr value))))

;; Extract the symbol from a quoted symbol.
(define pattern-match/quoted-symbol->symbol cadr)

;; Unify two branches of a pattern match bind.
(define (pattern-match/unify lhs rhs)
  (cond
    ; Recursive base case: if either branch failed, propagate the failure
    ((or (not lhs) (not rhs))
     #f)
    ; Recursive base case: if the right-hand-side bindings are empty, then
    ; return the left-hand-side bindings
    ((null? rhs)
     lhs)
    ; Recursive base case: if the left-hand-side bindings are empty, then
    ; return the right-hand-side bindings
    ((null? lhs)
     rhs)
    ; Recursive step: look for the first right-hand-side binding in the
    ; list of left-hand-side bindings. If it is present in the left-hand-side
    ; but the assignments do not match, this pattern match fails; otherwise
    ; recurse, holding on to the first right-hand-side binding if it does not
    ; appear on the left-hand-side, and continuing with the rest of the
    ; right-hand-side binding.
    (else
      (let ((found (assv (caar rhs) lhs)))
        (cond
          ((and found (equal? found (car rhs)))
           (pattern-match/unify lhs (cdr rhs)))
          (found
            #f)
          (else
            (cons (car rhs)
                  (pattern-match/unify lhs (cdr rhs)))))))))