Saturday, April 2, 2016

Unification: Things Pattern Matching Can't Do

This document is an answer to the question “Pattern Matching: Is There Anything it Can't Do?

Before we talk about the differences between pattern matching and unification, let’s talk about multiple assignment and destructuring assignment. Multiple assignment is a programming language feature that lets you assign multiple variables at once and assign different values to different variables. It might have a syntax like this:

x, y := 1, 2

There is a question about how to interpret the above statement. Should we view this as an assignment operator with two operands, one of which is the pair x,y and the other the pair 1,2, or should we view it as an operation with four independent operands: x, y, 1, and 2? If there are just two arguments, then you could replace either of them with a variable name and it would do the same thing:

t1 := x, y
t2 := 1, 2
t1 := t2

For most languages with multiple assignment, this does not do the same thing as the previous example (but with unification, it would).

There are languages such as Python and Go that choose an intermediate solution: the original expression has three operands: x, y, and the pair 1, 2. This one-sided solution is called destructuring assignment. Destructuring assignment is a one-sided form of multiple assignment where the phrase on the left is a syntactic structure that is part of the assignment operation and the expression on the right represents a value:

t1 := [1, 2]
[x, y] := t1  # destructuring assignment

The intermediate variable t1 is used to show that the expression on the right hand side represents a single value. This destructuring assignment still has three arguments, x, y, and t1. The clause [x, y] may look like a list constructor, but it is really part of the syntax of the assignment statement.

Pattern matching is a form of destructuring assignment that can be used in conditional contexts to affect flow of control. For example, in Scala and Rust there is a case statement that can be used to select one branch (and set of assignments) based on the pattern. Because pattern matching is an extension to destructuring assignment, I’ll continue to use the same symbol, “:=”. Here are some examples of pattern matching in a conditional context:

if [x, y] := [1, 2]
 then  # executes with x=1, y=2
 else  # does not execute
if [x, y] := [1, 2, 3]
 then  # does not execute because the match failed
 else  # executes, but x and y are unassigned

The clause on the left of the assignment operator is called the pattern. Just as for other destructuring assignments, the pattern is part of the syntax of the assignment and is not a data object on its own. On the right side is the match subject or just the subject.

Unification is both an equality predicate and a two-sided form of multiple assignment that matches a value to a value instead of a pattern to a value. It is a feature used in some logic programming languages such as Prolog and miniKanren to get the effect of logical equality. Let’s use the symbol “=” to represent unification:

[a, b] = [1, 2]  # succeeds and assigns a=1, b=2
[a, b] = [1, 2, 3]  # fails

This example makes unification look a lot like pattern matching, but there are important differences. What underlies the differences is this: unification is two-sided and works on logical variables, while pattern matching is one-sided and works on normal programming-language variables or normal single-assignment variables.

A logical variable is like a single-assignment variable but it can be used before it is assigned a value. In other words, it can be put in a structure, passed to a function, returned from a function, or have a predicate applied to it before ever being assigned. Pattern matching works on normal programming-language variables--mutable variables, or normal single-assignment variables.

For my examples, I’ll use the fictional programming language, Unobtainabol. Unobtainabol has all three kinds of variables, declared with the words “logical”, “mutable”, or, for a single-assignment variable, “constant”:

logical a, b
mutable x, y
constant i, j
[a, b] = [1, 2]  # unification assigns a=1, b=2
[x, y] := [11, 12]  # matching assigns x=11, y=12
[i, j] := [21, 22]  # matching assigns i=21, j=22

Unification is two-sided--it modifies variables on both sides:

logical a, b
mutable x, y
constant i, j
[a, 2] = [1, b]  # unification assigns a=1, b=2
[x, 12] := [11, y]  # error: accessing undefined variable y
[i, 22] := [21, j]  # error: accessing undefined constant j

In unification, both sides of the operation must be a term. A term in logic programming is not syntax but data, just like a list or a set. In fact, lists and sets are terms. A term is defined recursively as follows: a constant, a logical variable, or a structure where the parts are terms. Some examples:

[1, 2]
[1, [2, a]]

Once we have terms with logical variables, we need a word to describe data objects that do not contain logical variables: a constant term is a constant or a structure where the parts are constant terms. The right hand side of a pattern-matching statement must be a constant term.

Because unification has terms on both sides, there is the possibility of unifying two unassigned variables to each other:

logical a, b
[a, 1] = [b, 1]  # unifies a=b but a and b are unassigned
a = b  # unifies a=b but a and b are unassigned
b = 1  # assigns a=1, b=1

A common way of viewing this is that a=b creates a constraint on a and b. A constraint is a predicate that is asserted of logical variables--possibly before they are assigned--and prevents any assignment to those variables that violates the predicate. Prolog only has equality constraints, but other logic languages have other sorts of constraints, such as non-equality, less-than/greater-than, member of a set, etc.

Let’s borrow some terminology from variable scoping and say that a lexically restricted assignment is one in which the only variables that can be assigned are those that appear syntactically in the assignment statement. Normal assignment is lexically restricted--only the variable on the left of the assignment statement can change (a variable can be aliased, but you still have only one variable with multiple names).

Pattern matching is lexically-restricted, unification is not. Consider:

logical a, b, x
x = [a, b]
x = [1, 2]  # succeeds and sets a=1, b=2, x=[1, 2]

The second unification binds the variables a and b which do not appear in the statement.

Another consequence of logical variables is that unification is non-sequential--bindings can happen in any order and the outcome is the same. For example given:

logical a, b, c
a = b
b = c
c = 1

You can do the three unifications in any order and get the same final result: all three variables equal to 1. Pattern matching, like assignment, is sequential:

mutable x, y
x := 1
y := x

If you reverse the order of the pattern-matching statements, the first one will cause an error because x will be undefined at the point of use.

A related feature is that unification is idempotent. The same unification can be done any number of times and it doesn’t change the ultimate outcome, either the success/failure outcome or the binding of variables on success:

logical a, b
a = b
a = b
a = 1
b = 1
b = a

This is not true for pattern matching with mutable variables:

mutable x, y
x := 1
x := x+1

The end result of this is that x=2, but if you repeat the second pattern-matching statement then the end result would be x=3.

Pattern matching with single-assignment variables is not idempotent either with the usual implementation:

constant i
i := 1  # succeeds
i := 1  # error -- i is already assigned

However, there is a variation where patterns are allowed to have constants in the term and it acts like a restriction:

constant i, j
[1, i, j, 4] := [1, 2, 3, 4]  # succeeds, sets i=2, j=3
[4, i, j, 1] := [1, 2, 3, 4]  # fails -- constants do not match up

With this feature, patterns on single-assignment variables are idempotent:

constant i
i := 1  # succeeds
i := 1  # succeeds by matching constant to literal

But in general, pattern matching is not idempotent.

We typically think of pattern matching as a feature that assigns variables, but only if the match subject has a certain structure. In other words, we generally think that every value that matches a certain pattern must be structurally equivalent, but this is not true. In Python, for example the following both succeed even though the values on the right are not structurally equivalent (the code is actual Python, which means that “=” represents destructuring assignment):

a, b = (1, 2)
a, b = [1, 2]

This is a way in which pattern matching is more expressive than unification: patterns can have special syntax that let a single pattern match a broader variety of structures. For example, you could define a syntax x<=expr that means, “if there is no member to match against x, then succeed anyway and assign the default value expr to x.” For example:

mutable x, y
[x, y<=10] := [1, 2]  # succeeds, sets x=1, y=2
[x, y<=10] := [1]  # succeeds, sets x=1, y=10

So, with all of that, here is a summary of the differences:

pattern matching
Matches a syntactic pattern to a data object.
Matches a data object to a data object.
mutable variables or single-assignment variables
logical variables
one-sided: Only assigns to variables on the left.
two-sided: Assigns in both directions.
No constraints are created.
Can create equality constraints.
lexically restricted: Only assigns to variables that occur syntactically in the statement.
lexically unrestricted: Assigns to variables that do not occur syntactically in the statement.
sequential--Assignments must occur in the presented order.
non-sequential--Assignments can be done in any order.
non-idempotent--Assignments cannot generally be repeated.
idempotent--Assignments can be repeated without changing the effect.
A single pattern can match different terms with different structures.
A term can only match another term with the same structure.

Sunday, February 21, 2016

Pattern Matching: Is There Anything it Can't Do?

I recently read Joe Duffy’s excellent account of the error-handling mechanism in Midori. It can be viewed either as an exception-handling mechanism or as an error-return mechanism using pattern matching. This is cool because it means that exception raising is the fourth function-calling abstraction that can be eliminated in favor of pattern matching, so long as we have expressive-enough pattern matching. I’m going to take this opportunity to describe the four eliminations.

A pattern is an expression representing a value with variable elements. Some of the variable elements are named, representing variables. Pattern matching is the process of comparing a pattern against a value (called the subject) to try to find an assignment of values to the variable elements of the pattern which would make the resulting value equal to the subject. If such an assignment is found, then the match is said to succeed, and any variables in the pattern are left equal to the value they were assigned. In the kinds of patterns we are considering, there is never more than one way in which a pattern may match a subject.

Multiple Return Values as Pattern Matching

Common Lisp has a concept called multiple values. A multiple-values “thing” is not a first-class value: one cannot insert one into a list, for example, nor is there any distinction between a single value and a multiple-values thing that has one value. This feature is really intended just to allow a function to return multiple values at once. The values are collected at the return site of the function:
(defun sum-and-product (x y)
 ; returns two values, x+y and x*y
 (values (+ x y) (* x y))
and burst into individual values at the call site
(multiple-value-setq (sum, prod) (sum-and-product 2 3))
 ; here, sum=5 and prod=6
This is a useful concept because it is more efficient than creating a list of values and then taking the list apart. An implementation that returns values in registers would only have to set the first return register to 5 and the second to 6 in the above example, then those registers can be used at the call site without further effort.

Many languages have generalized this concept with tuples and pattern matching. For example in Python, a tuple is an immutable sequence of values created by a list of comma-separated expressions:
2, 3   # A tuple of two integers
An assignment with a tuple of variables on the left and a tuple on the right, will assign the corresponding values:
x, y = 2, 3  # Assigns x=2 and y=3
Naturally, a function that returns a tuple would work the same on the right hand side:
def SumAndProd(a, b): return a+b, a*b
x, y = SumAndProd(2, 3)  # Assigns x=5 and y=6
A Python tuple is different from a Common Lisp multiple value because a tuple is a data object on its own. In the above example,
result = SumAndProd(2, 3)
would assign to x a tuple of (5, 6), whereas the corresponding Common Lisp expression
(setq result (sum-and-product 2 3))
would only set result to the first element of the multiple result, 5.

Python has to construct the tuple at the return site and then access the elements of the tuple at the call site to assign them to x and y. Common Lisp can can put the values directly in registers at the return site and use them directly at the call site.

The reason Common Lisp can do this is because we can have a calling convention that deals directly with multiple values. If we have, say, 4 registers dedicated to return values, the callee can return up to four values in registers. Any expression that is not expecting multiple results will only look at the first register and only use the first of the multiple results, which is what the semantics says it should do. Everything works out (you also need to return an indication of how many results were returned--I assume there is a dedicated register for this in implementations that return values in registers).

In a statically-typed language that lets us declare that a function returns a tuple of a certain length, we can define a calling convention such that a function that returns a tuple of four elements or less returns the elements in the four result registers without constructing a tuple. The call site then has to know about this calling convention and handle it appropriately. In particular, if the caller actually wants a tuple rather than the individual values, then the caller must construct the tuple from the values in the registers.

Argument Assignment as Pattern Matching

I believe I’ve seen this in another language, but I don’t recall where, so I’ll talk about this as a feature of Unobtainabol. In Unobtainabol, a tuple is a sequence of values. For example:
(‘a’, ‘b’, ‘c’, ‘d’)
represents a tuple of four elements numbered 1 to 4. The parenthesis are not required except for disambiguation.

A pattern is a tuple expression with possibly-typed variables in place of values. A pattern can be matched against a subject using the assignment statement:
a, b = 1, 2  # assigns a=1, b=2
a, b, c = 1, 2  # fails and raises an error
int a, float b = 1, 2.2  # assigns a=1, b=2.2
int a, int b = 1, 2.2  # fails and raises an error
It should be no surprise now to say that an Unobtainabol function call takes a single argument, which is a tuple, For example
f(‘a’, ‘b’, ‘c’, ‘d’)
The formal parameter list is a pattern which matches the argument tuple and assigns to the formal parameters. For example, the above call, with the function
void f(a, b, string c, string d)
would produce the formal parameter assignments: a=’a’, b=’b’, c=’c’, d=‘d’.

With suitable extensions to tuples and matching, we can handle default values, keyword arguments, variable argument lists, and the various parameter passing methods. The result is a parameter-passing mechanism that is identical to pattern-matching assignment in the language, so there is only one set of rules for the programmer to learn in order to use both features.

Dispatch as Pattern Matching

For our purposes, a function signature is the list of argument types and possibly other information about the arguments. Many languages allow polymorphic functions: multiple functions with the same name but differing signatures:
int f(int i);
int f(int i, int j);
int f(float x);
The selection of which of a set of polymorphic functions to invoke is called function dispatch.

Dispatching is a fairly complex matching problem. Frequently, more than one function signature matches the actual parameters, so there must be rules about how to choose the best match. For example, in the presence of automatic casting, the call f(0) would match both the first and third signatures above because while 0 is an int, it can be cast to a float.

A pattern-matching case is a case expression or statement that uses pattern matching rather than mere equality to resolve the condition. For example:
match x {
case (int a, int b): ...
case (int a, float b): ...
This will select the branch that best matches x and will execute the associated clause with the indicated bindings.

Some functional languages allow functions to be declared as a set of clauses like this:
factorial(0) = 1
factorial(n) |n > 0| = n * factorial(n-1)
The expression inside |...| is called a guard; it is a way to extend patterns with general conditions. The pattern only matches when the guard is true.

The function above would be equivalent to one declared with a pattern-matching case:
factorial(n) = switch n {
 case 0: 1
 case i |i > 0|: factorial(i-1)
All of this suggests the following: a collection of polymorphic functions, all with the same name can be viewed as a single function where the body is a pattern-matching case over the formal parameter lists of the individual signatures. For example, given the functions above
int f(int i) {body1}
int f(int i, int j) {body2}
int f(float x) {body3}
These would be equivalent to the single function:
int f(Args) {
 match Args {
   case (int i): {body1}
   case (int i, int j): {body2}
   case (float x): {body3}
A call to f(actual-parameters) gets resolved by setting Args=actual-parameters and then executing the switch statement. This unifies the concepts of dynamic dispatch and pattern matching. Static dispatch is still an open issue.

In Unobtainabol we take this all the way: you don’t actually have to give a formal parameters list; you can just use Args in the body of the function. Args[1] is the first actual parameter, Args[2] is the second, and so on. For extra convenients, we define $1 as an abbreviation for Args[1], $2 as an abbreviation for Args[2], and so on. This makes for a nice syntax for small closures.

Exception Handling as Pattern Matching

A continuation is a control point in the program. Technically, a continuation encodes both an address in the code and an environment (a stack frame), but when the continuation is just used as a function return and the function uses the normal stack discipline, the continuation can be encoded as just an address in the code because normal stack handling will take care of the environment. Because of this, I will sometimes conflate the notion of a continuation and a code address.

A basic function, when it has done its work just returns to the caller at the point after the call. In pseudo-assembly language it looks like this:
 push arg 1
 push arg 2
 push labelA
 goto function
First we push the arguments and the continuation/return label on the stack, then we jump to the function. When the function is done, it will pull the return label off the stack and jump to it. The stack may be altered either before the return or after, depending on the calling convention. It’s better to pass arguments in registers than on the stack, but you usually need a stack anyway so I’ll ignore registers.

In some cases, you need two continuations. For example, suppose f is a generator function and the language has a looping feature that uses a generator function like this:
for x in f(y):
 do something with x
go on with program
The code for this is pretty complicated and it doesn’t just use normal stack discipline, so I’ll spare you the details, but suffice it to say that we call f with two continuations: one for “do something with x” and one for “go on with the program”. The function will branch to the first one for each value it generates, and then branch to the second one when there are no more values.

Another use for multiple continuations is for functions that throw exceptions. One continuation would be for normal value return and another continuation for a thrown exception. Or there can be one continuation for each type of exception thrown by the function. Let’s assume a continuation for each type of exception, so
try {x = f();}
 catch Exception1 e: {handler1}
 catch Exception2 e: {handler2}
go on with program
represents a call to function f with three continuations: the first one jumps to “go on with program”. The others jump to handler1 or handler2.

Passing an arbitrarily long list of continuations could get expensive. Actually, just passing two continuations may be measurably more expensive than one, so instead of passing all of the continuations, we can pass in just one and have the others findable from the one. For the above code it would look like this:
 push return address: normal_label
 call f
 code for handler1
 code for handler2
 code for go on with program
So a normal exit just jumps to ret, the return point. A throw to handle1 jumps to the address found at ret[-8] where 8 is the word size. A throw to handle2 jumps to the address found at ret[-16].

In the normal execution case, there is practically no overhead for the existence of additional continuations. In the case of an exception, there is an extra indirection. While this technique is probably optimal for the normal case, it does have the disadvantage that it can’t pop off multiple frames at once--it has to branch to and clean up each individual stack frame from the point of the throw to the point of the exception. But I’m pretty sure most exception-handling protocols do that anyway.

Rust supports a discriminated union data type that it calls an enum. Rather than Rust syntax, which is a bit obscure, I’ll use C-like syntax and call the discriminated union a dunion instead of an enum:
dunion Foo {int Bar; float Baz;};
The typecase in Rust is a special case of their pattern-matching statement:
match foo {
 case Foo::Bar(i): code to execute when foo is an int,
 case Foo::Baz(f): code to execute when foo is a float
The name in parenthesis is assigned the unboxed value. For example if foo=5, then we will jump to the first branch and execute the body with i set to 5.

Rust error handling makes use of a discriminated union that returns either the result of the function or an error of some sort. For example, if if the normal return value is an integer, but it can throw a string exception, we might use this:
dunion Result {
   int Value,
   string Err,
The function is then:
Result f(...) {...}
and you call it in a match statement
match f(...) {
 Result::Value i: code to execute on success
 Result::Err s: code to execute on error
There is a similarity between this and the multiple-continuation example.

So why wasn’t this obvious? The reason is that with exceptions, you can have a single handler for multiple function calls:
try {
 i = f1(x);
 j = f2(y);
 k = f3(z);
catch (Exception e) {...}
All of the functions could be called with the same handler. This isn’t usually good style--it’s better to keep your exception handling with the function that can throw the exception--but it’s occasionally useful and most exception mechanisms allow something like this.

Unfortunately this doesn’t map into the discriminated-union paradigm. The innovation of Midori is to break up the monolithic exception handler into two parts: a function-return mechanism that is equivalent to returning a discriminated union, and a strictly local control structure similar to a switch statement. With Midori-style exception handling we would write the above something like this:
try {
 i = try f1(x);
 j = f2(y);
 k = try f3(z);
catch (Exception e) {handle exception}
This tells us that f1 and f3 can throw exceptions. The trys that come right before the functions will pass on an exception if the function raises one. try f1(...) is an abbreviation for code something like this:
match f1(...) {
 Result::Value(temp_i): i = temp_i; break;
 Result::Err(e): {goto handle exception}
The try with the brace after it acts similar to a switch statement where the catches replace cases.

With this change, we can replace exceptions with discriminated unions and still optimize as if it were multiple continuations. Of course you probably need for the optimizer to distinguish between discriminated unions that represent exceptions and those that don’t because the optimizations rely on the fact that one branch is a lot more likely than the others.

Still, with this change, Unobtainabol can have all the features and optimizations of a language with a complex function model, but only support non-polymorphic, single-argument functions that return a single value to a single continuation.