Friday, September 14, 2018

the probability of the unknown

John. C. Wright (incidentally one of the finest science fiction and fantasy writers of our time) discusses a problem called the Carter Catastrophe. Mr. Wright argues that the problem statement is nonsense. I argue in opposition that the reasoning of the problem is correct as far as it goes, but that it offers almost no information. Mr. Wright comes to his conclusion from the position that probability is about repeatable events, and if he were right about that, his conclusion would be correct, but although he has a lot of brilliant scientists and mathematicians who agree with him, I do not.

I am one of a minority who believe that probability is, in fact, a form of logic. Traditional logic is based on two possibilities called truth values: True and False. Although this is by far the most common kind of logic, there are others. Some logics use three truth values: True, False, and Unknown. Others use four: True, False, Unknown, and Contradictory. Probability is just another form of logic that has a range of values from True (probability 1) to False (probability 0) and all shades of unknown in between.

Mr. Wright actually wrote a great science fiction book involving a non-two-valued logic. The book is The Null-A Continuum, and "Null-A" stands for non-Aristotlean logic. Aristotle's logic was the usual True/False variety, and the name was no doubt intended for similarity to non-Euclidean geometry.

So, given the view of probability as a form of logic, what is the probability of a statement x that you have absolutely no information about? Clearly you have no reason to prefer one value over another, so the probability is 1/2, which means completely unknown. Now let me give you a little more information about x:
x is the statement: when I throw this 6-sided die, I will roll a 1
Now you have the additional information that there are six mutually exclusive possibilities. Which one should you favor? Well, you have no  reason to prefer one possibility over the other, so the probability is 1/6. If you are a frequentist like Mr. Wright, you can't actually give a number until someone tells you that the die is a fair one, but as a matter of logic, if all you know is that one of six possibilities is true, and you have no other information at all, then the probability of any of the possibilities is 1/6.

Notice, however, that once we have the additional information, we changed the probability from 1/2 to 1/6. The 1/2 is no longer relevant because we have more information (this is one of the ways that probability is less nice than traditional logic; traditional logic is monotonic in the sense that adding more information cannot change your answer).

With this background, let's get back to the Carter Catastrophe, which makes sense if logic is probability. Suppose you have no more information than this:
Event e has not happened in the last hour but will happen eventually.
Let's say that the time from one hour ago to the next time e happens is t. We have no information at all about where now sits in the time range from 0 to t, so any time in that range is equally likely. This means that, for example, the chance that were are in the top half (or the bottom half) of the range is 1/2. In fact, the chance that we are in any 1/n of the range is 1/n.

This is the sort of reasoning that the Carter Catastrophe is based on, and it is logically sound, but the issue is that it is logically sound given that we have almost no information. Any other probability based on any more information at all will be a better estimate. You can't argue that something is likely based on almost no information when there is actual information to look at.

Frequentists like Mr. Wright object to assigning probabilities like this for non-repeatable events, but there is a semi-frequentist interpretation: if someone were to pick a bunch of random problems of this low-information type and offer you bets based on the problem, then reasoning with logical probability is the best strategy you can use to maximize your wins (or minimize your loses), assuming the questions were not deliberately chosen to defeat probability logic.

Sunday, March 4, 2018

Schema Evolution

A schema is a collection of data definitions. There are three different places where schemas are used: to define the data structures within a single program, to define the data structures within a database, and to define the data structures that are communicated between different programs, typically over a network.

In the two external cases: databases and communications, there is an interesting problem associated with schemas: how do you change them? Software is evolving all the time and the schemas have to evolve with it. In the case of databases this creates a problem when a new version of software comes out with a new schema. The existing data is in the old schema but the software wants to use the new schema, what do you do? In the case of communications, there is a similar problem. It often isn't practical to change all software at once. A single entity may not even own all of the software. When two different pieces of software are using two different versions of the schema, how do they communicate?

I'll be focusing on Protocol Buffers which is oriented towards communication and Avro which is oriented towards databases, although either can be used for both purposes. If you don't know what these systems are, I suggest you read this excellent but somewhat dated article which discusses how they each treat schema evolution. What I want to discuss here is not how actual systems work, but how an ideal system would work. Since Unobtainabol is the perfect programming language, it has to have perfect schema evolution.

Typically, a schema is a collection of record definitions where a records is a collection of named fields and each field can have a set of properties. For example here is one record definition:
record Person {
  name: string title_case default 'Anonymous',
  phone_type: string,
  sex allowed: {'M', 'F'},
  children: int optional,
Each fields is defined with a name followed by a (possibly empty) list of properties, followed by a comma. The properties illustrated above include type, default value, input conversions (name is converted so that each word in the name is capitalized), a list of allowed values, and an indication that a field is optional.

We can consider the following four categories of changes:add a field, remove a field, change the properties of a field, or change the name of a field. In the following V1 is a program that uses the record definition above, and V2 is a program that uses a modified definition. We'll discuss what happens when V2 sends messages to V1 and receives messages from V1. The case where V1 or V2 is a database is more complicated so we will defer it.

Deleting a Field

When V2 receives a field from V1, it receives a new field that it does not know what to do with. With Protocol Buffers, it will just ignore the unknown field. The problem with that solution is that getting a field that you don't recognize may indicate other problems, and the existence of those problems is hidden from you. It would be better to get some sort of warning when this happens. The simplest way to handle this is for V2 to know that a field has been deleted, and we can encode this in the record definition by adding a version tag like this:
record Person v2 {
  name: string title_case default 'Anonymous',
  phone_type: string,
  sex: allowed {'M', 'F'},
  children: int optional deleted v2,
The changes are highlighted in bold. This record definition should be read as saying that it is version two of the record, and that this version differs from the previous version by deletion of the children field. When V2 is compiled, it not only knows the current version of the record, it also knows that the previous version had another field that is no longer present. When it then sees this field, it can safely ignore it.

The situation when V2 sends a record to V1 is more complicated. When V2 sends a record, it does not send the children field because it does not have a value for the field. Since V1 thinks the field is optional, this situation might have a trivial solution if we represent records in such a way that it is impossible to tell the difference between a missing optional field and a field that simply does not exist. This is essentially how Protocol Buffers works.

However, this solution only works for fields that were optional before they were deleted. What if we deleted the phone_type field? Within Google, they strongly recommend making all fields optional for just this reason. In fact, in proto3, all fields are optional--there are no required fields. Google's attitude on this was formed from a single incident where deleting a non-optional field led to significant downtime for the production search engine. Careers were in jeopardy, Panic ensued. Since that frightening experience, Google has become so committed to this model that proto3 doesn't even provide a way to know when a value was missing; it just silently adds a zero or an empty string. I hardly think it's worth pointing out the disasters that can occur from using bad data like this. Even worse, there is nothing to indicate where the problem is. The results of using bad data can percolate up the system for a long ways before causing a problem that seems completely unrelated to the message.

The ideal schema evolution does not silently insert bad data into your programs. Suppose we do delete phone_type and suppose V1 is the component responsible for sending SMS messages to the users, and that it needs the phone type to know if the phone can receive SMS. Google's strategy would turn phone_type into an empty string, which could cause V1 to crash. A better response is a reply message to V2, warning it that there is a problem in the communication. Maybe it's actually a bug in V2 that needs to be fixed.

To really solve the problem, we need to be able to supply a default value for required fields that have been deleted. There are two cases, either the field already had a default value, in which case it is very unlikely that there can be a problem using it, or you have to add a default value. Adding a default value at least requires the programmer to think about how older programs are using the field.

I'll also discuss three different places to generate the default value:

  1. in V2 which was the last component compiled, so it can know about all previous schemas.
  2. in the protocol itself, relying on a registry of schemas.
  3. in V1, from information provided by V2 or by a registry.

Before we discuss these options, let's go over what sorts of defaults we need to support. Sometimes you can just pick a constant for a default. Suppose we are removing phone_type because we are dropping the SMS service from our application. Then we want to just always say that a phone is a land line so that V1 does not send any messages:
record Person v2 {
  name: string title_case default 'Anonymous',
  phone_type: string default 'landline' deleted v2,
  sex: allowed {'M', 'F'},
  children: int optional,
In this example, I assume that adding a default is always allowed, so you can add a default in the same change where you delete the field.

But what if there is no constant value that  makes a proper default? What if we are removing the phone type because we now have a way to figure out whether or not we can send an SMS based on the phone number? We need an expression to calculate the default, and the expression has to reference another field of the record:
record Person v2 {
name: string title_case default 'Anonymous',
phone_type: string
  default {get_phone_type(phone)} deleted v2,
sex: allowed {'M', 'F'},
children: int optional,
V1 may not have the function get_phone_type. This and similar issues make it impractical to try to get V1 to calculate its own default values. That leaves us with possibilities 1 and 2: either there is a schema registry and the protocol itself supplies the default value, or V2 supplies the default value.

The problem with a schema registry is that it is a very heavy-weight solution. Network services are hard to set up and maintain--always much harder than the documentation seems to suggest. Also, they are another point of failure in your system. There are applications for a schema registry, but it should be possible to solve schema evolution without one.

The only choice left is to make V2 responsible for everything. V2 has to have the code necessary to send and receive messages for every past version of the schema that might still be in operation. When V2 and V1 set up communication, they have to exchange their schema versions. The one with the older version then relies on the one with the newer version to do all necessary translation (except for possible optimizations that might push work to V1 to reduce message size).

to be continued

Tuesday, January 9, 2018

Why I'm suing Google

I'm "the other plaintiff" on the famous Damore lawsuit against Google. Since there is some curiosity about the case, I thought I'd write this note to give people a little information about me and about my reasons for going forward with the lawsuit.
First, I don't hate Google, and I certainly don't hate the people who work there. The large bulk of Googlers are wonderful people — smart, kind, and wanting to do the right thing. I wouldn't want this suit to give people a bad opinion of Googlers, but, honestly, they brought this on themselves for tolerating the hatred, racism and misandry of a small but vocal and organized subgroup who want to use Google as a vehicle of social change rather than as a vehicle of delivering excellent service and products to their customers.
I was disappointed at the lack of support I received from the good people at Google when I came under attack. In fact, I was attacked largely for coming to the defense of others. I hoped my example would set a precedent, and others would join me in standing up for kindness, tolerance, and just getting along despite differences, but no. Although I am disappointed, I do understand why people stood silent while their colleague was treated unfairly. Everyone is afraid of the hate group that dominates the discussion at Google. No one wants to jeopardize their job or career prospects to defend someone they don't know.
I understand, and that is why we need lawsuits like this — to make the consequences of going along with the illegal behavior of the hate groups worse than the consequences of challenging them. Courage from better incentives. I believe a lot of Googlers and others in Silicon Valley will secretly welcome this lawsuit as a vehicle for giving them the support they need to speak out when they see something wrong. At least, that is my hope.

Saturday, April 2, 2016

Unification: Things Pattern Matching Can't Do

This document is an answer to the question “Pattern Matching: Is There Anything it Can't Do?

Before we talk about the differences between pattern matching and unification, let’s talk about multiple assignment and destructuring assignment. Multiple assignment is a programming language feature that lets you assign multiple variables at once and assign different values to different variables. It might have a syntax like this:

x, y := 1, 2

There is a question about how to interpret the above statement. Should we view this as an assignment operator with two operands, one of which is the pair x,y and the other the pair 1,2, or should we view it as an operation with four independent operands: x, y, 1, and 2? If there are just two arguments, then you could replace either of them with a variable name and it would do the same thing:

t1 := x, y
t2 := 1, 2
t1 := t2

For most languages with multiple assignment, this does not do the same thing as the previous example (but with unification, it would).

There are languages such as Python and Go that choose an intermediate solution: the original expression has three operands: x, y, and the pair 1, 2. This one-sided solution is called destructuring assignment. Destructuring assignment is a one-sided form of multiple assignment where the phrase on the left is a syntactic structure that is part of the assignment operation and the expression on the right represents a value:

t1 := [1, 2]
[x, y] := t1  # destructuring assignment

The intermediate variable t1 is used to show that the expression on the right hand side represents a single value. This destructuring assignment still has three arguments, x, y, and t1. The clause [x, y] may look like a list constructor, but it is really part of the syntax of the assignment statement.

Pattern matching is a form of destructuring assignment that can be used in conditional contexts to affect flow of control. For example, in Scala and Rust there is a case statement that can be used to select one branch (and set of assignments) based on the pattern. Because pattern matching is an extension to destructuring assignment, I’ll continue to use the same symbol, “:=”. Here are some examples of pattern matching in a conditional context:

if [x, y] := [1, 2]
 then  # executes with x=1, y=2
 else  # does not execute
if [x, y] := [1, 2, 3]
 then  # does not execute because the match failed
 else  # executes, but x and y are unassigned

The clause on the left of the assignment operator is called the pattern. Just as for other destructuring assignments, the pattern is part of the syntax of the assignment and is not a data object on its own. On the right side is the match subject or just the subject.

Unification is both an equality predicate and a two-sided form of multiple assignment that matches a value to a value instead of a pattern to a value. It is a feature used in some logic programming languages such as Prolog and miniKanren to get the effect of logical equality. Let’s use the symbol “=” to represent unification:

[a, b] = [1, 2]  # succeeds and assigns a=1, b=2
[a, b] = [1, 2, 3]  # fails

This example makes unification look a lot like pattern matching, but there are important differences. What underlies the differences is this: unification is two-sided and works on logical variables, while pattern matching is one-sided and works on normal programming-language variables or normal single-assignment variables.

A logical variable is like a single-assignment variable but it can be used before it is assigned a value. In other words, it can be put in a structure, passed to a function, returned from a function, or have a predicate applied to it before ever being assigned. Pattern matching works on normal programming-language variables--mutable variables, or normal single-assignment variables.

For my examples, I’ll use the fictional programming language, Unobtainabol. Unobtainabol has all three kinds of variables, declared with the words “logical”, “mutable”, or, for a single-assignment variable, “constant”:

logical a, b
mutable x, y
constant i, j
[a, b] = [1, 2]  # unification assigns a=1, b=2
[x, y] := [11, 12]  # matching assigns x=11, y=12
[i, j] := [21, 22]  # matching assigns i=21, j=22

Unification is two-sided--it modifies variables on both sides:

logical a, b
mutable x, y
constant i, j
[a, 2] = [1, b]  # unification assigns a=1, b=2
[x, 12] := [11, y]  # error: accessing undefined variable y
[i, 22] := [21, j]  # error: accessing undefined constant j

In unification, both sides of the operation must be a term. A term in logic programming is not syntax but data, just like a list or a set. In fact, lists and sets are terms. A term is defined recursively as follows: a constant, a logical variable, or a structure where the parts are terms. Some examples:

[1, 2]
[1, [2, a]]

Once we have terms with logical variables, we need a word to describe data objects that do not contain logical variables: a constant term is a constant or a structure where the parts are constant terms. The right hand side of a pattern-matching statement must be a constant term.

Because unification has terms on both sides, there is the possibility of unifying two unassigned variables to each other:

logical a, b
[a, 1] = [b, 1]  # unifies a=b but a and b are unassigned
a = b  # unifies a=b but a and b are unassigned
b = 1  # assigns a=1, b=1

A common way of viewing this is that a=b creates a constraint on a and b. A constraint is a predicate that is asserted of logical variables--possibly before they are assigned--and prevents any assignment to those variables that violates the predicate. Prolog only has equality constraints, but other logic languages have other sorts of constraints, such as non-equality, less-than/greater-than, member of a set, etc.

Let’s borrow some terminology from variable scoping and say that a lexically restricted assignment is one in which the only variables that can be assigned are those that appear syntactically in the assignment statement. Normal assignment is lexically restricted--only the variable on the left of the assignment statement can change (a variable can be aliased, but you still have only one variable with multiple names).

Pattern matching is lexically-restricted, unification is not. Consider:

logical a, b, x
x = [a, b]
x = [1, 2]  # succeeds and sets a=1, b=2, x=[1, 2]

The second unification binds the variables a and b which do not appear in the statement.

Another consequence of logical variables is that unification is non-sequential--bindings can happen in any order and the outcome is the same. For example given:

logical a, b, c
a = b
b = c
c = 1

You can do the three unifications in any order and get the same final result: all three variables equal to 1. Pattern matching, like assignment, is sequential:

mutable x, y
x := 1
y := x

If you reverse the order of the pattern-matching statements, the first one will cause an error because x will be undefined at the point of use.

A related feature is that unification is idempotent. The same unification can be done any number of times and it doesn’t change the ultimate outcome, either the success/failure outcome or the binding of variables on success:

logical a, b
a = b
a = b
a = 1
b = 1
b = a

This is not true for pattern matching with mutable variables:

mutable x, y
x := 1
x := x+1

The end result of this is that x=2, but if you repeat the second pattern-matching statement then the end result would be x=3.

Pattern matching with single-assignment variables is not idempotent either with the usual implementation:

constant i
i := 1  # succeeds
i := 1  # error -- i is already assigned

However, there is a variation where patterns are allowed to have constants in the term and it acts like a restriction:

constant i, j
[1, i, j, 4] := [1, 2, 3, 4]  # succeeds, sets i=2, j=3
[4, i, j, 1] := [1, 2, 3, 4]  # fails -- constants do not match up

With this feature, patterns on single-assignment variables are idempotent:

constant i
i := 1  # succeeds
i := 1  # succeeds by matching constant to literal

But in general, pattern matching is not idempotent.

We typically think of pattern matching as a feature that assigns variables, but only if the match subject has a certain structure. In other words, we generally think that every value that matches a certain pattern must be structurally equivalent, but this is not true. In Python, for example the following both succeed even though the values on the right are not structurally equivalent (the code is actual Python, which means that “=” represents destructuring assignment):

a, b = (1, 2)
a, b = [1, 2]

This is a way in which pattern matching is more expressive than unification: patterns can have special syntax that let a single pattern match a broader variety of structures. For example, you could define a syntax x<=expr that means, “if there is no member to match against x, then succeed anyway and assign the default value expr to x.” For example:

mutable x, y
[x, y<=10] := [1, 2]  # succeeds, sets x=1, y=2
[x, y<=10] := [1]  # succeeds, sets x=1, y=10

So, with all of that, here is a summary of the differences:

pattern matching
Matches a syntactic pattern to a data object.
Matches a data object to a data object.
mutable variables or single-assignment variables
logical variables
one-sided: Only assigns to variables on the left.
two-sided: Assigns in both directions.
No constraints are created.
Can create equality constraints.
lexically restricted: Only assigns to variables that occur syntactically in the statement.
lexically unrestricted: Assigns to variables that do not occur syntactically in the statement.
sequential--Assignments must occur in the presented order.
non-sequential--Assignments can be done in any order.
non-idempotent--Assignments cannot generally be repeated.
idempotent--Assignments can be repeated without changing the effect.
A single pattern can match different terms with different structures.
A term can only match another term with the same structure.

Sunday, February 21, 2016

Pattern Matching: Is There Anything it Can't Do?

I recently read Joe Duffy’s excellent account of the error-handling mechanism in Midori. It can be viewed either as an exception-handling mechanism or as an error-return mechanism using pattern matching. This is cool because it means that exception raising is the fourth function-calling abstraction that can be eliminated in favor of pattern matching, so long as we have expressive-enough pattern matching. I’m going to take this opportunity to describe the four eliminations.

A pattern is an expression representing a value with variable elements. Some of the variable elements are named, representing variables. Pattern matching is the process of comparing a pattern against a value (called the subject) to try to find an assignment of values to the variable elements of the pattern which would make the resulting value equal to the subject. If such an assignment is found, then the match is said to succeed, and any variables in the pattern are left equal to the value they were assigned. In the kinds of patterns we are considering, there is never more than one way in which a pattern may match a subject.

Multiple Return Values as Pattern Matching

Common Lisp has a concept called multiple values. A multiple-values “thing” is not a first-class value: one cannot insert one into a list, for example, nor is there any distinction between a single value and a multiple-values thing that has one value. This feature is really intended just to allow a function to return multiple values at once. The values are collected at the return site of the function:
(defun sum-and-product (x y)
 ; returns two values, x+y and x*y
 (values (+ x y) (* x y))
and burst into individual values at the call site
(multiple-value-setq (sum, prod) (sum-and-product 2 3))
 ; here, sum=5 and prod=6
This is a useful concept because it is more efficient than creating a list of values and then taking the list apart. An implementation that returns values in registers would only have to set the first return register to 5 and the second to 6 in the above example, then those registers can be used at the call site without further effort.

Many languages have generalized this concept with tuples and pattern matching. For example in Python, a tuple is an immutable sequence of values created by a list of comma-separated expressions:
2, 3   # A tuple of two integers
An assignment with a tuple of variables on the left and a tuple on the right, will assign the corresponding values:
x, y = 2, 3  # Assigns x=2 and y=3
Naturally, a function that returns a tuple would work the same on the right hand side:
def SumAndProd(a, b): return a+b, a*b
x, y = SumAndProd(2, 3)  # Assigns x=5 and y=6
A Python tuple is different from a Common Lisp multiple value because a tuple is a data object on its own. In the above example,
result = SumAndProd(2, 3)
would assign to x a tuple of (5, 6), whereas the corresponding Common Lisp expression
(setq result (sum-and-product 2 3))
would only set result to the first element of the multiple result, 5.

Python has to construct the tuple at the return site and then access the elements of the tuple at the call site to assign them to x and y. Common Lisp can can put the values directly in registers at the return site and use them directly at the call site.

The reason Common Lisp can do this is because we can have a calling convention that deals directly with multiple values. If we have, say, 4 registers dedicated to return values, the callee can return up to four values in registers. Any expression that is not expecting multiple results will only look at the first register and only use the first of the multiple results, which is what the semantics says it should do. Everything works out (you also need to return an indication of how many results were returned--I assume there is a dedicated register for this in implementations that return values in registers).

In a statically-typed language that lets us declare that a function returns a tuple of a certain length, we can define a calling convention such that a function that returns a tuple of four elements or less returns the elements in the four result registers without constructing a tuple. The call site then has to know about this calling convention and handle it appropriately. In particular, if the caller actually wants a tuple rather than the individual values, then the caller must construct the tuple from the values in the registers.

Argument Assignment as Pattern Matching

I believe I’ve seen this in another language, but I don’t recall where, so I’ll talk about this as a feature of Unobtainabol. In Unobtainabol, a tuple is a sequence of values. For example:
(‘a’, ‘b’, ‘c’, ‘d’)
represents a tuple of four elements numbered 1 to 4. The parenthesis are not required except for disambiguation.

A pattern is a tuple expression with possibly-typed variables in place of values. A pattern can be matched against a subject using the assignment statement:
a, b = 1, 2  # assigns a=1, b=2
a, b, c = 1, 2  # fails and raises an error
int a, float b = 1, 2.2  # assigns a=1, b=2.2
int a, int b = 1, 2.2  # fails and raises an error
It should be no surprise now to say that an Unobtainabol function call takes a single argument, which is a tuple, For example
f(‘a’, ‘b’, ‘c’, ‘d’)
The formal parameter list is a pattern which matches the argument tuple and assigns to the formal parameters. For example, the above call, with the function
void f(a, b, string c, string d)
would produce the formal parameter assignments: a=’a’, b=’b’, c=’c’, d=‘d’.

With suitable extensions to tuples and matching, we can handle default values, keyword arguments, variable argument lists, and the various parameter passing methods. The result is a parameter-passing mechanism that is identical to pattern-matching assignment in the language, so there is only one set of rules for the programmer to learn in order to use both features.

Dispatch as Pattern Matching

For our purposes, a function signature is the list of argument types and possibly other information about the arguments. Many languages allow polymorphic functions: multiple functions with the same name but differing signatures:
int f(int i);
int f(int i, int j);
int f(float x);
The selection of which of a set of polymorphic functions to invoke is called function dispatch.

Dispatching is a fairly complex matching problem. Frequently, more than one function signature matches the actual parameters, so there must be rules about how to choose the best match. For example, in the presence of automatic casting, the call f(0) would match both the first and third signatures above because while 0 is an int, it can be cast to a float.

A pattern-matching case is a case expression or statement that uses pattern matching rather than mere equality to resolve the condition. For example:
match x {
case (int a, int b): ...
case (int a, float b): ...
This will select the branch that best matches x and will execute the associated clause with the indicated bindings.

Some functional languages allow functions to be declared as a set of clauses like this:
factorial(0) = 1
factorial(n) |n > 0| = n * factorial(n-1)
The expression inside |...| is called a guard; it is a way to extend patterns with general conditions. The pattern only matches when the guard is true.

The function above would be equivalent to one declared with a pattern-matching case:
factorial(n) = switch n {
 case 0: 1
 case i |i > 0|: factorial(i-1)
All of this suggests the following: a collection of polymorphic functions, all with the same name can be viewed as a single function where the body is a pattern-matching case over the formal parameter lists of the individual signatures. For example, given the functions above
int f(int i) {body1}
int f(int i, int j) {body2}
int f(float x) {body3}
These would be equivalent to the single function:
int f(Args) {
 match Args {
   case (int i): {body1}
   case (int i, int j): {body2}
   case (float x): {body3}
A call to f(actual-parameters) gets resolved by setting Args=actual-parameters and then executing the switch statement. This unifies the concepts of dynamic dispatch and pattern matching. Static dispatch is still an open issue.

In Unobtainabol we take this all the way: you don’t actually have to give a formal parameters list; you can just use Args in the body of the function. Args[1] is the first actual parameter, Args[2] is the second, and so on. For extra convenients, we define $1 as an abbreviation for Args[1], $2 as an abbreviation for Args[2], and so on. This makes for a nice syntax for small closures.

Exception Handling as Pattern Matching

A continuation is a control point in the program. Technically, a continuation encodes both an address in the code and an environment (a stack frame), but when the continuation is just used as a function return and the function uses the normal stack discipline, the continuation can be encoded as just an address in the code because normal stack handling will take care of the environment. Because of this, I will sometimes conflate the notion of a continuation and a code address.

A basic function, when it has done its work just returns to the caller at the point after the call. In pseudo-assembly language it looks like this:
 push arg 1
 push arg 2
 push labelA
 goto function
First we push the arguments and the continuation/return label on the stack, then we jump to the function. When the function is done, it will pull the return label off the stack and jump to it. The stack may be altered either before the return or after, depending on the calling convention. It’s better to pass arguments in registers than on the stack, but you usually need a stack anyway so I’ll ignore registers.

In some cases, you need two continuations. For example, suppose f is a generator function and the language has a looping feature that uses a generator function like this:
for x in f(y):
 do something with x
go on with program
The code for this is pretty complicated and it doesn’t just use normal stack discipline, so I’ll spare you the details, but suffice it to say that we call f with two continuations: one for “do something with x” and one for “go on with the program”. The function will branch to the first one for each value it generates, and then branch to the second one when there are no more values.

Another use for multiple continuations is for functions that throw exceptions. One continuation would be for normal value return and another continuation for a thrown exception. Or there can be one continuation for each type of exception thrown by the function. Let’s assume a continuation for each type of exception, so
try {x = f();}
 catch Exception1 e: {handler1}
 catch Exception2 e: {handler2}
go on with program
represents a call to function f with three continuations: the first one jumps to “go on with program”. The others jump to handler1 or handler2.

Passing an arbitrarily long list of continuations could get expensive. Actually, just passing two continuations may be measurably more expensive than one, so instead of passing all of the continuations, we can pass in just one and have the others findable from the one. For the above code it would look like this:
 push return address: normal_label
 call f
 code for handler1
 code for handler2
 code for go on with program
So a normal exit just jumps to ret, the return point. A throw to handle1 jumps to the address found at ret[-8] where 8 is the word size. A throw to handle2 jumps to the address found at ret[-16].

In the normal execution case, there is practically no overhead for the existence of additional continuations. In the case of an exception, there is an extra indirection. While this technique is probably optimal for the normal case, it does have the disadvantage that it can’t pop off multiple frames at once--it has to branch to and clean up each individual stack frame from the point of the throw to the point of the exception. But I’m pretty sure most exception-handling protocols do that anyway.

Rust supports a discriminated union data type that it calls an enum. Rather than Rust syntax, which is a bit obscure, I’ll use C-like syntax and call the discriminated union a dunion instead of an enum:
dunion Foo {int Bar; float Baz;};
The typecase in Rust is a special case of their pattern-matching statement:
match foo {
 case Foo::Bar(i): code to execute when foo is an int,
 case Foo::Baz(f): code to execute when foo is a float
The name in parenthesis is assigned the unboxed value. For example if foo=5, then we will jump to the first branch and execute the body with i set to 5.

Rust error handling makes use of a discriminated union that returns either the result of the function or an error of some sort. For example, if if the normal return value is an integer, but it can throw a string exception, we might use this:
dunion Result {
   int Value,
   string Err,
The function is then:
Result f(...) {...}
and you call it in a match statement
match f(...) {
 Result::Value i: code to execute on success
 Result::Err s: code to execute on error
There is a similarity between this and the multiple-continuation example.

So why wasn’t this obvious? The reason is that with exceptions, you can have a single handler for multiple function calls:
try {
 i = f1(x);
 j = f2(y);
 k = f3(z);
catch (Exception e) {...}
All of the functions could be called with the same handler. This isn’t usually good style--it’s better to keep your exception handling with the function that can throw the exception--but it’s occasionally useful and most exception mechanisms allow something like this.

Unfortunately this doesn’t map into the discriminated-union paradigm. The innovation of Midori is to break up the monolithic exception handler into two parts: a function-return mechanism that is equivalent to returning a discriminated union, and a strictly local control structure similar to a switch statement. With Midori-style exception handling we would write the above something like this:
try {
 i = try f1(x);
 j = f2(y);
 k = try f3(z);
catch (Exception e) {handle exception}
This tells us that f1 and f3 can throw exceptions. The trys that come right before the functions will pass on an exception if the function raises one. try f1(...) is an abbreviation for code something like this:
match f1(...) {
 Result::Value(temp_i): i = temp_i; break;
 Result::Err(e): {goto handle exception}
The try with the brace after it acts similar to a switch statement where the catches replace cases.

With this change, we can replace exceptions with discriminated unions and still optimize as if it were multiple continuations. Of course you probably need for the optimizer to distinguish between discriminated unions that represent exceptions and those that don’t because the optimizations rely on the fact that one branch is a lot more likely than the others.

Still, with this change, Unobtainabol can have all the features and optimizations of a language with a complex function model, but only support non-polymorphic, single-argument functions that return a single value to a single continuation.