Saturday, May 11, 2013

the super branch

In the following I talk as if branches are strictly statements (phrases executed for their side effects that do not return values) but in many languages such as Lisp and SQL branches are expressions (phrases that return values). In Unobtainabol they are expressions also, but it is easier to talk about them as statements so I’ll keep using the statement terminology.

Branches are statements that let the user chose which of a set of statements to execute. They include the traditional if statement and case statement among others. Multibranches are branch statements that can branch more than two ways in a single statement. The traditional if statement is not a multibranch but it is often extended with an elseif (or sometimes elif) feature that turns it into a multibranch. That is, a multibranch if statement would look something like this:

if condition1 then stmt1
elseif condition2 then stmt2
elseif condition3 then stmt3
else stmtn

In the multibranch if statement, the computer goes down the list of conditions, choosing the first one that is true and executing the corresponding statement. If none of the conditions is true, then it executes stmtn.

Case statements have a less consistent form across languages. There are a wide variety of keywords used and some languages restrict the matching expressions in various ways. I’ll show a slightly modified form of the SQL version here because it is one of the least restrictive:

case expr
when match1 then stmt1
when match2 then stmt2
else stmtn

In this sort of construct, the computer evaluates expr and then goes down the list of match expression to find the first one that matches (evaluates to the same value as) expr. When it finds a match, it executes the corresponding stmt. If no match is found it executes stmtn.

Since SQL allows any expression for the match expressions, you can get the effect of a multibranch if statement by using

case true
when condition1 then stmt1
when condition2 then stmt2
else stmtn

In fact SQL supports this usage by not requiring you to write true. You can just write case and then a sequence of when clauses.

Now, Unobtainabol is not really designed with the goal of mashing as much functionality as possible into single control structures, despite what you may think after superloops. The point of superloops is that combining that functionality added a lot of features that would not be there is we did just traditional loops. With multibranches, the situation is similar. There are many types of branches, and by combining them all into one construct, we hope get a feature that is more powerful than the individual features, while still having the simple forms that users expect.

There is complication in talking about multibranches because there are at least two kinds. With a multibranch if statement, the branches have boolean expression and the one that succeeds is the one that is true. With a case statement, the branches are expressions and the one that succeeds is the one that matches the control expression. To make it easier to talk about branches in general I’m going to refer to them all generally has having conditions that succeed or fail.

Up to now, I’ve been saying that the computer goes down the conditions in order and executes the statement associated with the first successful one. That is the most common sort of multibranch but there are other possibilities. Here are some options for what order to do the testing in:

  1. the conditions are tested in the given order
  2. the conditions are tested in some arbitrary fixed order (possibly for performance reasons)
  3. the conditions are tested in parallel
  4. the conditions are searched for the “best” matching condition if several can match

Each of these evaluation methods has certain advantages. Method 1 is the simplest. Method 2 avoids putting arbitrary constraints on the compiler that the algorithm doesn’t require. Method 3 can  be much faster on certain hardware and is useful for expressing certain kinds of algorithms. Method 4 is useful for lots of things, including as a way to implement polymorphism.

In addition, there are different possibilities for which statement to execute:

  1. Execute just one statement: If some condition succeeds then execute its corresponding statement. Otherwise execute the else statement.
  2. Execute the corresponding statements of all conditions that succeed. If no condition succeeds then execute the else statement.
  3. Execute the corresponding statement of the earliest condition that succeeds and then the following statements in the given order.

In addition, there are different kinds of checks that you might want a compiler to test for you in some circumstances:

  1. Are the cases all distinct so that only one can succeed?
  2. Do the cases cover all possibilities or is it possible that none of them will succeed?

And finally, there are different kinds of matching such as unification and typecase, but that is a whole different topic that I’ll skip for now.

So here is what a multibranch might look like in the perfect programming language:

 case superbranch-body esac |
 if superbranch-body fi

branch-control branch-clause*

order-control result-control condition-control

order-control: ordered | unordered | (empty: default=ordered for if, unordered for case)

result-control: first | set-function | (empty: default=first for ordered, any for unordered)

condition-control: expression | term | (empty: default=true)

Here are some notes on the syntax:
  • A multibranch that uses the if ... fi syntax is called an if statement and one that uses the case ... esac syntax is called a case statement.
  • Case statements are checked by the compiler. If the compiler cannot prove that at least one of the conditions will succeed, then it is an error. You can always avoid this error by adding an else.
  • Case statements that do not have an explicit result-control are checked by the compiler to see if at most one condition can succeed. If the compiler cannot prove that at most one condition can succeed it is an error. You can avoid this error by adding an explicit result-control or by changing it to an if statement.
  • If there is no branch-control in a branch-body then the first when is optional. This means that you case use a more traditional syntax like this:

if expression then stmt1 else stmt fi

For a multibranch if statement, instead of elseif, you use when:

if expression then stmt
when expression then stmt
when expression then stmt
else stmt

  • else” acts exactly like “when e then” where e is some expression that always succeeds in that particular superbranch. Depending on the controls, it may be allowed and useful to have multiple else statements.
  • A set-function is a function like max, min, sum, etc. that takes a set of arguments and returns a result. The idea is inspired by SQL. Any is a branch function that just returns some arbitrary member of the set. If you use a set function that picks the “best” result in some sense and if the order-control is unordered, then the compiler can try to optimize the output by doing a direct search for the best result.
  • first is different from any or any other set function because first prevents execution of any more conditions after a success, while with a set function, all conditions would be executed. The compiler could not always optimize any in this way because if there are side effects in the conditions, it would alter the behavior.
  • With unordered and a set function, there is no restriction on the order of evaluating the conditions, and all conditions will be evaluated so this combination allows for parallel execution of the conditions. That’s why the the default result-control for unordered is any rather than first. And since the default order-control for case statement is unordered, a regular case statement with unspecified result-control and unspecified order-conrol can be executed in parallel. And of course, the compiler can also use parallel execution whenever the conditions have no side effects.

No comments:

Post a Comment