Friday, May 17, 2013

Assertions and Assumptions


An assertion is a boolean expression that asserts that some condition is true at a point in the program. For example:

assert x > 0;
do while x > 0;
 ...
od

This piece of code has a while loop and it asserts that just before entry of the loop, x will be greater than 0, which implies that the loop always executes at least once. The assertion should be understood as a claim by the programmer that the condition is true, along with some sort of verification by the language that the assumption is correct.

Typically the assertion is verified at runtime by a runtime test. This test will slow down the program, so it is also typical to allow these tests to be turned off for an optimized build. Unotbainabol ought to do a better job than that. If the programmer has some sort of proof in mind that x > 0 at that point in the program then why can’t the language figure it out at compile time? Of course in the real world there are limits to what can be done with automated theorem proving, but even in the real word compilers ought to do better than they do.

Automatic Verification

One way to do a better job of proving assertions at compile time is to get help from the programmer. The compiler should be able to ask the programmer to explain his reasoning and then check the explanation. This might involve an iterative process where the compiler would try to compile a program with assertions and then output warnings for each one that it can’t prove. The programmer would then add more assertions that would enable the compiler to follow his reasoning.

For example assume the code fragment above appears in this function:

function f(x: int)
 x := x+1;
 assert x > 0;
 do while x > 0;
   ...
 od
end

The compiler would attempt to compile the above code and produce a warning: “file foo, line 25: can’t prove assertion x > 0”.

We could then go in and add a precondition to the function:

function f(x: int)
 precondition x >= 0;
 x := x+1;
 assert x > 0;
 do while x > 0;
   ...
 od
end

A precondition is a special assertion that goes at the beginning of a function. When we add this condition, we are hoping that the compiler is smart enough to figure out that if you start in a state where x >= 0 and then you add one to x, then in the new state, x > 0. This may seem pretty trivial, but it’s a very special case. In general proving even simple things about arithmetic is quite difficult. But since, Unobtainabol is an ideal language rather than a real language, this is not a problem; it is a research opportunity.

So why did we use the keyword “precondition” rather than “assert”? To see what is special about preconditions, we can do another build and get the warning: “file bar, line 122: can’t prove precondition x >= 0 on function f”. What’s going on here? The function f is defined in file foo. How does adding a precondition to function f create a warning in a completely different file, bar?

Well, we look at the line with the warning and we see this:

f(y);

The precondition on function f gets tested, not in the function, but at the point of the call. The compiler sees a condition x >= 0 on the formal parameter x and it tests that condition on the actual parameter y. The test fails because the compiler can’t prove that y >= 0. Preconditions make it possible for the compiler to prove conditions across function calls without doing interprocedural analysis.

So let’s say that the function call occurs in this sequence:

y := g(z);
f(y);

We can fix the warning if we have a way to say that g always returns a nonnegative number. We do this with a postcondition on g:

function g(z)
 postcondition result >= 0;
 ...
end

A postcondition, like a precondition, gets evaluated at the point of the call. The name “result” is a special variable that we can use in postconditions to assert things about the return value of a function.

Checks and Assumptions

One problem with the iterative process described above is that in some cases it may prove too much trouble. The programmer may want just a C-style assert that is checked during testing and then assumed to be true during production runs. In other cases, the programmer may want the condition to always be tested, even in production.

For these purposes, Unobtainabol has two new forms of assert statements. An assertion followed by assume tells the compiler to simply assume that the condition is true, but to add a runtime check for debug code if the condition can’t be proven. An assertion followed by check tells the compiler to always verify the condition in both debug and production code if the compiler can’t verify the condition at compile-time. Here are some examples:

precondition check x>0;
postcondition assume x>0;
assert check x>0;

Optimization Based on Assertions

Let’s look at that original assertion again:

assert x > 0;
do while x > 0;
 ...
od

What is interesting about this example is that if the compiler knows that x > 0, then it ought to be able to skip the test the first time through the loop. In fact, the compiler ought to be able to change this loop into a test-at-end loop.

I don’t know of any real-world languages where this is done because in real languages assertions are viewed as either runtime tests (in debugging code) or as non-entities (in optimized code). In order for the compiler to take advantage of this optimization opportunity, it would have to know that the condition is true either by proving it or because the programmer used assume.

It might seem reckless to let the compiler do optimizations based on assume assertions because the programmer might be wrong. However if the programmer is wrong, then the program is not working correctly anyway, so the danger is not all that much greater.

However, the danger is greater in an important way that has to do with undefined behavior. Undefined behavior is what you get when the language does not specify what happens in certain circumstances. In the C library standard, for example, it specifies that if you call free() twice on the same pointer without getting it back from malloc() in between, then the result is undefined. The language doesn’t say what will happen. Maybe the system will catch the error and give you a nice error message. Maybe the program will immediately crash or go into an infinite loop. Maybe everything will seem to be working but you get mysteriously wrong answers out. You just don’t know.

In higher-level languages there is no undefined behavior. Everything has a defined result, even errors. This is, in fact, a big part of the reason why higher-level languages tend to be slower than lower-level languages. There are always runtime tests needed to make sure that nothing undefined happens.

Now, if the compiler optimizes based on an assumption and that assumption turns out to be false, then sometimes the result will be undefined. There is generally no way to add tests to control what happens in this case because if you put in runtime tests to verify the assumption, then it defeats the purpose of the optimization.

So the question on whether to optimize based on assumptions depends on whether Unobtainabol will have any undefined behavior or not --and that’s a good question with arguments on both sides. I think the perfect language should fulfill both the performance needs of those who want undefined behavior and the security needs of those who do not. For this reason, Unobtainabol should have a compile-time option to say whether it will have any undefined behavior or not. Each compiled module must have a flag saying whether it has undefined behavior so the decision of whether to link that module can be made based on the build flags.

In general, we might want to have two versions of some libraries where the presence of undefined behavior can make a significant performance difference.

No comments:

Post a Comment