Saturday, May 4, 2013

Parameter passing II


In the previous section I went through a sequence of parameter passing methods giving examples to show how, for each one, that method did not have the effect that the programmer would expect. Those examples assumed that the programmer just wanted to write a simple function that would calculate a result from two arguments. It turned out that only pass-by-value would allow writing such routines.

However there are other kinds of routines that programmers often want to write and for those routines, pass-by-value is often inadequate. In this section I’m going to go back up through the sequence, showing at each level that the method is not powerful enough for a kind of routine that a programmer may want to create.

First, let’s talk about why this list of parameter passing methods is a sequence and not just a set. The sequence is arranged by how much control the method has over the processing of the parameters vs. how much is handled automatically before the call. In a sense, this is a sequence of how powerful each parameter-passing method is. One thing we can learn from this sequence is to distinguish the phases that a translation engine typically goes through in order to read and evaluate a programming language phrase.

Steps in processing a phrase

Reading

Reading: The phrase is first read in as a sequence of letters, digits, and other characters. For example:

x+y*2

In the above text the computer doesn’t know anything about structure. It doesn’t know that x is a name or that + is an operator. It doesn’t know that * happens before +. All it knows is the text.

Parsing

To generate all of the missing information, the computer has to parse the program to produce an abstract syntax tree or AST. The string above would typically produce an AST like this:


The abstract syntax tree encodes the syntactic structure of a phrase, picking out syntactic details such as operator calls and names. However, parsing does not include the bindings of the names.

Binding

The next step after parsing is name resolution or name binding. After this step, the names have meanings or denotations. In the case of variables, of course, the name denotes a location which can vary at runtime. Don’t confuse binding with dereferencing. A variable is bound to a location (which it denotes). The location contains a value. To dereference a variable, you take the value out of the location.

Resolving the names of routines is complicated by polymorphism so for now when I talk about name binding, assume that I’m only talking about the names of variables.

Note that binding a variable doesn’t mean that you reduce it to a single instance. You can bind a local variable, meaning that you give it a specific meaning within the namespace of a routine, but each time you call the routine, you have a different instance of the variable.

Evaluation

After all names in a phrase are bound, you can evaluate it to find out what it denotes. In the case of an expression, the phrase will denote a location or a value, depending on whether the expression is variable. In the case of a statement, the phrase will denote a state-change in the underlying machine model. More on this when I get to denotational semantics.

Dereferencing

If an expression denotes a location, then there is one more step after evaluation. You take the value out of the location to determine what the expression refers to. In the case of non-expression phrases and of expressions that denote values rather than locations a phrase refers to the same thing that it denotes, so this step does not apply.

Parameter-passing methods in sequence

So, with the background of steps of processing a phrase, the sequence of parameter passing methods can be described like this:
  1. pass-by-text: no processing is done on the actual parameters other than reading in the text, not even parsing. All further processing is done in the routine.
  2. pass-by-phrase: the only automatic processing done on the actual parameters is parsing and creation of an AST or other internal representation of a phrase. All other processing, including name binding is done in the routine.
  3. pass-by-thunk: the actuals are parsed and names are bound on the actual parameters, but the expressions are evaluated in the routine.
  4. pass-by-lazy-thunk: The automatic processing parses the actual parameters, binds their names, and encapsulates the parameters into an evaluate-once lazy thunk. The routine controls whether and when the parameter is evaluated, but can only do it once.
  5. pass-by-reference: Automatic processing will parse, bind, and evaluate the actual parameters, but not dereference them. Dereferencing is controlled by the routine.
  6. pass-by-value: Automatic processing will parse, bind, evaluate, and dereference the actual parameters. The routine has no control at all except for using the values.
Looked at this way, it is easy to see why there are things you can do with the lower-numbered methods that you can’t do with the higher-numbered methods.

But what kinds of things should a language allow the programmer to do? What kinds of routines or language extensions should be allowed? If you want a very powerful and expressive language then a good rule of thumb is that if a language has some identifiable feature, then programmers ought to be able to manufacture their own versions of such features. After all, the language designer thought some variety of that feature was a good idea but could hardly expect to handle all useful varieties of the feature.

For example any language that provides built-in functions should allow programmers to write their own functions as extensions. Any language that provides operators should allow programmers to write user-defined operators. Any language that provides assignment should allow programmers to write extensions that modify their arguments... and so on. I will exploit that rule of thumb in the following.

Pass-by-reference

Most languages have an assignment operation that takes two arguments and assigns one to the other. Some language have additional assignments such as the operation-and-assign statements in C. In such a language, it ought to be possible for users to write their own versions of assignment statements.

You cannot write an assignment statement with pass-by-value because the arguments are dereferenced at the call site and the routine has no access to the locations. However, you can write assignment statements with pass-by-reference. Here is a simple example that takes two variables and swaps their values:

function swap(ref x, ref y) var tmp = x; x = y; y = x; end

Pass-by-lazy-thunk

Many language have short-circuit operators that only evaluate their arguments when needed. For example in the C expression, ptr && ptr->x==0, the second expression will only be evaluated if the first expression is true. This isn’t just a matter of efficiency; in the example above, if ptr evaluates to false, then ptr->x will be an undefined reference and it would be an error to evaluate it.

Well, if short-circuit evaluation is important enough that the language designer used it, then he ought to let the programmer use it as well. You can’t write short-circuit routines with pass-by-value or pass-by reference because they evaluate their arguments at the call site before the routine gets control. However, you can do it with pass-by-lazy-thunk:

function lazy-multiply(x, lazy y)
 if x == 0 then return 0;
 return x/y;
end

Pass-by-thunk

Many language also have control structures such as while loops or if-then-else conditionals. These are features that control the evaluation of their arguments. You can implement control structures with pass-by-thunk. Here is how you would implement a while loop:

function whileLoop{thunk condition, thunk statements}
if !condition then return;
statements;
whileLoop{condition, statements};
end

Pass-by-phrase

Up to now, I’ve been restricting the discussion of parameter passing to parameters --oddly enough-- but similar issues apply for the return value. In effect, a routine with pass-by-phrase and return-by-phrase is the same as a Common Lisp macro. A macro call is a phrase that takes a set of phrases as arguments and returns a phrase that replaces the call. Binding and all subsequent valuation happens on the resulting phrase rather than on the call.

Notice that for this to work, the processing of a pass-by-phrase function must happen before name binding, which means before runtime. This kind of language facility tends to confuse the common meanings of compile-time and runtime, so when confusion is possible I will sometimes refer to translation time rather than compile time. Compile time is usually thought of as something that applies to the whole program and happens before runtime. By contrast, translation time applies to individual phrases and can happen during runtime of an outer program.

These sorts of macros are very powerful, but they make a complex topic so I think it’s a little early to give a concrete example of using one. Later I’ll write a post giving the details of pass-by-phrase routines and macros.

Pass-by-text

If pass-by-phrase is the mechanism of Common Lisp macros, then pass-by-text is the mechanism of C macros. Although it has serious problems, the C macro facility has proven extremely powerful. One thing you can do with pass-by-text that you cannot do with pass-by-phrase is to create new sorts of literals. For example, suppose your language does not have a DateTime data type. In many languages you can write a class or other sort of abstract data type to implement DateTime, but what you cannot do in most languages is create DateTime literals such 1985-01-01T05:00. You can simulate with a function call such as

datetime(“1985-01-01T05:00”)

but this is more verbose and is often not understood by the compiler for constant folding and other compile-time optimizations.

I should mention that Common Lisp does have a way to create literals of this type, but it is not considered a routine. In Common Lisp, you can essentially change the lexical tokenizer.

Again, there is a lot to discuss with this kind of marco so I’ll avoid writing out a specific example until a later post that can concentrate on the topic.

Non-conclusion

So where does all of this leave us? We know that Unobtainabol has to have some way to get all of the above forms of parameter passing, or at least a way to get the same effects. But there are a lot more things to consider before we come up with a specific parameter-passing feature for Unobtainabol: Do we really want to allow aliasing as pass-by-reference does? How does the kind of argument change the properties of parameter passing? For example, is passing an integer different from passing an object? Is parameter passing different for remote procedure calls? What can you do with macros? What is the relationship between parameter passing and assignment? What is the relationship between formal parameters and the signature of a routine? Is the number of parameters part of the signature of the routine or is that part of the optional declarations? Is currying allowed? How does it work?

For these and other exciting issues from the world of parameters, see future posts.

No comments:

Post a Comment