[DOC] What is a continuation?

Scott A Crosby crosby at qwes.math.cmu.edu
Fri Feb 1 11:17:54 UTC 2002


On Fri, 1 Feb 2002, Anthony Hannan wrote:

> Scott A Crosby <crosby at qwes.math.cmu.edu> wrote:
> > BTW, what about tailcall out of a block?
> >
> > foo:
> >   ^[^self bar]
> >
> > Is that a different bytecode?
>
> Tail call out of a block would be harder because the caller's receiver
> (the block) (which we are replacing) holds the return context and
> callee's (#bar) localReturns would have to be changed to a remoteReturns.
> So a new method would have to be made on the fly and the receiver
> would have to be wrapped in a new block closure with the return context.

I don't know the intricities and terminology of the method context and
invocation mechanism; I'm not following this.

Ah. Ha.. It took me two hours hours, but I think we're confusing two
distinct things....

[ foo foo foo. ^self someMessage]
      AND
[ bar bar bar. self someMessage]

Which are *very very* different things.

What it means to 'tailcall' for the first seems messy. I don't know? Punt?
Define some sort of new semantics for it? Use a normal method call? What
do you think?

What it means to tailcall for the second is clear. A block returns the
value of the last thing in the block, whether that is a value or a message
send. If its a messagesend, it may be tailcalled, furthermore, there are
no 'which context does it return into' issues. Finally, since we only care
about the last method call, we're already done with using any variables in
the context, We'll have popped any that we'll pass to the function we're
invoking by this point, before we call out, so we can nuke the context
(the one inside of [^^self bar]) safetly?

>
> Is tail recursion required for continuations?  I really don't understand
> continuations, can you give me some pointers?
>

Tail recursion is not needed for continuations. You only need closures,
but without tail recursions, continuations-style programming can overflow
the stack much easier.

Continuations are basically when you have a 'what do I do next' argument.
Normally, thats implicit, but not always. A stack of frames is a
continuation.. Or, an error handler is a continuation.

You can think of each method in squeak currently getting 2 extra
arguments, a return continuation (invoked on your return value), and an
error continuation. (which is invoked on an error)

Sometimes having them passed explicitly is also good. For example, assume
I have a bunch of functions trying to do a proof search (solve a
problem).. But, they can fail, and changes may need to be unwound, so
handle this with a failure continuation.

Sorta like:

Foo>>match: aStream ifFailure: failCont
     pos _ stream position
     ....
     stream pop.
     ....
     " Used to restore the state if things fail "
     ^^subproblem2 match: aStream ifFailure: [aStream position: pos.
                                            ^^failCont value]
--
Or, you want to try something, but also maybe try something else if the
first fails.

--
Bar>>match: aStream ifFailure: failCont
     pos _ stream position
     ....
     subproblem1 match: aStream ifFailure: failCont.
     ...
     stream pop.
     ...
     " Used to try an alternative. "
     ^^subproblem2 match: aStream
                 ifFailure: [aStream position: pos.
                               ... the stuff for the alternative ...]

--

Note that in the first example, without tail recursion, you pay one stack
frame for each element on the stream.

You can think of continuations as being the abstract represntation of the
flow of control, 'what to do next'. In many cases, existing implicit
flow-control continuations may be coopted as continuations. The first
example, for example, could be done by throwing exceptions. (and using the
implicit error continuation.)

In fact, anytime you have a stack of stuff to keep track of 'what to do
next', you're dealing with (a concrete representation of) continuations
without even realizing it. That applies to the function-call stack, the
error handler stack.

Here's some code, written in SML-NJ that implements a complete regexp
matcher. It defines the regular expressions at the top, and the engine at
the bottom. This language uses short-circuting boolean connectives and
pattern-match on arguments.  'head::tail' is used to patternmatch the
head&tail of a list.

--

datatype regexp =
    Char of char
  | Times of regexp * regexp
  | One
  | Plus of regexp * regexp
  | Zero
  | Star of regexp
  | Underscore
  | And of regexp * regexp
  | Top
  | Not of regexp;

(* val acc : regExp -> char list -> (char list -> bool) -> bool *)
fun acc (Char(a)) (a1::s) k = if (a = a1) then k s else false
  | acc (Char(a)) (nil) k = false
  | acc (Times(r1,r2)) s k = acc r1 s (fn s' => acc r2 s' k)
  | acc (One) s k = k s
  | acc (Plus(r1,r2)) s k = acc r1 s k orelse acc r2 s k
  | acc (Zero) s k = false
  | acc (r as Star(r1)) s k =
       k s orelse
       acc r1 s (fn s' => if s = s' then false
                          else acc r s' k)

  | acc (Underscore) (a1::s) k = k s
  | acc (Underscore) (nil) k   = false

  | acc (And(r1,r2)) s k = acc r1 s k andalso acc r2 s k
  | acc (Top) (nil) k = k nil
  | acc (Top) (a1::rest) k = k (a1::rest) orelse acc Top rest k
  | acc (Not(r)) s k = if (acc r s k)
            then false
            else true;

--
You can think of the regexp engine as getting 3 arguments:
  The 'self' indicating the type of regexp node.
  A stream of characters.
  A continuation (block) that, when given a stream it returns a boolean
     indicating if there is a match on the 'rest' of the regular
     expression. The continuation is used to 'thread' the regular
     expression through the string.

The matcher and continuation indicates true/false on whether the string is
matched. Note that in several of the rules (Times, Star), the continuation
is actually a closure over parts of the regular expression.

The control flow is roughly:
   If you consume no input and match, just invoke the continuation on the
input and return that; you match if the 'rest' matches.
   If you are simple and match, invoke the continuation on the rest
of the input and return that value; the regexp is matched if the rest of
the string is accepted by the continuation (Char, One, Underscore)
   If you are simple and don't match, return false. (Char, Zero,
Underscrore)
   If you might match based on whether one or more regular expression
match, you try to accept on one regular expression, and set up a new
continuation which will be invoked only if that other regular exp matches.
(Times, Star)

Anyways, with tail recursion, this matcher consumes stack space
proprortional to the complexity of the regexp, not the size of the input
string. If it weren't tail recursive, it wouldn't have this property. (It
consumes one methodcontext per charater in the input string)


Continuations, are one of those things (like pointers or closures) that
seems wierd and confusing until you get it, but when you do, its a
revelation. You never view programming quite the same way again.

With BC, the regexp example could be coded up in smalltalk. Adding on a
way to extract matching substrings is trivial, and you'd have a extremly
featureful regexp library in about 25 lines of code.

In fact, I suggest everyone try it, just get an idea of continuations,
then try running it and seeing the beauty of the algorithm. :)

And with tailcall, you won't exhaust all RAM trying to match a megabyte
string. Finally, bonus points if anyone can tell me the difference between
Times and And.

Scott




More information about the Squeak-dev mailing list