[squeak-dev] Re: Generators in Smalltalk??

Andreas Raab andreas.raab at gmx.de
Wed Feb 10 23:44:58 UTC 2010


As additional motivation, the Motivation section of PEP 255 is a great 
read that explains why generators are interesting and what the 
alternatives are:

Abstract

     This PEP introduces the concept of generators to Python, as well
     as a new statement used in conjunction with them, the "yield"
     statement.

Motivation

     When a producer function has a hard enough job that it requires
     maintaining state between values produced, most programming languages
     offer no pleasant and efficient solution beyond adding a callback
     function to the producer's argument list, to be called with each value
     produced.

     For example, tokenize.py in the standard library takes this approach:
     the caller must pass a "tokeneater" function to tokenize(), called
     whenever tokenize() finds the next token.  This allows tokenize to be
     coded in a natural way, but programs calling tokenize are typically
     convoluted by the need to remember between callbacks which token(s)
     were seen last.  The tokeneater function in tabnanny.py is a good
     example of that, maintaining a state machine in global variables, to
     remember across callbacks what it has already seen and what it hopes to
     see next.  This was difficult to get working correctly, and is still
     difficult for people to understand.  Unfortunately, that's typical of
     this approach.

     An alternative would have been for tokenize to produce an entire parse
     of the Python program at once, in a large list.  Then tokenize clients
     could be written in a natural way, using local variables and local
     control flow (such as loops and nested if statements) to keep track of
     their state.  But this isn't practical:  programs can be very large, so
     no a priori bound can be placed on the memory needed to materialize the
     whole parse; and some tokenize clients only want to see whether
     something specific appears early in the program (e.g., a future
     statement, or, as is done in IDLE, just the first indented statement),
     and then parsing the whole program first is a severe waste of time.

     Another alternative would be to make tokenize an iterator[1],
     delivering the next token whenever its .next() method is invoked.  This
     is pleasant for the caller in the same way a large list of results
     would be, but without the memory and "what if I want to get out early?"
     drawbacks.  However, this shifts the burden on tokenize to remember
     *its* state between .next() invocations, and the reader need only
     glance at tokenize.tokenize_loop() to realize what a horrid chore that
     would be.  Or picture a recursive algorithm for producing the nodes of
     a general tree structure:  to cast that into an iterator framework
     requires removing the recursion manually and maintaining the state of
     the traversal by hand.

     A fourth option is to run the producer and consumer in separate
     threads.  This allows both to maintain their states in natural ways,
     and so is pleasant for both.  Indeed, Demo/threads/Generator.py in the
     Python source distribution provides a usable synchronized-communication
     class for doing that in a general way.  This doesn't work on platforms
     without threads, though, and is very slow on platforms that do
     (compared to what is achievable without threads).

     A final option is to use the Stackless[2][3] variant implementation of
     Python instead, which supports lightweight coroutines.  This has much
     the same programmatic benefits as the thread option, but is much more
     efficient.  However, Stackless is a controversial rethinking of the
     Python core, and it may not be possible for Jython to implement the
     same semantics.  This PEP isn't the place to debate that, so suffice it
     to say here that generators provide a useful subset of Stackless
     functionality in a way that fits easily into the current CPython
     implementation, and is believed to be relatively straightforward for
     other Python implementations.

     That exhausts the current alternatives.  Some other high-level
     languages provide pleasant solutions, notably iterators in Sather[4],
     which were inspired by iterators in CLU; and generators in Icon[5], a
     novel language where every expression "is a generator".  There are
     differences among these, but the basic idea is the same:  provide a
     kind of function that can return an intermediate result ("the next
     value") to its caller, but maintaining the function's local state so
     that the function can be resumed again right where it left off.  A
     very simple example:

         def fib():
             a, b = 0, 1
             while 1:
                 yield b
                 a, b = b, a+b

     When fib() is first invoked, it sets a to 0 and b to 1, then yields b
     back to its caller.  The caller sees 1.  When fib is resumed, from its
     point of view the yield statement is really the same as, say, a print
     statement:  fib continues after the yield with all local state intact.
     a and b then become 1 and 1, and fib loops back to the yield, yielding
     1 to its invoker.  And so on.  From fib's point of view it's just
     delivering a sequence of results, as if via callback.  But from its
     caller's point of view, the fib invocation is an iterable object that
     can be resumed at will.  As in the thread approach, this allows both
     sides to be coded in the most natural ways; but unlike the thread
     approach, this can be done efficiently and on all platforms.  Indeed,
     resuming a generator should be no more expensive than a function call.

     The same kind of approach applies to many producer/consumer functions.
     For example, tokenize.py could yield the next token instead of invoking
     a callback function with it as argument, and tokenize clients could
     iterate over the tokens in a natural way:  a Python generator is a kind
     of Python iterator[1], but of an especially powerful kind.
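
For comparison, here's a rough Squeak equivalent of the fib example, 
written against the Generator class discussed below (assuming it 
understands #yield: and the usual stream protocol). A sketch, not 
tested:

    fib := Generator on: [:g |
        | a b t |
        a := 0. b := 1.
        [true] whileTrue: [
            g yield: b.                 "hand the next value to the caller"
            t := a + b. a := b. b := t  "advance the pair, as a,b = b,a+b"]].

    fib next. "1"
    fib next. "1"
    fib next. "2"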

Cheers,
   - Andreas

Andreas Raab wrote:
> Ralph Boland wrote:
>> I do not understand all the commotion over generators.
> 
> Generators transform a push model (#do:) into a pull model (#next). 
> Let's say you have two algorithms: one that produces values via a push 
> model and one that consumes values via a pull model:
> 
> producer := SomeAlgorithm new.
> producer do:[:value| "produces the value from the algorithm"].
> 
> consumer := AnotherAlgorithm new.
> consumer process: inputStream. "consumes the value from inputStream"
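> 
> The mismatch is one of protocols: #process: pulls values through the 
> stream interface, roughly like this (the #handleValue: helper is 
> hypothetical, standing in for whatever the consumer does with each 
> value):
> 
> process: aStream
>     [aStream atEnd] whileFalse: [self handleValue: aStream next]
> 
> while the producer only knows how to push each value into a block via 
> #do:.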
> 
> Plugging these two together can be a royal pain. You basically have the 
> following three options:
> 
> 1) Preproduce all the values from the producer and create a stream that 
> holds them:
> 
> values := Array streamContents:[:s| producer do:[:each| s nextPut: each]].
> consumer process: values readStream.
> 
> This can work very well if the number of values is limited and if you 
> know you're going to consume all the produced values (it breaks down 
> entirely for infinite sequences of numbers).
> 
> 2) Rewrite either one of the algorithms. This can work very well if 
> it's your own code and reasonably simple, but can be extremely painful 
> if it's complex.
> 
> 3) Use a Generator. With a generator, the solution becomes simply:
> 
> consumer process: (Generator on:[:yield| producer do: yield]).
> 
> When you can use neither solution 1) nor 2), generators simplify your 
> life *greatly*. In fact, I'd say they are indispensable if you're 
> dealing with a complex existing environment.
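> 
> Note that the resulting generator behaves like an ordinary read 
> stream, so the consumer can also pull values one at a time; a small 
> usage sketch:
> 
> gen := Generator on:[:yield| producer do: yield].
> [gen atEnd] whileFalse:[Transcript show: gen next printString; cr].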
> 
>> If a Collection class cannot do what you want, use a Stream.
>> Take a read-only Stream and subclass from it a class called GenMyData.
>> Inside GenMyData write code that generates the data you
>> need on the fly and feeds it to the user upon request using
>> the ReadStream interface.
> 
> See above. You're describing option #2. Rewriting an existing algorithm 
> to "feed" the data properly can be decidedly non-trivial. Try it with 
> Integer>>primesUpTo:do:, which is a very simple algorithm. The point is 
> that while every algorithm can be expressed either way, the generic 
> conversion a Generator performs removes the need to reimplement each 
> specific algorithm by hand.
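> 
> For instance, instead of rewriting it you can wrap it directly (a 
> sketch; as in the example above, the generator doubles as the 
> one-argument block that #primesUpTo:do: expects):
> 
> primes := Generator on:[:yield| Integer primesUpTo: 100 do: yield].
> primes next. "2"
> primes next. "3"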
> 
> Cheers,
>   - Andreas
> 