[squeak-dev] Re: Generators in Smalltalk??

Ralph Boland rpboland at gmail.com
Wed Feb 10 20:04:28 UTC 2010


(Warning, long post)

I do not understand all the commotion over generators.

If a Collection class cannot do what you want, use a Stream.
Take a read only Stream and subclass from it a class called GenMyData.
Inside GenMyData write code that generates the data you
need on the fly and feeds it to the user upon request using
the ReadStream interface.
Perhaps there should be an Abstract class "Generator" that
is a subclass of Stream to provide some generic generator functionality.
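
To make the comparison concrete, here is a rough sketch of the idea, written in Python so it can sit next to a native Python generator. GenMyData is the name used above; the squares it produces and the method names next and at_end (standing in for Smalltalk's next and atEnd) are assumptions made for illustration only.

```python
class GenMyData:
    """Stream-style generator: computes squares on demand (hypothetical data)."""

    def __init__(self, limit):
        self._i = 0
        self._limit = limit

    def at_end(self):
        # Stands in for ReadStream>>atEnd
        return self._i >= self._limit

    def next(self):
        # Stands in for ReadStream>>next; the value is computed only when asked for
        if self.at_end():
            raise EOFError("stream is at end")
        value = self._i * self._i
        self._i += 1
        return value


def gen_my_data(limit):
    """The same data produced by a native Python generator, for comparison."""
    for i in range(limit):
        yield i * i


stream = GenMyData(4)
from_stream = []
while not stream.at_end():
    from_stream.append(stream.next())

from_generator = list(gen_my_data(4))
```

The stream version keeps its state in instance variables, while the Python generator keeps it in a suspended function frame.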

What exactly do Python generators provide that my proposal does not?
And is not my interface cleaner?
And more adaptable?

We could generalize this idea to that of a Pipe.  A Pipe would be
a generalization (subclass) of a  ReadWriteStream.  A Pipe can accept
data using the WriteStream interface.  The Pipe
can then process the data in some way and then feed the result
to the user upon request using the ReadStream interface.
A pipe is more complicated than a generator because it needs to
be able to block.  If data is being fed into it but not being taken out
then eventually it may fill up.  Similarly, if data is being removed
from it, then it may run out of data and again need to block.
Blocking may mean a process blocks or it may mean sending a message
to another pipe to send or accept data.  I have not worked out the details.
Given how useful pipes are in Linux, I think a Generic Pipe class
would be VERY useful.  Of particular use would be operators such
as '<<',   '||',  '>>', and  tee.  These operators would provide functionality
similar to what  Linux does with pipes.
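
As one possible reading of the blocking behaviour, here is a small Python sketch of a bounded pipe built on the standard queue.Queue. The method names next_put, next and close, and the capacity parameter, are assumptions of mine and not part of any existing Squeak class. The writer blocks when the pipe is full and the reader blocks when it is empty.

```python
import queue
import threading


class Pipe:
    """Bounded buffer with a write side (next_put) and a read side (next)."""

    _CLOSED = object()  # sentinel marking the end of the data

    def __init__(self, capacity=2):
        self._buffer = queue.Queue(maxsize=capacity)

    def next_put(self, item):
        # Blocks the writing thread while the pipe is full
        self._buffer.put(item)

    def close(self):
        # Signals that no more data will be written
        self._buffer.put(self._CLOSED)

    def next(self):
        # Blocks the reading thread while the pipe is empty.
        # Returns None once the pipe has been closed and drained.
        item = self._buffer.get()
        if item is self._CLOSED:
            self._buffer.put(self._CLOSED)  # keep the pipe closed for later reads
            return None
        return item


def producer(pipe):
    for n in range(5):
        pipe.next_put(n * 10)
    pipe.close()


pipe = Pipe(capacity=2)
writer = threading.Thread(target=producer, args=(pipe,))
writer.start()

received = []
while True:
    item = pipe.next()
    if item is None:
        break
    received.append(item)
writer.join()
```

This sketch uses threads for blocking; the alternative mentioned above, where one pipe sends a message to another to send or accept data, is not shown.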

The '<<' operator would pass the argument (a ReadStream, Generator,
or Pipe)  to the receiver which would use it as if it were a  ReadStream.

The '>>'  operator would pass the argument (a WriteStream or Pipe)
to the receiver which would use it as if it were a WriteStream.
I guess we need another concept here, the reverse of a generator, which accepts
data like a WriteStream, does some processing with the data, and keeps
the result.
Let's call this an Absorber.

The '||'  operator (assuming that receiver is a ReadStream, Generator,
or Pipe) requires the argument to be a WriteStream, Absorber, or Pipe.
The result of the operation is an Absorber, a Generator, or a Pipe depending
upon the classes of the receiver and operand.
If the receiver is a Generator (or ReadStream)
and the argument is an Absorber (or WriteStream)
then the actual calculation is triggered and the result is an Absorber
(or WriteStream) which holds the final result of the computation.
Actually, whenever the first argument is a Generator (or ReadStream), the
computation could be triggered but it would eventually block if the second
argument is a Pipe.
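
A sketch of how the result class could depend on the operands, using Python's | operator in place of '||'. To keep it short, the Pipe here is a non-blocking per-item transform, which is a simplification of the Pipe described earlier; all class and method names are assumptions, and the Pipe-with-Absorber combination is left out.

```python
class Absorber:
    """Accepts items and keeps them."""

    def __init__(self):
        self.contents = []

    def next_put(self, item):
        self.contents.append(item)


class Pipe:
    """Simplified pipe: applies a transform to each item passing through."""

    def __init__(self, transform):
        self.transform = transform

    def __or__(self, other):
        if isinstance(other, Pipe):
            # Pipe | Pipe gives a Pipe applying both transforms in order
            return Pipe(lambda item: other.transform(self.transform(item)))
        return NotImplemented


class Generator:
    """Produces items on demand from any iterable."""

    def __init__(self, iterable):
        self._items = iter(iterable)

    def __iter__(self):
        return self._items

    def __or__(self, other):
        if isinstance(other, Absorber):
            # Generator | Absorber triggers the computation immediately
            for item in self:
                other.next_put(item)
            return other
        if isinstance(other, Pipe):
            # Generator | Pipe is still a Generator; nothing is computed yet
            return Generator(other.transform(item) for item in self)
        return NotImplemented


doubled = Pipe(lambda n: n * 2)
plus_one = Pipe(lambda n: n + 1)

result = Generator([1, 2, 3]) | doubled | plus_one | Absorber()
lazy = Generator([1, 2, 3]) | (doubled | plus_one)
```

Here result is an Absorber holding the final values, while lazy is a Generator that has not yet computed anything.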

The Tee class is a generalization (subclass) of class Pipe. It is instantiated
with a collection argument where the elements of the collection are WriteStreams,
Absorbers, Pipes, or Tees.  The Tee class will forward the results of its
computation to each of the elements of the collection.
I suppose there could be the reverse of a Tee, ReverseTee,
which accepts input from any of a collection of ReadStreams, Pipes, or
ReverseTees.
ReadStreams, Generators, and Pipes have a method  tee: aCollection
which creates a Tee instance and connects it to the receiver.
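
A minimal Python sketch of the forwarding behaviour, assuming every sink understands a next_put method (my name for the write side). The helper function tee is a hypothetical stand-in for the tee: aCollection method, and it drains the source eagerly, which a real implementation might not do.

```python
class Absorber:
    """Accepts items and keeps them."""

    def __init__(self):
        self.contents = []

    def next_put(self, item):
        self.contents.append(item)


class Tee:
    """Forwards every item it receives to each sink in its collection."""

    def __init__(self, sinks):
        self._sinks = list(sinks)

    def next_put(self, item):
        for sink in self._sinks:
            sink.next_put(item)


def tee(source, sinks):
    """Hypothetical counterpart of tee: aCollection; feeds source into a new Tee."""
    splitter = Tee(sinks)
    for item in source:
        splitter.next_put(item)
    return splitter


first, second, third = Absorber(), Absorber(), Absorber()
# A Tee may itself be an element of another Tee's collection
tee(iter("abc"), [first, Tee([second, third])])
```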

I am sure there are numerous objects that make sense as Generators, Absorbers
and Pipes.   For example Random number generators should be Generators.
Wow! Wasn't that insightful!

e.g. An instance of class Prime could be a Generator from which prime numbers
can be fetched. It would store the first 10,000 primes, say,
and generate further primes requested on demand.
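
A sketch of such a Prime generator in Python. The precomputed parameter (defaulting to a small number here rather than 10,000) and the method name next are assumptions; the primes are found by simple trial division, which is only one of many possible approaches.

```python
class Prime:
    """Generator of primes: a stored prefix plus further primes on demand."""

    def __init__(self, precomputed=10):
        self._primes = []
        self._position = 0
        while len(self._primes) < precomputed:
            self._extend()

    def _extend(self):
        # Find the next prime after the last stored one by trial division
        candidate = self._primes[-1] + 1 if self._primes else 2
        while any(candidate % p == 0
                  for p in self._primes if p * p <= candidate):
            candidate += 1
        self._primes.append(candidate)

    def next(self):
        # Serve from the stored primes, generating a new one when they run out
        if self._position == len(self._primes):
            self._extend()
        value = self._primes[self._position]
        self._position += 1
        return value


primes = Prime(precomputed=5)
first_eight = [primes.next() for _ in range(8)]
```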

e.g.  Class PrimeFactors could be instantiated with a number which the instance
factors.  The instance would be a Generator from which can be fetched
the factors of the initial number in sorted order.
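
A Python sketch of PrimeFactors along these lines, again with assumed method names next and at_end. Each factor is computed only when it is requested, and the factors come out in ascending order.

```python
class PrimeFactors:
    """Generator yielding the prime factors of a number in ascending order."""

    def __init__(self, number):
        self._remaining = number
        self._divisor = 2

    def at_end(self):
        return self._remaining <= 1

    def next(self):
        if self.at_end():
            raise EOFError("no more factors")
        while self._remaining % self._divisor != 0:
            if self._divisor * self._divisor > self._remaining:
                # No divisor up to the square root: what remains is prime
                self._divisor = self._remaining
            else:
                self._divisor += 1
        self._remaining //= self._divisor
        return self._divisor


factors = PrimeFactors(360)
found = []
while not factors.at_end():
    found.append(factors.next())
```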

e.g.  An instance of Class Sum could be an Absorber that stores the sum of the
objects (numbers, vectors, matrices, etc.) fed to it.

e.g.  An instance of  Class TextToLines  could be a Pipe which acts like a
WriteStream of text and a ReadStream of lines of text.
That is, it is fed text one character at a time and
the same text can be fetched from it one line at a time.
Is that confusing? I think some terminology needs to be standardized
here so we know what we are talking about.
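
To pin down what TextToLines means, here is a non-blocking Python sketch. The method names next_put, next and has_line are assumptions, and a line is taken to end at a newline character. Where this sketch raises an error on an empty pipe, a blocking pipe would wait instead.

```python
from collections import deque


class TextToLines:
    """Pipe that is written one character at a time and read one line at a time."""

    def __init__(self):
        self._current = []     # characters of the line being assembled
        self._lines = deque()  # completed lines waiting to be read

    def next_put(self, char):
        # Write side: a newline completes the current line
        if char == "\n":
            self._lines.append("".join(self._current))
            self._current = []
        else:
            self._current.append(char)

    def has_line(self):
        return bool(self._lines)

    def next(self):
        # Read side: raises IndexError if no complete line is available
        return self._lines.popleft()


pipe = TextToLines()
for char in "one\ntwo\nthr":
    pipe.next_put(char)

lines = []
while pipe.has_line():
    lines.append(pipe.next())
```

The trailing "thr" stays inside the pipe until a newline arrives.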

I have always wanted to implement Pipes in Smalltalk but have never found the
time.  I think it could be a fun project and VERY useful.


Exercise:  Write code in Smalltalk to copy a file to another file assuming the
first file exists and is readable and the second file can be created
and written.
That is, ignore any error possibilities.
Then do the same exercise using a pipe. (Don't implement the '||' operator;
we only want to look at the code to see which is simpler.)

Bugs.
You could create an infinite loop by linking  a sequence of pipes into a loop.
Is this always a bug?
How would the computation be triggered in this case?

Is the hierarchy all wrong?
Should  Absorber and Generator be subclasses of Pipe (i.e. pipes with
some functionality removed) and ReadStreams and WriteStreams be
subclasses of Generator and Absorber respectively?  This is reopening
the controversy of what is the best hierarchy for streams, only worse. :-(
Pragmatically speaking, I think  Absorbers, Generators, and Pipes should
have a hierarchy separate from Streams but, of course, share the interface.

So whatayaall think?

Frankly I think the biggest problem is that pipes would be altogether too
convenient;  they would encourage writing code that is easy to write but
has too high an overhead because of the need to deal with blocking.
That is, pipes require good judgment, or that performance not be an issue.
If your code is too slow and some pipes you are using are the problem then
rewrite that code, perhaps avoiding the pipes.

While I do not have the time to do this project on my own
I could probably spare some time to help out if a group
could form to do this.  Besides, I am sure others could add much insight.

Note that this project involves more than creating the Abstract classes
Generator, Absorber and Pipe.  These three classes would be the
head of Class hierarchies just like Stream is.
I don't know what the subclasses would be though.
Pipe might have a subclass BufferedPipe.

Also, I hasten to point out that many classes would simply implement the
Pipe, Generator, or Absorber interface and not actually be subclasses
of any of these classes.
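
For what "implement the interface without being a subclass" could look like, here is a Python sketch using typing.Protocol (Python 3.8 or later). GeneratorInterface, Countdown and drain are hypothetical names, and the two-method interface (next, at_end) is an assumption about what a Generator would have to understand.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GeneratorInterface(Protocol):
    """Anything that understands next and at_end counts as a Generator."""

    def next(self): ...

    def at_end(self): ...


class Countdown:
    """Not a subclass of anything above; it only implements the two methods."""

    def __init__(self, start):
        self._n = start

    def at_end(self):
        return self._n <= 0

    def next(self):
        value = self._n
        self._n -= 1
        return value


def drain(source):
    """Works with any object that implements the interface."""
    items = []
    while not source.at_end():
        items.append(source.next())
    return items


conforms = isinstance(Countdown(3), GeneratorInterface)
values = drain(Countdown(3))
```

The isinstance check only verifies that the methods are present, which is close in spirit to how Smalltalk code relies on an object understanding the right messages.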

Interesting question:  Are there classes in Squeak that implement the
ReadStream, WriteStream, or ReadWriteStream interfaces now but are not
subclasses of these classes?
If not, should there be?
That is, are there classes in Squeak that should implement these interfaces?
In the Java world, Pipe, Generator, and Absorber would be Interfaces.
I don't like Java much but Interfaces I liked.


Regards, to those of you still here.

Ralph Boland




-- 
Had a wife and kids in Florida, Jack (Nicklaus)
Went out for a "ride" but I couldn't get back
As a ten timer being hunted down
OR
When the wife found out who I was knowing (biblically speaking)
I hit a hydrant and I just kept going (till I hit a tree).
...


