FilterStreams prototype

Sat Apr 3 16:34:40 UTC 1999

>Hello Squeakers,
>
>in an earlier message ( Musings about "ma" ) I mentioned a simple OO
>pipe-and-filter implementation.  I have now finally started porting
>this stuff from Objective-C to Squeak.  Do you think this is
>worthwhile pursuing further or just another crazy idea? :-)

Look, we're all nuts, so don't consider that the notions of "crazy 
idea" and "worthwhile pursuing" are mutually exclusive. :-)

>Motivation and Rationale
>------------------------
>
>The motivation for this package was the lack of reusability of
>printing and other encoding tasks.  The problem is that each method
>taking part in an encoding task hard-codes the specific encoding
>task, for example a collection receiving a #printOn: message sends
>#printOn: to its elements.
>
>The problem becomes apparent when trying to implement a #debugOn:
>message that only differs from #printOn: for a couple of objects.  At
>first, this seems easy, defining #debugOn: as a rename of #printOn:
>in Object, and implementing #debugOn: for those objects that should
>behave differently, let's say Forms.  However, this fails as soon as
>there are nested objects to debug, for example a collection of Forms.
> The collection receives the #debugOn: message, which is translated
>to #printOn: and this in turn sends #printOn: to all the Forms, not
>#debugOn: !

This was similar to the bug I found and fixed in the printOut 
mechanism that kept HtmlFileStreams from working properly.
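
The nesting failure Marcel describes is easy to reproduce.  Here is a
minimal sketch, written in Python rather than Squeak so it runs on
its own; the classes Thing, Form and Bag and the helper debug_string
are made up for illustration and are not taken from either package.

```python
class Thing:
    """Stand-in for Object: debugging defaults to printing."""

    def print_on(self, out):
        out.append(type(self).__name__)

    def debug_on(self, out):
        # The "rename": by default, debug output is just print output.
        self.print_on(out)


class Form(Thing):
    def debug_on(self, out):
        # Forms want a different representation when debugging.
        out.append("Form(debug)")


class Bag(Thing):
    """A collection whose print_on hard-codes the operation it forwards."""

    def __init__(self, items):
        self.items = list(items)

    def print_on(self, out):
        out.append("Bag(")
        for item in self.items:
            item.print_on(out)  # always print_on, never debug_on
        out.append(")")


def debug_string(obj):
    out = []
    obj.debug_on(out)
    return "".join(out)


alone = debug_string(Form())          # the override is used
nested = debug_string(Bag([Form()]))  # the override is lost
print(alone)
print(nested)
```

A Form debugged on its own gives "Form(debug)", but the same Form
inside a Bag comes out as "Bag(Form)", because the Bag forwards
print_on to its elements no matter which operation started the walk.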

>On an abstract level, this could be solved by adding 'subclassing'
>of operations that works just like subclassing of objects with a
>similar treatment of self (in this case, the current operation).

I disagree -- I think subclassing doesn't work well at all in
facilitating a reusable filter-and-pipeline model.  It was precisely
this concern that led to the HtmlFileStream bug I mentioned earlier!
A filter pipeline implemented by subclassing isn't terribly reusable
because each element is "stuck" in its hierarchy -- it can't easily
be reused by inserting it into another pipeline.  The primary virtue
of pipes and filters is the ability to develop small hunks of code
that do "one thing well," changing inputs into outputs (or generating
outputs on demand from inputs), and which can be composed into bigger
programs in the archetypal Unix model.

The real problem with subclassing implementations of pipes and filters
is that the hierarchy elements must know about each other to avoid 
"restarting" the pipeline.  Here's an example from HtmlFileStream.

The idea was to subclass FileStream, but to convert all text coming
in into HTML on the fly, changing characters that HTML treats as
markup into the corresponding HTML escape codes so that they come out
as literal text, and the like.  A separate facility was created to
permit "quoting" of input, using a method called #verbatim:, so HTML
commands could be interspersed with the output.  #verbatim: was
implemented simply, thus:

verbatim: aString
	"Put out the string without HTML conversion."

	super verbatim: aString

This relied upon a similar facility in the superclass.  Unfortunately
for HtmlFileStream, #verbatim: there was implemented by sending
"self nextPutAll:", which dispatched to HtmlFileStream>>nextPutAll:
and not to the version in the superclass, so the "verbatim" text was
converted to HTML after all.
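
To make the dispatch problem concrete, here is a minimal sketch of
the same shape of bug.  It is in Python rather than Squeak so it can
be run standalone; PlainStream, HtmlStream and FixedHtmlStream are
hypothetical stand-ins, html.escape stands in for the HTML
conversion, and the repair shown is only one possible fix, not
necessarily the one that went into HtmlFileStream.

```python
import html


class PlainStream:
    """Stand-in for the superclass; it just collects what is written."""

    def __init__(self):
        self.written = []

    def next_put_all(self, text):
        self.written.append(text)

    def verbatim(self, text):
        # Sent to self, so a subclass override of next_put_all is used.
        self.next_put_all(text)

    def contents(self):
        return "".join(self.written)


class HtmlStream(PlainStream):
    def next_put_all(self, text):
        # Convert everything written into literal HTML text.
        super().next_put_all(html.escape(text))

    def verbatim(self, text):
        # Looks like it bypasses the conversion, but the superclass
        # calls self.next_put_all, which lands back in this class.
        super().verbatim(text)


class FixedHtmlStream(HtmlStream):
    def verbatim(self, text):
        # One possible repair (an assumption): call the unconverted
        # writer explicitly instead of going through self.
        PlainStream.next_put_all(self, text)


def render(stream):
    stream.verbatim("<b>")
    stream.next_put_all("1 < 2")
    stream.verbatim("</b>")
    return stream.contents()


buggy = render(HtmlStream())
fixed = render(FixedHtmlStream())
print(buggy)
print(fixed)
```

In the buggy version the tags themselves come out escaped; in the
repaired version only the text between them does.  Either way, you
have to read both classes to know what #verbatim: really does.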

Once identified, the bug was trivially easy to fix, but it was
tricky to identify, mostly because you couldn't debug HtmlFileStream
without understanding its place in the hierarchy.  This is not the
way a filter should be developed.  A filter should do one thing well,
and know nothing about its environment except that it takes some
input from an input stream and transforms it to produce some output
on an output stream.

Filters seem much better served by composition than by subclassing.
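
As a rough illustration of what I mean by composition, here is a
sketch (again in Python rather than Squeak, with made-up filter names
HtmlEscape and Upcase) in which each filter holds a reference to the
thing it writes to instead of inheriting from it.  The only
assumption a filter makes is that its output understands write.

```python
import html
import io


class HtmlEscape:
    """Filter: escape HTML markup characters, then pass the text on."""

    def __init__(self, out):
        self.out = out

    def write(self, text):
        self.out.write(html.escape(text))


class Upcase:
    """Filter: convert text to upper case, then pass the text on."""

    def __init__(self, out):
        self.out = out

    def write(self, text):
        self.out.write(text.upper())


# Pipeline 1: upcase, then escape, then collect in a string buffer.
sink = io.StringIO()
shouting_page = Upcase(HtmlEscape(sink))
shouting_page.write("a < b")
sink.write("<hr>")  # "verbatim" is just writing past the filters
first_result = sink.getvalue()

# Pipeline 2: the same HtmlEscape class reused on its own.
other_sink = io.StringIO()
HtmlEscape(other_sink).write("a < b")
second_result = other_sink.getvalue()

print(first_result)
print(second_result)
```

Neither filter knows what sits on either side of it, so there is no
hierarchy to understand when one of them misbehaves, and writing
verbatim output needs no special method at all.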

Thoughts on Marcel's solution.

I recognize that I am looking at a slightly different problem than
Marcel's, but I fear his solution does not scale well for Smalltalk
-- at least not far enough to reach the general classes of problems
he is suggesting.  In particular, since each new FilterStream must
add a message to Object, this tends to make things highly
non-reusable, and invites substantial name conflicts and
entanglements that would make it wholly unsuitable as a general
purpose tool for combining Ritchie-esque tools.

FilterStream isn't a Stream, as that term is understood in Squeak:
it is not a subclass of Stream and doesn't follow the protocols of
Stream.  It therefore probably shouldn't be called "FilterStream."

I am also not thrilled with the idea of a new operation #write: used
in the context of something called FilterStream.  As I saw it, a
FilterStream class would have instances that represent a filter
pipeline -- sort of a super-Stream.  Each node of the pipeline would
implement a Ritchie-esque do-one-thing-well filter, knowing only that
it would have two objects, inStream and outStream, which it could
treat using traditional Stream protocols #next and #nextPut:, and
their progeny.
The FilterStream would handle the mechanics of coroutine management 
and buffering throughout the pipeline, and would permit both 
push-based and pull-based pipelining.
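
A very rough sketch of the pull-based half of that idea follows.  It
is in Python, where generators supply the coroutine mechanics for
free; the names pipeline, strip_blank and number are hypothetical,
and nothing here handles buffering or push-based operation, which a
real FilterStream would have to provide.

```python
from functools import reduce


def strip_blank(in_stream):
    """Filter: drop lines that are empty or only whitespace."""
    for line in in_stream:
        if line.strip():
            yield line


def number(in_stream):
    """Filter: prefix each line with its position in the stream."""
    for n, line in enumerate(in_stream, start=1):
        yield f"{n}: {line}"


def pipeline(source, *filters):
    """Connect the filters left to right over the source.

    Returns an iterator; nothing runs until the consumer asks for
    the next element, so the consumer drives the whole chain.
    """
    return reduce(lambda stream, f: f(stream), filters, iter(source))


lines = ["alpha", "", "beta"]
pulled = pipeline(lines, strip_blank, number)
first = next(pulled)  # runs each filter just far enough for one item
rest = list(pulled)   # drains whatever is left
print(first)
print(rest)
```

Each filter sees only the stream it reads from and the values it
hands on; the pipeline object is the one place that knows how the
nodes are wired together.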

Thus, I saw the notion of a filter pipeline as an independent 
instance of a collection of objects, building on the Stream 
protocols, rather than building the pipeline into the Object 
hierarchy itself.  Marcel's solution seems, to me, to exacerbate
rather than ease the difficulties of reusing such filters.

I accept, however, that it is likely that I am just not "getting it," 
or that Marcel is solving an entirely different problem (apparently 
dealing with reflective properties of Smalltalk) that MUST be built 
into an object hierarchy to work.  To my mind, however, it would be 
dangerous and confusing to call such a thing a "FilterStream."