Hello Squeakers,
in an earlier message ( Musings about "ma" ) I mentioned a simple OO pipe-and-filter implementation. I have now finally started porting this stuff from Objective-C to Squeak. Do you think this is worthwhile pursuing further or just another crazy idea? :-)
Motivation and Rationale
------------------------
The motivation for this package was the lack of reusability of printing and other encoding tasks. The problem is that each method taking part in an encoding task hard-codes the specific encoding task, for example a collection receiving a #printOn: message sends #printOn: to its elements.
The problem becomes apparent when trying to implement a #debugOn: message that only differs from #printOn: for a couple of objects. At first, this seems easy, defining #debugOn: as a rename of #printOn: in Object, and implementing #debugOn: for those objects that should behave differently, let's say Forms. However, this fails as soon as there are nested objects to debug, for example a collection of Forms. The collection receives the #debugOn: message, which is translated to #printOn: and this in turn sends #printOn: to all the Forms, not #debugOn: !
On an abstract level, this could be solved by adding 'subclassing' of operations that works just like subclassing of objects with a similar treatment of self (in this case, the current operation).
With the language as it is now, this feature can be simulated by letting the stream argument that is constant in these operations also carry the encoding message.
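To make the nesting failure concrete, here is a small file-in sketch for a Squeak image. The selector #sketchDebugOn: is a hypothetical stand-in for #debugOn:, and Point is used instead of Form only because it is easy to create; neither choice comes from the package itself.

```smalltalk
"Hypothetical stand-in: #sketchDebugOn: plays the role of #debugOn: here."!

!Object methodsFor: '*FilterStream-Sketch'!
sketchDebugOn: aStream
	"Default: debug output is simply the normal print output."
	self printOn: aStream! !

!Point methodsFor: '*FilterStream-Sketch'!
sketchDebugOn: aStream
	"Points want a special debug representation."
	aStream nextPutAll: 'DEBUG-POINT'! !

"Try it: a lone Point uses the override, a nested Point does not."
| alone nested |
alone := String streamContents: [:s | (3 @ 4) sketchDebugOn: s].
nested := String streamContents: [:s |
	(OrderedCollection with: 3 @ 4) sketchDebugOn: s].
Transcript show: alone; cr; show: nested; cr!
```

The first line on the Transcript is the special debug text. The second is the collection's ordinary print string, because the collection's #printOn: sends #printOn: to its elements and never sees the debug selector.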
Overview
--------
The basic FilterStream interface is the #write: message to which it responds by passing the argument of the #write: message to its target using another #write: message. This makes FilterStreams composable but doesn't actually accomplish anything.
To accomplish its processing, each FilterStream sends a processing message to the objects it receives with itself as the argument, expecting the object to write itself on the stream in an appropriate encoding. Each FilterStream subclass defines its own processing message. As a matter of fact the default action described above (forwarding the argument to the target) is actually accomplished indirectly using a filter message defined for Objects and basic FilterStreams.
Status
------
This is the first prototype implementation. The basic skeleton is there and there is an implementation of a PrintStream that could replace #printOn:. Currently, it is implemented with a #printOnStream: message paralleling the current implementation because I am not quite ready for open-heart surgery yet... :-)
There is some code interfacing FilterStreams with the current Streams and Collections, but that isn't fully thought out yet.
Using FilterStreams
-------------------
You get a filter stream by sending its class the #stream message. You write any object(s) by sending:
    filter write: object.
You get its results by sending:
    result := filter contents.
Nesting:
You can nest FilterStreams using the #stream: creation message with another FilterStream. #contents will be passed through to the final FilterStream.
Defining new FilterStreams
--------------------------
- Subclass the FilterStream that matches your new functionality most closely.
- In your new FilterStream, implement a class method called #filterSelector that defines the single-argument filterMessage your filter will send.
- In class Object (category 'filter streaming'), implement the cover method for your filterMessage. This simply sends your superclass's filterSelector.
- Define an implementation of your filterMessage for every class you want to behave differently for your FilterStream compared to your FilterStream's superclass.
Important: when implementing your filterMessage, use #write: to write subordinate objects to the stream, do not use the filterMessage! This ensures that your method will be reusable by future subclasses.
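The steps above can be sketched as a file-in for a Squeak image. This is a minimal sketch under stated assumptions, not the package's actual code: every name with a Sketch/sketch prefix, as well as #emit: and #setUp, is hypothetical, nesting via #stream: is left out, and the collection encoding is defined only for OrderedCollection to keep Strings out of the picture.

```smalltalk
Object subclass: #SketchPrintFilter
	instanceVariableNames: 'out'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

SketchPrintFilter subclass: #SketchDebugFilter
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

!SketchPrintFilter class methodsFor: 'filtering'!
stream
	^ self basicNew setUp!
filterSelector
	"The one-argument message this filter sends to whatever is written to it."
	^ #sketchPrintOnFilter:! !

!SketchDebugFilter class methodsFor: 'filtering'!
filterSelector
	^ #sketchDebugOnFilter:! !

!SketchPrintFilter methodsFor: 'writing'!
setUp
	out := WriteStream on: String new!
write: anObject
	"Double dispatch: the object decides how to encode itself for this filter."
	anObject perform: self class filterSelector with: self!
emit: aString
	"Low-level output, meant for leaf encodings only."
	out nextPutAll: aString!
contents
	^ out contents! !

!Object methodsFor: '*FilterStream-Sketch'!
sketchPrintOnFilter: aFilter
	"Leaf default: emit the ordinary print string."
	aFilter emit: self printString!
sketchDebugOnFilter: aFilter
	"Cover method: fall back to the message of the superclass filter."
	self sketchPrintOnFilter: aFilter! !

!OrderedCollection methodsFor: '*FilterStream-Sketch'!
sketchPrintOnFilter: aFilter
	"Elements go back through #write:, never through the filter message,
	so that a subclass filter gets the chance to redirect them."
	aFilter emit: '('.
	self do: [:each | aFilter write: each] separatedBy: [aFilter emit: ' '].
	aFilter emit: ')'! !

!Point methodsFor: '*FilterStream-Sketch'!
sketchDebugOnFilter: aFilter
	"Only the debug filter treats Points specially."
	aFilter emit: 'DEBUG-POINT'! !

"Try it: the same collection through both filters."
| data plain debug |
data := OrderedCollection with: 3 @ 4 with: 7.
plain := SketchPrintFilter stream.
plain write: data.
debug := SketchDebugFilter stream.
debug write: data.
Transcript show: plain contents; cr; show: debug contents; cr!
```

In this sketch the print filter yields '(3@4 7)' and the debug filter yields '(DEBUG-POINT 7)'. The collection method was written once, for the print filter, yet the nested Point still receives the debug message because the elements are written with #write:.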
Applications
------------
All sorts of Byte-Encoding tasks, including #printOn:, #storeOn:, HTML and XML encoding, argument marshalling etc. All these operations could now share common code.
Other processing and encoding tasks. Morphic Canvases, for example, would make a fine FilterStream, with the inherent double-dispatching allowing both the Canvas and the graphical objects to react specially. I've also implemented an HTML processing framework using this technique.
Future Directions
-----------------
- Finish and polish the implementation.
- Replace current coding mechanisms with FilterStreams.
- Better integration with the current Streams mechanism.
- Implement read capabilities (co-routines and/or buffering).
- FilterStreams are really a special kind of reified operation. It might be interesting to see what a generic facility for reified, hierarchical operations would look like.
Marcel
> Hello Squeakers,
> in an earlier message ( Musings about "ma" ) I mentioned a simple OO pipe-and-filter implementation. I have now finally started porting this stuff from Objective-C to Squeak. Do you think this is worthwhile pursuing further or just another crazy idea? :-)
Look, we're all nuts, so don't consider that the notions of "crazy idea" and "worthwhile pursuing" are mutually exclusive. :-)
> Motivation and Rationale
> The motivation for this package was the lack of reusability of printing and other encoding tasks. The problem is that each method taking part in an encoding task hard-codes the specific encoding task, for example a collection receiving a #printOn: message sends #printOn: to its elements.
> The problem becomes apparent when trying to implement a #debugOn: message that only differs from #printOn: for a couple of objects. At first, this seems easy, defining #debugOn: as a rename of #printOn: in Object, and implementing #debugOn: for those objects that should behave differently, let's say Forms. However, this fails as soon as there are nested objects to debug, for example a collection of Forms. The collection receives the #debugOn: message, which is translated to #printOn: and this in turn sends #printOn: to all the Forms, not #debugOn: !
This was similar to the bug I found and fixed in the printOut mechanism that kept HtmlFileStreams from working properly.
> On an abstract level, this could be solved by adding 'subclassing' of operations that works just like subclassing of objects with a similar treatment of self (in this case, the current operation).
I disagree -- I think subclassing doesn't work well at all in facilitating a reusable filter-and-pipeline model. It was precisely this concern that led to the HtmlFileStream bug I mentioned earlier! A filter pipeline implemented by subclassing isn't terribly reusable because each element is "stuck" in its hierarchy -- it can't easily be reused or inserted into another pipeline. The primary virtue of pipes and filters is the ability to develop small hunks of code that do "one thing well," changing inputs into outputs (or generating outputs on demand from inputs), and which can be composed into bigger programs in the archetypal Unix model.
The real problem with subclassing implementations of pipe and filters is that the hierarchy elements must know about each other to avoid "restarting" the pipeline. Here's an example from HtmlFileStream.
The idea was to subclass FileStream, but to convert all incoming text into HTML on the fly, changing characters that HTML treats as markup into the corresponding HTML codes so that they come out as literal text, and the like. A separate facility was created to permit "quoting" of input, using a method called #verbatim:, so HTML commands could be interspersed with the output. #verbatim: was implemented simply:
verbatim: aString "Put out the string without HTML conversion."
super verbatim: aString
This relies upon a similar facility in the superclass. Unfortunately for HtmlFileStream, #verbatim: there was implemented by sending "self nextPutAll:", which dispatched to HtmlFileStream>>nextPutAll: and not to the version in the superclass.
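The self-send trap can be reproduced in a few lines. This is a sketch with hypothetical classes (SketchPlainWriter and SketchHtmlWriter) that only mimic the shape of the problem; it is not the actual FileStream or HtmlFileStream code, and the escaping is reduced to angle brackets.

```smalltalk
Object subclass: #SketchPlainWriter
	instanceVariableNames: 'out'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

SketchPlainWriter subclass: #SketchHtmlWriter
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

!SketchPlainWriter class methodsFor: 'instance creation'!
new
	^ self basicNew setUp! !

!SketchPlainWriter methodsFor: 'writing'!
setUp
	out := WriteStream on: String new!
nextPutAll: aString
	out nextPutAll: aString!
verbatim: aString
	"Self-send: a subclass that overrides #nextPutAll: is re-entered here."
	self nextPutAll: aString!
contents
	^ out contents! !

!SketchHtmlWriter methodsFor: 'writing'!
nextPutAll: aString
	"Escape angle brackets so the text shows up literally in HTML."
	aString do: [:c |
		c = $<
			ifTrue: [super nextPutAll: '&lt;']
			ifFalse: [c = $>
				ifTrue: [super nextPutAll: '&gt;']
				ifFalse: [super nextPutAll: (String with: c)]]]!
verbatim: aString
	"Meant to bypass the escaping, but the superclass self-sends #nextPutAll:."
	super verbatim: aString! !

"Try it: the tag comes out escaped although it was written verbatim."
| w |
w := SketchHtmlWriter new.
w verbatim: '<b>'.
Transcript show: w contents; cr!
```

In this sketch the Transcript shows '&lt;b&gt;' rather than '<b>'. One way to repair the sketch is to have SketchHtmlWriter>>verbatim: send "super nextPutAll: aString" directly; whether that matches the fix applied to HtmlFileStream is not stated here.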
Once identified, the bug was trivially easy to fix, but it was tricky to identify, mostly because you couldn't debug HtmlFileStream without understanding its place in the hierarchy. This is not the way a filter should be developed. Filters should do one thing well, and know nothing about their environment except that they take some input from an input stream and produce some output on an output stream.
Filters appear much better suited to composition.
Thoughts on Marcel's solution.
I recognize that I am looking at a slightly different problem than Marcel's, but I fear his solution does not scale well for Smalltalk -- at least to reach the general classes of problems he is suggesting. In particular, since each new FilterStream must create a message in Object, this tends to make things highly non-reusable, and invites substantial name conflicts and entanglements that would make it wholly unsuitable as a general purpose tool for combining Ritchie-esque tools.
FilterStream isn't a Stream, as that term is understood in Squeak: it is not a subclass of Stream and doesn't follow the protocols of Stream. So it probably shouldn't be called "FilterStream."
In particular, I am also not thrilled with the idea of a new operation #write: used in the context of something called FilterStream. As I saw it, a FilterStream class would have instances that represent a filter pipeline -- sort of a super-Stream. Each node of the pipeline would implement a Ritchie-esque do-one-thing-well filter, knowing only that it would have two objects, inStream and outStream, which it could treat using the traditional Stream protocols #next and #nextPut:, and their progeny. The FilterStream would handle the mechanics of coroutine management and buffering throughout the pipeline, and would permit both push-based and pull-based pipelining.
Thus, I saw the notion of a filter pipeline as an independent instance of a collection of objects, building on the Stream protocols, rather than building the pipeline into the Object hierarchy itself. Marcel's solution seems, to me, to exacerbate rather than facilitate difficulties of reusing such filters.
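A rough sketch of that composition-based view, under clearly stated assumptions: the class SketchPipeline and its selectors are hypothetical, nodes are plain two-argument blocks rather than full objects, and each node's output is buffered completely before the next node runs, so there is no coroutine management and no pull-based operation.

```smalltalk
Object subclass: #SketchPipeline
	instanceVariableNames: 'nodes'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

!SketchPipeline methodsFor: 'pipeline'!
nodes
	^ nodes ifNil: [nodes := OrderedCollection new]!
add: aTwoArgBlock
	"A node is a block [:inStream :outStream | ...] that uses only
	ordinary Stream protocol and knows nothing about its neighbours."
	self nodes add: aTwoArgBlock!
runOn: aString
	"Feed each node's buffered output to the next node as its input."
	^ self nodes inject: aString into: [:input :node | | out |
		out := WriteStream on: String new.
		node value: (ReadStream on: input) value: out.
		out contents]! !

"Try it: uppercase everything, then drop the vowels."
| pipe |
pipe := SketchPipeline new.
pipe add: [:in :out |
	[in atEnd] whileFalse: [out nextPut: in next asUppercase]].
pipe add: [:in :out |
	[in atEnd] whileFalse: [| c |
		c := in next.
		c isVowel ifFalse: [out nextPut: c]]].
Transcript show: (pipe runOn: 'squeak'); cr!
```

With these two nodes the sketch turns 'squeak' into 'SQK'. Either block can be dropped into a different pipeline unchanged, which is the reuse property argued for above.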
I accept, however, that it is likely that I am just not "getting it," or that Marcel is solving an entirely different problem (apparently dealing with reflective properties of Smalltalk) that MUST be built into an object hierarchy to work. To my mind, however, it would be dangerous and confusing to call such a thing a "FilterStream."
Hello Andrew,
thanks for your in-depth response. I share some of your concerns, others I hope I can address.
1. Composability
I guess I wasn't clear enough on this. Composability is *the* fundamental aspect of FilterStreams because I see composability as the main benefit of a pipes-and-filters architecture. It is facilitated by the use of a single message, called #write:, as both the input and output interface of a FilterStream (more on naming later). This ensures that any FilterStream can act as source or sink for any other FilterStream, just like UNIX filters.
In fact, composition was the principal reuse mechanism for the HTML-Processing code I wrote with the Objective-C version. For example, HTML processing for a content management application looked roughly like this (translated from Objective-C).
filter := HtmlNonOKTagFilter stream:
    (HtmlAnchorNameExtractor stream:
        (HtmlReferenceExtractor stream:
            (HtmlImageAttrFilter stream:
                (HtmlTitleExtractor stream:
                    (HtmlMetaExtractor stream:
                        (HtmlBodyExtractor stream)))))).
The top would then be hooked up to an HTML-Parser and a nicely processed array/list of HTML-Elements and text could be picked up at the bottom. In fact, you could hook the HTML-Formatter straight to the end of that pipe and have complete text->HTML->processing->text in one neat pipeline. Each one of the filters does exactly one thing and passes the results down to the next stream. For example, the HtmlBodyExtractor ignores all input until it sees a <BODY> tag and then continues to pass stuff through without looking at it until it sees the </BODY> tag.
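As an illustration of one such single-purpose filter, here is a sketch of a body extractor in Squeak file-in form. It is not a translation of the Objective-C code: the class names are hypothetical, the input is assumed to arrive as already-tokenized strings, and the sketch assumes the <BODY> and </BODY> tags themselves are not passed on.

```smalltalk
Object subclass: #SketchCollector
	instanceVariableNames: 'items'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

Object subclass: #SketchBodyExtractor
	instanceVariableNames: 'target inBody'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

!SketchCollector methodsFor: 'writing'!
write: anObject
	"End of the pipe: just remember what arrives."
	self contents add: anObject!
contents
	^ items ifNil: [items := OrderedCollection new]! !

!SketchBodyExtractor class methodsFor: 'instance creation'!
stream: aTarget
	^ self basicNew setTarget: aTarget! !

!SketchBodyExtractor methodsFor: 'writing'!
setTarget: aTarget
	target := aTarget.
	inBody := false!
write: anObject
	"Ignore everything outside the body; pass the rest on unexamined."
	anObject = '<BODY>' ifTrue: [inBody := true. ^ self].
	anObject = '</BODY>' ifTrue: [inBody := false. ^ self].
	inBody ifTrue: [target write: anObject]!
contents
	"Pass the request through to the final stream."
	^ target contents! !

"Try it: only the tokens between the body tags reach the collector."
| filter |
filter := SketchBodyExtractor stream: SketchCollector new.
#('<HTML>' '<BODY>' 'hello' 'world' '</BODY>' 'junk')
	do: [:token | filter write: token].
Transcript show: filter contents printString; cr!
```

Because the extractor only knows that its target understands #write: and #contents, any other filter with the same two messages could be placed in front of it or behind it.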
However, all the HTML processing elements were derived from an abstract HtmlProcessor class, otherwise a lot of code would have had to be duplicated. This was my problem with UNIX pipes and filters, that filters couldn't easily share common code and that it wasn't easily possible to subclass individual filters in order to generate a modified version.
Also, something like the #printForDebugger: problem doesn't really lend itself to a composed pipes solution, at least I don't see how to do this easily.
2. Naming in general
I am not very happy with the naming scheme I've found so far. However, I also haven't found anything better yet.
Currently, FilterStreams use completely different names for operations, just to keep everything completely distinct. If FilterStreams were to find acceptance, the names should definitely be changed to be better integrated with the rest of the system.
3. #write: vs. #nextPut: and #nextPutAll: in particular
FilterStreams use a *single* message to write/put objects into a stream/filter. This presents a minimal interface. The decision whether a collection should be passed through untouched or have its elements written to the stream is made by the stream. AFAIK, passing collections 'as-is' doesn't make sense for Byte-Encoders, so ByteStream is a subclass of FlattenStream, and FlattenStream flattens all collections.
Having to use #nextPut: or #nextPutAll: depending on the type of the argument was something that always bothered me. It makes sense for the lowest-level OO stream interface, but for all the printing-level uses it's quite unnecessary and much too concrete for my tastes.
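A small sketch of the flattening behaviour, with a hypothetical class name. It uses an explicit type test in #write: instead of the filter-message dispatch used by FilterStreams, and it assumes that strings count as atoms rather than as collections of characters.

```smalltalk
Object subclass: #SketchFlattenFilter
	instanceVariableNames: 'items'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FilterStream-Sketch'!

!SketchFlattenFilter methodsFor: 'writing'!
write: anObject
	"The stream, not the caller, decides: collections are taken apart
	element by element, everything else (strings included) is kept whole."
	(anObject isCollection and: [anObject isString not])
		ifTrue: [anObject do: [:each | self write: each]]
		ifFalse: [self contents add: anObject]!
contents
	^ items ifNil: [items := OrderedCollection new]! !

"Try it: one message for single objects and nested collections alike."
| flat |
flat := SketchFlattenFilter new.
flat write: 1; write: #(2 #(3 'four')); write: 5.
Transcript show: flat contents printString; cr!
```

The caller uses #write: throughout and never has to choose between a single-element and a multi-element message; the collected result here is 1, 2, 3, 'four', 5 in that order.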
I chose not to reuse the #nextPut: message name because its meaning is to place the argument in the collection of the receiving stream at the current location without processing. OTOH, #write: (almost) always does processing and has no notion of location or a collection associated with it. Using the same selector would probably lead to great confusion.
However, I am *always* glad for better names!
4. Push vs. Pull / full streams
FilterStreams certainly fall short of a real Streams implementation with full push/pull semantics, and there are applications where this is very, very helpful. It is very likely something that needs to be added in the future. However, the current implementation is much simpler and does quite a bit with its simple push-only model. I'd wait to see how far this simpler implementation can go and add a more complete model later.
Marcel
Oops, I forgot to pick up on another of your concerns.
5. Mandatory addition of methods to Object
I agree: this is extremely pokey right now!
It could also be made 100% automatic and somewhat more efficient by having the FilterStream subclass copy the method body when initialized. Still not all that wonderful, but at least hidden from view...
Marcel
squeak-dev@lists.squeakfoundation.org