FilterStreams prototype

Marcel Weiher marcel at system.de
Sat Apr 3 18:03:43 UTC 1999


Hello Andrew,

Thanks for your in-depth response.  I share some of your concerns;
others I hope I can address.

1.	Composability

I guess I wasn't clear enough on this.  Composability is *the*
fundamental aspect of FilterStreams because I see composability as
the main benefit of a pipes-and-filters architecture.  It is
facilitated by the use of a single message, #write:, as both
the input and output interface of a FilterStream (more on naming
later).  This ensures that any FilterStream can act as source or sink  
for any other FilterStream, just like UNIX filters.
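
To make the single-message idea concrete, here is a minimal sketch, written
in Python rather than Smalltalk purely so it can be run stand-alone.  It is
not the actual FilterStreams code: the class names Collector, Upcase and
DropEmpty and the helper forward() are hypothetical, and only the idea that
write is both the input and the output interface comes from the text above.

```python
class FilterStream:
    """Hypothetical base class: accepts objects via write() and pushes
    its results to an optional target stream via the same message."""

    def __init__(self, target=None):
        self.target = target

    def write(self, obj):
        # Default behaviour: pass the object through unchanged.
        self.forward(obj)

    def forward(self, obj):
        # Output side: send the result to the next stream, if any.
        if self.target is not None:
            self.target.write(obj)


class Collector(FilterStream):
    """Sink that simply remembers everything written to it."""

    def __init__(self):
        super().__init__()
        self.items = []

    def write(self, obj):
        self.items.append(obj)


class Upcase(FilterStream):
    """Filter that upper-cases each string it receives."""

    def write(self, obj):
        self.forward(obj.upper())


class DropEmpty(FilterStream):
    """Filter that discards empty strings."""

    def write(self, obj):
        if obj:
            self.forward(obj)


# Any stream can be the target of any other, because all of them
# understand the same write() message.
sink = Collector()
pipeline = DropEmpty(Upcase(sink))
for word in ["pipes", "", "filters"]:
    pipeline.write(word)
```

Because every stage only relies on its target understanding write, the
stages can be reordered or replaced without touching the others.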

In fact, composition was the principal reuse mechanism for the
HTML-Processing code I wrote with the Objective-C version.  For
example, HTML processing for a content management application looked  
roughly like this (translated from Objective-C):

	filter := HtmlNonOKTagFilter stream:
			(HtmlAnchorNameExtractor stream:
			(HtmlReferenceExtractor stream:
			(HtmlImageAttrFilter stream:
			(HtmlTitleExtractor stream:
			(HtmlMetaExtractor stream:
			(HtmlBodyExtractor stream)))))).


The top would then be hooked up to an HTML-Parser and a nicely
processed array/list of HTML-Elements and text could be picked up at  
the bottom.  In fact, you could hook the HTML-Formatter straight to  
the end of that pipe and have complete text->HTML->processing->text  
in one neat pipeline.  Each one of the filters does exactly one thing  
and passes the results down to the next stream.  For example, the
HtmlBodyExtractor ignores all input until it sees a <BODY> tag and
then continues to pass stuff through without looking at it until it  
sees the </BODY> tag.
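
The body-extraction behaviour described above can be sketched as follows,
again in Python for the sake of a runnable example.  This is an
illustration under assumptions, not the original HtmlBodyExtractor: tags
are represented as plain strings such as "<BODY>", the ListSink class is
hypothetical, and dropping the <BODY> and </BODY> tags themselves is an
assumption (the text does not say whether they are passed on).

```python
class ListSink:
    """Hypothetical sink that collects whatever is written to it."""

    def __init__(self):
        self.items = []

    def write(self, element):
        self.items.append(element)


class BodyExtractor:
    """Ignores input until <BODY> is seen, then passes everything
    through untouched until </BODY> is seen."""

    def __init__(self, target):
        self.target = target
        self.inside = False

    def write(self, element):
        # Only strings can be tags in this sketch; compare case-insensitively.
        tag = element.upper() if isinstance(element, str) else None
        if tag == "<BODY>":
            self.inside = True       # start passing elements through
        elif tag == "</BODY>":
            self.inside = False      # stop passing elements through
        elif self.inside:
            self.target.write(element)


sink = ListSink()
extractor = BodyExtractor(sink)
for element in ["<HTML>", "<TITLE>", "ignored", "</TITLE>",
                "<BODY>", "Hello", "<B>", "world", "</B>",
                "</BODY>", "</HTML>"]:
    extractor.write(element)
```

Everything outside the body is dropped; everything inside is forwarded
without inspection, so the filter does exactly one thing.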

However, all the HTML processing elements were derived from an
abstract HtmlProcessor class; otherwise a lot of code would have had
to be duplicated.  This was my problem with UNIX pipes and filters:
filters couldn't easily share common code, and it wasn't easily
possible to subclass individual filters in order to generate a
modified version.

Also, something like the #printForDebugger: problem doesn't really
lend itself to a composed pipes solution, at least I don't see how to  
do this easily.

2.	Naming in general

I am not very happy with the naming scheme I've found so far.
However, I also haven't found anything better yet.

Currently, FilterStreams use completely different names for
operations, just to keep everything completely distinct.  If
FilterStreams were to find acceptance, the names should definitely be  
changed to be better integrated with the rest of the system.

3.	#write: vs. #nextPut: and #nextPutAll: in particular

FilterStreams use a *single* message to write/put objects into a
stream/filter.  This presents a minimal interface.  The decision
whether a collection should be passed through untouched or have its
elements written to the stream is made by the stream.  AFAIK, passing
collections 'as-is' doesn't make sense for Byte-Encoders, so
ByteStream is a subclass of FlattenStream, and FlattenStream flattens
all collections.
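
As a rough illustration of letting the stream decide, the following Python
sketch contrasts a stream that stores collections as-is with one that
flattens them.  FlattenStream and ByteStream are names from the text above,
but their bodies here are guesses; PassThroughStream and write_object() are
hypothetical, and treating only lists and tuples as collections (with
strings as atoms) is an assumption made for this sketch.

```python
class PassThroughStream:
    """Hypothetical stream that stores every object as-is, collections included."""

    def __init__(self):
        self.items = []

    def write(self, obj):
        self.items.append(obj)


class FlattenStream(PassThroughStream):
    """Writes the elements of a collection one by one, recursively."""

    def write(self, obj):
        if isinstance(obj, (list, tuple)):
            for element in obj:
                self.write(element)      # nested collections are flattened too
        else:
            self.write_object(obj)

    def write_object(self, obj):
        # Hook for subclasses: what to do with a single, non-collection object.
        self.items.append(obj)


class ByteStream(FlattenStream):
    """Encodes each flattened object into a byte buffer."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()

    def write_object(self, obj):
        self.data += str(obj).encode("ascii")


plain = PassThroughStream()
plain.write([1, [2, 3]])

flat = FlattenStream()
flat.write([1, [2, 3]])
flat.write(4)

encoded = ByteStream()
encoded.write(["ab", ["c", 1]])
```

The caller uses the same write message in all three cases; only the
receiving stream's class determines what happens to a collection.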

Having to use #nextPut: or #nextPutAll: depending on the type of the  
argument was something that always bothered me.  It makes sense for  
the lowest-level OO stream interface, but for all the printing-level  
uses it's quite unnecessary and much too concrete for my tastes.

I chose not to reuse the #nextPut: message name because its meaning  
is to place the argument in the collection of the receiving stream at  
the current location without processing.  OTOH, #write: (almost)
always does processing and has no notion of location or a collection  
associated with it.  Using the same selector would probably lead to  
great confusion.

However, I am *always* glad for better names!

4.	Push vs. Pull / full streams

FilterStreams certainly fall short of a real Streams implementation  
with full push/pull semantics, and there are applications where this  
is very, very helpful.  It is very likely something that needs to be  
added in the future.  However, the current implementation is much
simpler and does quite a bit with its simple push-only model.  I'd
wait to see how far this simpler implementation can go and add a more  
complete model later.


Marcel




