Pipe syntax and the current methods
Michael Lucas-Smith
mlucas-smith at cincom.com
Mon Aug 27 20:38:35 UTC 2007
> With pipes, this could be written as
>
> highestNumberedChangeSet
> "ChangeSorter highestNumberedChangeSet"
> ^self allChangeSetNames
> select:[:aString | aString startsWithDigit] ;;
> collect:[:aString | aString initialIntegerOrNil] ;;
> ifNotEmpty:[:list | list max]
>
With pipe objects using standard Smalltalk syntax, this could be written as:
highestNumberedChangeSet
"ChangeSorter highestNumberedChangeSet"
^self allChangeSetNames asPipe
selecting: [:aString | aString startsWithDigit];
collecting: [:aString | aString initialIntegerOrNil];
ifNotEmpty: [:list | list max]
> which, with pipes, could be rewritten as...
>
> self systemNavigation
> allCallsOn: assoc ;;
> collect: [:each | each classSymbol] ;;
> asSet ;;
> do: [:clsName | (Smalltalk at: clsName) replaceSilently: oldName
> to: aName].
>
And again:
(self systemNavigation allCallsOn: assoc) asPipe
collecting: [:each | each classSymbol];
selectingAsUniqueSet;
do: [:clsName | (Smalltalk at: clsName) replaceSilently: oldName to: aName]
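To make the idea concrete for readers outside Smalltalk, here is a minimal Python sketch of such a pipe object. The class and method names deliberately mirror the selecting:/collecting:/do: selectors above, but the implementation is my own invention, not the actual asPipe code:

```python
import itertools

class Pipe:
    """Hypothetical lazy pipe: each step wraps the previous iterator,
    so no intermediate collections are ever materialized."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def selecting(self, predicate):       # like selecting: [:x | ...]
        self._it = (x for x in self._it if predicate(x))
        return self

    def collecting(self, function):       # like collecting: [:x | ...]
        self._it = (function(x) for x in self._it)
        return self

    def do(self, action):                 # like do: [:x | ...] -- pulls the whole chain
        for x in self._it:
            action(x)

def initial_integer(s):
    """Rough analogue of initialIntegerOrNil: leading digits, or None."""
    digits = "".join(itertools.takewhile(str.isdigit, s))
    return int(digits) if digits else None

# Usage, echoing highestNumberedChangeSet:
names = ["1923changes", "notes", "2041fixes"]
found = []
Pipe(names).selecting(lambda s: s[0].isdigit()) \
           .collecting(initial_integer) \
           .do(found.append)
highest = max(found)   # 2041
```

Nothing is computed until do: pulls the chain, so each element flows through all the steps one at a time.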
I guess I should point out that work like this has been done in
VisualWorks on two fronts - StreamWrappers and ComputingStreams. Both
packages are available in the public Store repository.
Pipes are a great idea - streams talking to streams is the only way to
do efficient large-data-set programming (e.g. Google's map+reduce
technique). I wish more of Smalltalk were written with this approach in
mind; it would scale without effort, and programmers wouldn't
accidentally create memory-explosion bottlenecks. Multiple select:,
collect:, and reject: calls on large data sets will bring any image to
its knees in seconds if more than one concurrent user invokes the same
sort of operation at once.
The speed issue comes not from the time it takes to make one such call,
but from what happens when multiple processes try to do the same thing
(e.g. multiple users hitting your server at once). And the cost comes
not from CPU cycles, but from memory allocation and garbage collection.
If you start with a collection of 100,000 things and do 4 operations on
it - three collect:'s and a reject: - you'll probably be allocating 4
arrays of 100,000 slots each, three of them thrown away as intermediate
results. At 4 bytes per slot, that's 1.2mb of throwaway data you've just
allocated. Now get 10 users using the same function at the same time and
you've just made 12mb of garbage. Scale it up to more elaborate chains
of functions or more users and you have serious scalability issues.
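As a back-of-the-envelope illustration - in Python rather than Smalltalk, and with CPython's 8-byte list slots, so the absolute numbers come out larger than the 4-byte arithmetic above - compare an eager chain against a lazy one:

```python
import sys

data = range(100_000)

# Eager: each step materializes a full intermediate list.
step1 = [x * 2 for x in data]
step2 = [x + 1 for x in step1]
result_eager = [x for x in step2 if x % 3 == 0]
# Bytes spent on the two throwaway intermediate lists:
intermediate_bytes = sys.getsizeof(step1) + sys.getsizeof(step2)

# Lazy: each step wraps the previous iterator; only one element
# is in flight at a time, so no intermediate arrays ever exist.
lazy1 = (x * 2 for x in data)
lazy2 = (x + 1 for x in lazy1)
result_lazy = [x for x in lazy2 if x % 3 == 0]
```

Both versions produce the same final collection; only the eager one pays for the intermediate arrays, and that cost is multiplied by every concurrent user.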
Now, to drive the point home: suppose you are generating web pages from
the server. You start by walking a node tree, concatenating dozens of
little strings together to produce the page - which is pushed through a
zip library - which is pushed through a chunked-transfer stream -
perhaps there's a UTF-8 encoder in there too. Unless all those streams
use a cyclic buffer, streams talking to streams, they're going to
generate LOTS of small and big strings as they build up their own
internal streams and buffers.
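A sketch of that kind of streaming chain, using Python generators and the standard zlib module (the fragment generator is just a stand-in for a real node-tree walk):

```python
import zlib

def render_fragments():
    # Stand-in for walking a node tree: yields many little strings.
    for i in range(1000):
        yield f"<li>item {i}</li>"

def utf8_encoded(fragments):
    # Encode each fragment as it arrives; no full page is ever built.
    for s in fragments:
        yield s.encode("utf-8")

def gzipped(chunks):
    # wbits=31 selects the gzip container format.
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()

# Each stage pulls one chunk at a time from the stage before it,
# so memory use stays flat no matter how large the page is.
compressed = b"".join(gzipped(utf8_encoded(render_fragments())))

# Round-trip check: decompressing recovers the original page.
page = zlib.decompress(compressed, wbits=31).decode("utf-8")
```

In a real server each compressed chunk would be written straight to the socket as a chunked-transfer body instead of being joined, which is where the constant-memory property pays off.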
Anyway - just food for thought. At Wizard we spent a considerable amount
of time optimizing our HTTP/1.1 server to deal with exactly this sort of
thing. We also found we could reuse the same code for database
operations (we were using BerkeleyDB as our database, so querying was
done by us).
Cheers,
Michael
More information about the Squeak-dev mailing list