The attached changeset speeds up WAEncoder by exploiting two facts: all of
its subclasses output most characters unchanged, and none of them ever
converts a character X into a different single character Y.
With this change I got a 15% improvement on this test:

    (s := ((1 to: 100000) collect: [ :e | Random between: 32 and: 126 ])
        asByteArray asString) size.
    Time millisecondsToRun: [
        100 timesRepeat: [
            ws := String new writeStream.
            (SWAHtmlEncoder on: ws) nextPutAll: s ] ]

    before: 5319 ms (best of 4 runs, restarting the VM each time)
    after:  4606 ms (best of 4 runs, restarting the VM each time)

Consistent with this, I measured the cost per character of
WAEncoder>>nextPutAll: and the (new) WASimpleEncoder>>nextPutAll: as
respectively 24 and 21 bytecodes. This was measured with GNU
Smalltalk's profiler.
The reason is that on GST #notNil is much faster than #isString.
I could get even better speedups by using SequenceableCollection's
#at:ifAbsent: method. GST implements it as a primitive on
SequenceableCollection (with the failure code invoking the absentBlock),
which is why it is so fast. But that would only be a speedup when all
characters are in the 0-255 range, so I did not do that.
In general, the undisputed hotspot is WriteStream>>#nextPut: (15%),
which is heavily used under GST by both Swazoo and Seaside. I am
starting to think it was not such a bad idea to make it a primitive in
the Blue Book... :-)
While unportable, even a C-coded String>>#htmlEncoded (to be used by
String>>#encodeOn:) would not be a bad idea actually. 10% of execution
time is spent there, and while this would have a higher GC cost because
of possibly big strings returned by the C function, #nextPutAll: boils
down to a single memcpy so...
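
A minimal sketch of what such a C helper could look like, assuming
byte-sized characters and an escape set of just ", &, < and >. The
function name html_encode and the table html_escapes are hypothetical,
not part of GST or Seaside. It uses the same idea as the changeset: a
table whose empty (NULL) entries mean "copy the byte through unchanged".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical escape table: a NULL entry means "copy the byte as is".
   The set of escaped characters is an assumption for illustration. */
static const char *const html_escapes[256] = {
    ['"'] = "&quot;",
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;",
};

/* Returns a newly malloc'ed, NUL-terminated encoded copy of src[0..len),
   or NULL if allocation fails. The caller frees the result. If out_len
   is not NULL, it receives the encoded length (excluding the NUL). */
char *html_encode(const char *src, size_t len, size_t *out_len)
{
    size_t total = 0;
    size_t i;
    char *dst, *p;

    /* First pass: compute the exact output size. */
    for (i = 0; i < len; i++) {
        const char *esc = html_escapes[(unsigned char) src[i]];
        total += esc ? strlen(esc) : 1;
    }

    dst = malloc(total + 1);
    if (dst == NULL)
        return NULL;

    /* Second pass: copy bytes through, expanding the escaped ones. */
    p = dst;
    for (i = 0; i < len; i++) {
        const char *esc = html_escapes[(unsigned char) src[i]];
        if (esc) {
            size_t n = strlen(esc);
            memcpy(p, esc, n);
            p += n;
        } else {
            *p++ = src[i];
        }
    }
    *p = '\0';

    if (out_len)
        *out_len = total;
    return dst;
}
```

The two passes avoid reallocating while encoding, at the price of
scanning the input twice. The returned buffer is the "possibly big
string" mentioned above, which the Smalltalk side would then write out
with a single #nextPutAll:.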
Paolo