[Seaside-dev] speedup WAEncoder

Paolo Bonzini bonzini at gnu.org
Sat Oct 3 10:22:55 UTC 2009


The attached changeset speeds up WAEncoder by noticing that all of its 
subclasses are mostly outputting the input value for all characters, and 
that a conversion from a character X to a character Y is never found.

With this change I got a 15% improvement on this test:

(s := ((1 to: 100000) collect: [ :e | Random between: 32 and: 126 ]) 
asByteArray asString) size
Time millisecondsToRun: [ 100 timesRepeat: [ ws := String new 
writeStream. (SWAHtmlEncoder on: ws) nextPutAll: s ] ]

before: 5319 (best of 4 runs, restarting the VM each time)
after: 4606 (best of 4 runs, restarting the VM each time)

Consistent with this, I measured the cost per character of 
WAEncoder>>nextPutAll: and the (new) WASimpleEncoder>>nextPutAll: as 
respectively 24 and 21 bytecodes.  This was measured with GNU 
Smalltalk's profiler.

The reason is that on GST #notNil is much faster than #isString.

I could get even better speedups by using SequenceableCollection's 
#at:ifAbsent: method.  GST has it implemented as a primitive on 
SequenceableCollection (with the failure code invoking the absentBlock), 
which explains the reason for the speed.  But that would be a speedup 
only when all characters are in the 0-255 range, so I did not do that.

In general, the undisputed hotspot is WriteStream>>#nextPut: (15%), 
which is heavily used under GST by both Swazoo and Seaside.  I am 
starting to think it was not such a bad idea to make it a primitive in 
the Blue Book... :-)

While unportable, even a C-coded String>>#htmlEncoded (to be used by 
String>>#encodeOn:) would not be a bad idea actually.  10% of execution 
time is spent there, and while this would have a higher GC cost because 
of possibly big strings returned by the C function, #nextPutAll: boils 
down to a single memcpy so...

Paolo
-------------- next part --------------
SystemOrganization addCategory: #'Seaside-Core-Document'!
WAEncoder subclass: #WASimpleEncoder
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Seaside-Core-Document'!

!WASimpleEncoder commentStamp: '<historical>' prior: 0!
I encode everything that is written to myself using #nextPut: and #nextPutAll: onto the wrapped stream. I am optimized for the case of being neutral to most characters. The convertion of a single character is defined in #encode:on: on the class-side of my subclasses.!

!WASimpleEncoder class methodsFor: 'initialization'!

initializeBMP
	"Initializes the BMP, the Basic Multilingual Plane of UTF characters, using the encoding strategy of the receiver. This caching strategy ensures that most commonly used characters can be encoded as efficient as possible."
	| stream characterLimit ch |
	characterLimit := self maximumCharacterValue.
	table := Array new: characterLimit.
	stream := WriteStream on: (String new: 6).
	1 to: characterLimit
		do: [:index | 
			ch := Character value: index - 1.
			self encode: ch on: stream reset.
			table at: index
				put: (stream contents = (String with: ch)
					ifTrue: [nil]
					ifFalse: [stream contents])]
! !

!WASimpleEncoder methodsFor: 'accessing'!

nextPut: aCharacter
	| value encoded |
	value := aCharacter asInteger.
	value < table size 
		ifFalse: [stream nextPut: aCharacter]
		ifTrue: 
			[encoded := table at: value + 1.
			encoded notNil
				ifTrue: [stream nextPutAll: encoded]
				ifFalse: [stream nextPut: aCharacter]]
!

nextPutAll: aString
	"uses #to:do: for speed reasons (on Squeak)
	 this is not premature optimization, this is a hotspot method method
	 and #to:do: shows measurable speed improvements for rendering seaside pages"

	| character value encoded |
	1 to: aString size
		do: [:index | 
			character := aString at: index.
			value := character asInteger.
			value < table size 
				ifFalse: [self class encode: character on: stream]
				ifTrue: 
					[encoded := table at: value + 1.
					encoded notNil 
						ifTrue: [stream nextPutAll: encoded]
						ifFalse: [stream nextPut: character]]]
! !



WASimpleEncoder subclass: #WAHtmlEncoder
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Seaside-Core-Document'!

!WAHtmlEncoder class methodsFor: 'initialization'!

maximumCharacterValue
	^255
! !



WASimpleEncoder subclass: #WAUrlEncoder
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Seaside-Core-Document'!

!WAUrlEncoder class methodsFor: 'initialization'!

maximumCharacterValue
	^255
! !


WAHtmlEncoder initialize!
WAUrlEncoder initialize!


More information about the seaside-dev mailing list