Base64 vs. decimal (was: CompileMethod limits)

Bert Freudenberg bert at freudenbergs.de
Sun Feb 25 13:09:44 UTC 2007


On Feb 24, 2007, at 23:54 , Milan Zimmermann wrote:

> On 2007 February 24 06:08, Bert Freudenberg wrote:
>>
>> I was suggesting using #storeString on the Form, not the ByteArray.
>> This is more efficient because it uses words instead of bytes, and it
>> uses only one literal.
>
> I misunderstood ... by Form, do you mean the project contents read
> from file? I am not clear on that.

I think I misremembered ... you are trying to embed a project, right?  
I thought it was a picture (class Form).

>> This way, the parsing work is done only once
>> when compiling. Reading decimal from a String at runtime is
>> particularly inefficient.
>> Don't do that ;)
>
> I live to learn :) - and thanks for help.
>
> Trying to be quick, I worded the part about decimals badly. The
> deserialization does:
>
>   serializedByteArrayContentsAsString do: [ :b |
>   (b = Character space)
>      ifTrue: [coll addLast: token asInteger. token := ''.]
>      ifFalse: [token := token,b asString.].]. "process last snipped"
>   coll asByteArray.
> It is a hack and not very fast, but it seems OK given that it is done
> only once for all tests. Would this be considered slow? I am not sure
> what to compare it to. Maybe it is even slower than reading decimal
> from a String. I should really replace it.

Base64 has a space overhead of about 33%: it encodes 3 bytes in 4
characters. Printing each byte as a decimal number plus a space adds
about 250% on average, since each byte takes 2 to 4 characters to
encode. So space-wise, base64 should be vastly more efficient. We can
test that:

a := ((1 to: 100000) collect: [:i | i \\ 256]) asByteArray.
"encoding"
s1 := String streamContents: [:s | a do: [:each | s print: each; space]].
s2 := (Base64MimeConverter mimeEncode: a readStream) contents.
"decoding"
"t0: the original loop, building each token by String concatenation"
t0 := [coll := OrderedCollection new. token := ''.
	s1 do: [ :b | (b = Character space)
		ifTrue: [coll addLast: token asInteger. token := ''.]
		ifFalse: [token := token,b asString.].].
	a0 := coll asByteArray] timeToRun.
"t1: reading the decimal numbers directly from a stream"
t1 := [r1 := s1 readStream.
	a1 := ByteArray streamContents: [:s |
		[r1 atEnd] whileFalse: [s nextPut: (Integer readFrom: r1). r1 skip: 1]]] timeToRun.
"t2: base64 decoding"
t2 := [a2 := (Base64MimeConverter mimeDecodeToBytes: s2 readStream) contents] timeToRun.
{'Original'. a0=a. s1 size. t0. 'Decimal'. a1=a. s1 size. t1. 'Base64'. a2=a. s2 size. t2}

#('Original' true 356992 2334 'Decimal' true 356992 1064 'Base64' true 135187 103)
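
The measured sizes can be cross-checked against the estimates. The
following workspace sketch is an addition to the measurement, not part
of it. It assumes each decimal token is the printed number plus one
space, and that base64 emits 4 characters for every started group of
3 bytes. The variable names are illustrative.

```smalltalk
"Sketch: predict the encoded sizes for the 100000-byte test array."
| bytes decimalSize base64Size |
bytes := (1 to: 100000) collect: [:i | i \\ 256].
"printed digits plus one trailing space per byte"
decimalSize := bytes inject: 0 into: [:sum :b | sum + b printString size + 1].
"4 characters per started group of 3 bytes, no line breaks"
base64Size := (bytes size + 2) // 3 * 4.
{decimalSize. base64Size. 135187 - base64Size}
```

This evaluates to #(356992 133336 1851). The decimal size matches
s1 size exactly, which is about 3.57 characters per byte. The 1851
characters by which s2 exceeds the raw base64 length would be
consistent with one line break after every 72 encoded characters, but
that is an assumption about the converter's output, not something
checked here.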

So your code (t0) can be optimized a bit (t1), making it roughly twice
as fast. But doing base64 (t2) is another 10 times faster on top of that.

So ... 10 times faster ... using a bit over a third of the space ...
base64 wins hands down, IMHO ;)
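
To put numbers on that summary, the ratios can be worked out from the
result array. This is a small sketch using only the figures reported
there; the times are in milliseconds, as returned by timeToRun, and the
variable names decimalSize and base64Size are illustrative.

```smalltalk
"Sketch: speed and size ratios implied by the reported results."
| t0 t1 t2 decimalSize base64Size |
t0 := 2334. t1 := 1064. t2 := 103.
decimalSize := 356992. base64Size := 135187.
{(t0 / t1) asFloat.    "speedup of t1 over t0"
 (t1 / t2) asFloat.    "speedup of t2 over t1"
 (t0 / t2) asFloat.    "speedup of t2 over t0"
 (base64Size / decimalSize) asFloat}    "base64 size relative to decimal size"
```

The values come out at roughly 2.2, 10.3, 22.7 and 0.38. So base64
decoding was about 10 times faster than the improved decimal reader
and more than 20 times faster than the original loop, while the
encoded string was a bit more than a third of the size.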

- Bert -




