Base64 vs. decimal (was: CompileMethod limits)
Bert Freudenberg
bert at freudenbergs.de
Sun Feb 25 13:09:44 UTC 2007
On Feb 24, 2007, at 23:54, Milan Zimmermann wrote:
> On 2007 February 24 06:08, Bert Freudenberg wrote:
>>
>> I was suggesting using #storeString on the Form, not the ByteArray.
>> This is more efficient because it uses words instead of bytes, and it
>> uses only one literal.
>
> I misunderstood... By Form, do you mean the project contents read from
> file? I am not clear on that.
I think I misremembered ... you are trying to embed a project, right?
I thought it was a picture (class Form).
>> This way, the parsing work is done only once
>> when compiling. Reading decimal from a String at runtime is
>> particularly inefficient.
>> Don't do that ;)
>
> I live to learn :) - and thanks for help.
>
> Trying to be quick, I misrepresented my wording about decimals; the
> deserialization does:
>
> serializedByteArrayContentsAsString do: [:b |
>     (b = Character space)
>         ifTrue: [coll addLast: token asInteger. token := '']
>         ifFalse: [token := token, b asString]].
> "process last snippet"
> coll asByteArray.
> It is a hack and not very fast, but it seems OK given that it is done
> only once for all tests. I am not sure whether this would be considered
> slow, since I don't know what to compare it to. Maybe it is even slower
> than reading decimal from a String. I should really replace it.
Base64 has a space overhead of about 33%, since it encodes 3 bytes in 4
characters. Printing each byte as a decimal number plus a space adds
about 257% on average (roughly 3.57 characters per byte for uniformly
distributed byte values), because each byte takes 2 to 4 characters to
encode. So space-wise, base64 should be vastly more efficient. We can
test that:
a := ((1 to: 100000) collect: [:i | i \\ 256]) asByteArray.
"encoding"
s1 := String streamContents: [:s | a do: [:each | s print: each; space]].
s2 := (Base64MimeConverter mimeEncode: a readStream) contents.
"decoding"
t0 := [coll := OrderedCollection new. token := ''.
    s1 do: [:b | (b = Character space)
        ifTrue: [coll addLast: token asInteger. token := '']
        ifFalse: [token := token, b asString]].
    a0 := coll asByteArray] timeToRun.
t1 := [r1 := s1 readStream.
    a1 := ByteArray streamContents: [:s |
        [r1 atEnd] whileFalse: [s nextPut: (Integer readFrom: r1). r1 skip: 1]]] timeToRun.
t2 := [a2 := (Base64MimeConverter mimeDecodeToBytes: s2 readStream) contents] timeToRun.
{'Original'. a0=a. s1 size. t0. 'Decimal'. a1=a. s1 size. t1.
 'Base64'. a2=a. s2 size. t2}

#('Original' true 356992 2334 'Decimal' true 356992 1064 'Base64' true 135187 103)
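As a hand check of the sizes, assume byte values are spread uniformly over 0 to 255 (the test data above cycles through all of them). Counting the trailing space, 10 values need 2 characters, 90 need 3, and 156 need 4. The 100000 test bytes are 390 full cycles of 256 values plus the values 1 to 160:

```latex
\begin{aligned}
\bar{c}_{\text{dec}} &= \frac{10 \cdot 2 + 90 \cdot 3 + 156 \cdot 4}{256} = \frac{914}{256} \approx 3.57 \\
\bar{c}_{\text{b64}} &= \frac{4}{3} \approx 1.33 \\
n_{\text{dec}} &= 390 \cdot 914 + (9 \cdot 2 + 90 \cdot 3 + 61 \cdot 4) = 356460 + 532 = 356992 \\
n_{\text{b64}} &= 4 \left\lceil \tfrac{100000}{3} \right\rceil = 133336
\end{aligned}
```

The decimal count matches `s1 size` exactly. The measured base64 size (135187) is a little larger than 133336, presumably because the MIME encoder also inserts line breaks.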
So your code (t0) can be optimized a bit (t1), making it about twice as
fast. But doing base64 (t2) is another 10 times faster on top of that.
So ... 10 times faster ... using roughly a third of the space ... base64
wins hands down, IMHO ;)
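These are the ratios behind those claims, computed from the printed results. The times are as answered by #timeToRun, in milliseconds:

```latex
\begin{aligned}
t_0 / t_1 &= 2334 / 1064 \approx 2.2 \\
t_1 / t_2 &= 1064 / 103 \approx 10.3 \\
t_0 / t_2 &= 2334 / 103 \approx 22.7 \\
\text{size}_{\text{b64}} / \text{size}_{\text{dec}} &= 135187 / 356992 \approx 0.38
\end{aligned}
```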
- Bert -
More information about the Squeak-dev mailing list