[squeak-dev] The Trunk: Multilingual-tpr.185.mcz

Wed Oct 9 23:25:25 UTC 2013

I don't know whether this micro-benchmark is relevant, since #charsetAt:
should only be queried when the leadingChar changes (the send should be
hoisted out of the scanJapaneseCharactersFrom: loop).
I should also run it more than once, but here it is:

| tmp |
tmp := {Unicode. nil}.
{
[tmp at: 1] bench.
[(tmp at: 1) ifNil: [Unicode]] bench.
[(tmp at: 2) ifNil: [Unicode]] bench.
[tmp at: 1 ifAbsent: [Unicode]] bench.
[tmp at: 0 ifAbsent: [Unicode]] bench.
[(tmp at: 0 ifAbsent: [nil]) ifNil: [Unicode]] bench.
[(tmp at: 0 ifAbsent: nil) ifNil: [Unicode]] bench.
}
 #(
'22,900,000 per second.'
'22,700,000 per second.'
'18,500,000 per second.'
'5,570,000 per second.'
'5,200,000 per second.'
'5,160,000 per second.'
'14,600,000 per second.'
)
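To make the first remark concrete, here is a sketch of asking for the charset only when the leadingChar changes, instead of once per character. It is a workspace snippet, not the actual scanJapaneseCharactersFrom: code. The array of leadingChar values and the variable names are hypothetical stand-ins for the scanned characters.

```smalltalk
| leadingChars previous charset lookups |
"Hypothetical run of leadingChar values, one per scanned character."
leadingChars := #(0 0 0 1 1 0).
previous := nil.
lookups := 0.
leadingChars do: [:leadingChar |
	"Ask for the charset only when the leadingChar differs from the previous one."
	previous = leadingChar ifFalse: [
		charset := EncodedCharSet charsetAt: leadingChar.
		lookups := lookups + 1.
		previous := leadingChar]
	"...the character itself would be scanned with charset here..."].
lookups
```

For these six characters the snippet performs 3 lookups instead of 6, so the cost of #charsetAt: matters much less than the per-call numbers suggest.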

The major cost of #at:ifAbsent: currently seems to be the closure...
Cheating with the property that nil value -> nil (passing nil instead of
[nil] as the ifAbsent: argument) makes a difference.
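A small illustration of that trick, relying on the fact that in Squeak Object>>value answers self, so nil value answers nil. Passing nil as the ifAbsent: argument then behaves like [nil] without creating a closure, and #ifNil: with a literal block is normally inlined by the compiler. The two-element array is the same stand-in used in the benchmark.

```smalltalk
| tmp |
tmp := {Unicode. nil}.
{
	nil value.                                   "nil, since Object>>value answers self"
	(tmp at: 0 ifAbsent: nil) ifNil: [Unicode].  "index out of range: nil value, then the default"
	(tmp at: 2 ifAbsent: nil) ifNil: [Unicode].  "slot present but nil: the default again"
	(tmp at: 1 ifAbsent: nil) ifNil: [Unicode]   "slot present: its content, Unicode"
}
```

The expression answers {nil. Unicode. Unicode. Unicode}.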

Shall we make provisions for leadingChar > 255 in the next 64-bit Spur
image, or will immediate characters be restricted to 32 bits?
Note that leadingChar could already reach 1023 (10 bits), because nothing
restricts a WordArray element (32 bits) to small positive integers
(30 bits), except a convention to avoid slowing things down too much with
LargeIntegers...
The ifAbsent: protects us from such a crafted MalCharacter.
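The bit counts above can be checked in a workspace. This assumes, as the argument does, that leadingChar occupies the bits from bit 22 upward of the code word, so it equals the word shifted right by 22. 16r3FFFFFFF is the largest positive SmallInteger on a 32-bit image (30 bits). The last expression shows the ifAbsent: guard catching an index that an Array new: 256 cannot hold.

```smalltalk
{
	16rFFFFFFFF bitShift: -22.   "full 32-bit WordArray element: 1023 (10 bits)"
	16r3FFFFFFF bitShift: -22.   "30-bit positive SmallInteger: 255 (8 bits)"
	(Array new: 256) at: 1023 + 1 ifAbsent: [#outOfRange]   "guarded lookup for leadingChar 1023"
}
```

This answers {1023. 255. #outOfRange}; a plain #at: with index 1024 would signal an error instead.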

2013/10/10 Levente Uzonyi <leves at elte.hu>

> On Wed, 9 Oct 2013, Bert Freudenberg wrote:
>
>
>> On 09.10.2013, at 00:52, Levente Uzonyi <leves at elte.hu> wrote:
>>
>>> On Tue, 8 Oct 2013, Nicolas Cellier wrote:
>>>
>>>> I would prefer the decent default to be ^Unicode, in case
>>>> (EncodedCharSets at: 1) is nil for some (bad) reason.
>>>>
>>>
>>> Wouldn't it be better to fill the EncodedCharSets array with Unicode by
>>> default in EncodedCharSet class >> #initialize? (replace the line
>>>
>>>         EncodedCharSets := Array new: 256.
>>>
>>> with:
>>>
>>>         EncodedCharSets := Array new: 256 withAll: Unicode
>>> )
>>>
>>> That way #charsetAt: could be simply
>>>
>>>         ^EncodedCharSets at: encoding + 1
>>>
>>>
>>> Levente
>>>
>>
>>
>> IMHO that would obscure the intention. It is technically equivalent, yes,
>> but I'd like to see the explicit default. Most readable might be this:
>>
>
> I think it's better, because the intention is expressed in a single
> method, instead of two. The explicit default is there, but in #initialize.
>
>
>
>>         ^ (EncodedCharSets at: encoding + 1) ifNil: [Unicode]
>>
>> We could even skip the "+ 1" part and only store the encoded charsets in
>> EncodedCharSets. Unicode is not encoded, which is well-expressed by the
>> code 0.
>>
>>         ^ (EncodedCharSets at: encoding ifAbsent: [nil]) ifNil: [Unicode]
>>
>
> Performance-wise it's better to keep the "+ 1", and even better to save
> the #ifNil: too. :)
>
>
> Levente
>
>
>
>>
>> - Bert -
>>
>>>>    charsetAt: encoding
>>>> +  "Find the char set encoding that matches 'encoding'; return a decent default rather than nil"
>>>> +  ^ (EncodedCharSets at: encoding + 1) ifNil: [EncodedCharSets at: 1].
>>>> -
>>>> -  ^ EncodedCharSets at: encoding + 1 ifAbsent: [EncodedCharSets at: 1].
>>>>    !
>>>>
>>>
>>
>>
>>
>>
>>
>
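The variants discussed in the thread can be compared side by side. In this workspace sketch two local arrays stand in for the EncodedCharSets class variable, and encoding 5 is a hypothetical encoding with no registered charset. The first lookup follows Levente's proposal (default stored at initialization), the second the #ifNil: default, and the last two Bert's idea of indexing by encoding directly, so that encoding 0 (Unicode) falls outside the array.

```smalltalk
| prefilled sparse encoding |
prefilled := Array new: 256 withAll: Unicode.  "Levente: default filled in at initialization"
sparse := Array new: 256.                      "nil slots, default supplied at lookup time"
encoding := 5.                                 "hypothetical encoding with no registered charset"
{
	prefilled at: encoding + 1.
	(sparse at: encoding + 1) ifNil: [Unicode].
	(sparse at: encoding ifAbsent: nil) ifNil: [Unicode].
	(sparse at: 0 ifAbsent: nil) ifNil: [Unicode]   "Bert: encoding 0 is not stored at all"
}
```

Every element answers Unicode. As Levente points out, the prefilled array needs only a single #at: per lookup, while the other variants keep the default visible in #charsetAt: at the cost of an extra #ifNil: test (and, for the #at:ifAbsent: forms, a range check).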