<div dir="ltr"><div><div>I don't know if this micro-benchmark is relevant, since charsetAt: only needs to be sent when the leadingChar changes (the send should be moved out of the scanJapaneseCharactersFrom: loop).<br>
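<br>A minimal workspace sketch of what moving the send out of the loop could look like. The loop below is a hypothetical stand-in for the scanning loop, not the actual scanJapaneseCharactersFrom: code, and the variables text, lastLeadingChar, charset and lookups are made up for illustration:<br>
<pre>
| text lastLeadingChar charset lookups |
text := 'abc'.            "hypothetical input"
lastLeadingChar := nil.   "no charset cached yet"
lookups := 0.
text do: [:char |
    "Only ask for the charset again when the leadingChar changes."
    char leadingChar = lastLeadingChar ifFalse: [
        lastLeadingChar := char leadingChar.
        charset := EncodedCharSet charsetAt: lastLeadingChar.
        lookups := lookups + 1].
    "... scan char using the cached charset ..."].
lookups   "number of charsetAt: sends, one per run of equal leadingChar"
</pre>
With such a cache, the cost of charsetAt: is paid once per run of characters sharing a leadingChar rather than once per character.<br>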
</div>I should also run it more than once, but here are the benchmark and its results:<br><br>| tmp |<br>tmp := {Unicode. nil}.<br>{<br>[tmp at: 1] bench.<br>[(tmp at: 1) ifNil: [Unicode]] bench.<br>[(tmp at: 2) ifNil: [Unicode]] bench.<br>[tmp at: 1 ifAbsent: [Unicode]] bench.<br>
[tmp at: 0 ifAbsent: [Unicode]] bench.<br>[(tmp at: 0 ifAbsent: [nil]) ifNil: [Unicode]] bench.<br>[(tmp at: 0 ifAbsent: nil) ifNil: [Unicode]] bench.<br>}<br> #(<br>'22,900,000 per second.'<br>'22,700,000 per second.'<br>
'18,500,000 per second.'<br>'5,570,000 per second.'<br>'5,200,000 per second.'<br>'5,160,000 per second.'<br>'14,600,000 per second.'<br>)<br><br></div>The major cost of at:ifAbsent: currently seems to be the closure.<br>
Cheating by passing nil instead of [nil], which works because nil value answers nil, makes a difference (14,600,000 versus 5,160,000 per second).<br><div><br></div><div>Shall we make provisions for leadingChar > 256 in the next 64-bit Spur image, or will immediate characters be restricted to 32 bits?<br>
</div><div>Note that leadingChar could already reach 1023 (10 bits), because there is no reason to restrict a WordArray content (32 bits) to small positive integers (30 bits), except a convention for not slowing down things too much with LargeIntegers...<br>
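<br>To make the arithmetic concrete, here is a small workspace sketch. It assumes the layout in which the low 22 bits of a character value hold the code point and the bits above hold the leadingChar. The variable names and the shift constant are illustrative assumptions, not taken from the image:<br>
<pre>
| leadingCharShift codePoint crafted |
leadingCharShift := 22.   "assumed: low 22 bits hold the code point"
codePoint := 16r41.
"A 32-bit word whose top 10 bits are all set."
crafted := (1023 bitShift: leadingCharShift) + codePoint.
crafted bitShift: leadingCharShift negated.   "1023, the extracted leadingChar"
crafted highBit.                              "32, so it fits a WordArray slot"
"Staying within 30 bits caps the leadingChar at 255."
(255 bitShift: leadingCharShift) + 16r3FFFFF. "16r3FFFFFFF, the largest 30-bit value"
</pre>
With only 256 slots in EncodedCharSets, the index 1023 + 1 computed from such a value falls outside the array.<br>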
</div><div>The ifAbsent: protects us from such a crafted MalCharacter.<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/10/10 Levente Uzonyi <span dir="ltr"><<a href="mailto:leves@elte.hu" target="_blank">leves@elte.hu</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Wed, 9 Oct 2013, Bert Freudenberg wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On 09.10.2013, at 00:52, Levente Uzonyi <<a href="mailto:leves@elte.hu" target="_blank">leves@elte.hu</a>> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, 8 Oct 2013, Nicolas Cellier wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I would prefer decent default being ^Unicode, if ever (EncodedCharSets at:1) isNil for some (bad) reason.<br>
</blockquote>
<br>
Wouldn't it be better to fill the EncodedCharSets array with Unicode by default in EncodedCharSet class >> #initialize? (replace the line<br>
<br>
EncodedCharSets := Array new: 256.<br>
<br>
with:<br>
<br>
EncodedCharSets := Array new: 256 withAll: Unicode<br>
)<br>
<br>
That way #charsetAt: could be simply<br>
<br>
^EncodedCharSets at: encoding + 1<br>
<br>
<br>
Levente<br>
</blockquote>
<br>
<br>
IMHO that would obscure the intention. It is technically equivalent, yes, but I'd like to see the explicit default. Most readable might be this:<br>
</blockquote>
<br></div>
I think it's better, because the intention is expressed in a single method, instead of two. The explicit default is there, but in #initialize.<div class="im"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
^ (EncodedCharSets at: encoding + 1) ifNil: [Unicode]<br>
<br>
We could even skip the "+ 1" part and only store the encoded charsets in EncodedCharSets. Unicode is not encoded, which is well-expressed by the code 0.<br>
<br>
^ (EncodedCharSets at: encoding ifAbsent: [nil]) ifNil: [Unicode]<br>
</blockquote>
<br></div>
Performance wise it's better to keep the "+ 1", and even better to save the #ifNil: too. :)<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
Levente</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<br>
- Bert -<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
charsetAt: encoding<br>
+ "Find the char set encoding that matches 'encoding'; return a decent default rather than nil"<br>
+ ^ (EncodedCharSets at: encoding + 1) ifNil: [EncodedCharSets at: 1].<br>
-<br>
- ^ EncodedCharSets at: encoding + 1 ifAbsent: [EncodedCharSets at: 1].<br>
!<br>
</blockquote></blockquote>
<br>
<br>
<br>
<br>
<br>
</blockquote>
<br>
</div></div></blockquote></div><br></div>