[squeak-dev] The Trunk: Collections-eem.907.mcz

Levente Uzonyi leves at caesar.elte.hu
Sun Aug 9 17:32:57 UTC 2020


Hi Eliot,

On Sat, 8 Aug 2020, Eliot Miranda wrote:

> Hi Levente,
>
>
>> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>> 
>> Hi Eliot,
>> 
>> If performance matters, I've got two alternative implementations.
>
> Performance isn’t that big a deal.  But something told me that the indexOf: approach sux :-)
>
>> Both are about 1.2x faster than yours and are based on lookup tables.
>> The first one uses #digitValue which uses Character's "built-in" DigitValues table:
>> 
>> readHexFrom: aStream
>>    "Initialize the receiver from a hexadecimal string representation"
>>
>>    1 to: self size do: [ :i |
>>        | c v1 v2 |
>>        ((c := aStream next) asInteger > 102 "$f asInteger"
>>            or: [ (v1 := c digitValue) < 0
>>            or: [ v1 > 15
>>            or: [ (c := aStream next) asInteger > 102 "$f asInteger"
>>            or: [ (v2 := c digitValue) < 0>>            or: [ v2 > 15 ] ] ] ] ])
>>            ifTrue: [ ^self error: 'Hex digit expected' ].
>>        self at: i put: (v1 bitShift: 4) + v2 ]
>
> When I tested this it was slower.  I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros.  So let’s not go this way.

That seemed really surprising first, but the "less than 50% zeroes" gave 
it away. Your code takes different execution paths based on the input, 
and that has a surpisingly significant effect on performance.
Here's the benchmark I used which generates uniformly distributed values 
with mixed lower and upper case characters. It shows the performance 
difference I wrote:

| b r letters s stream |
b := ByteArray new: 1000.
r := Random seed: 36rSqueak.
letters := ($0 to: $9), ($0 to: $9), ($a to: $f), ($A to: $F).
s := (1 to: b size * 2) collect: [ :e | letters atRandom: r ] as: String.
stream := s readStream.
{
 	[ stream reset. b readHexFrom: stream ] benchFor: 1 seconds.
 	[ stream reset. b readHexFromVariant1: stream ] benchFor: 1 seconds.
 	[ stream reset. b readHexFromVariant2: stream ] benchFor: 1 seconds }

#(
 	'16,700 per second. 59.7 microseconds per run. 0 % GC time.'
 	'20,200 per second. 49.4 microseconds per run. 0 % GC time.'
 	'20,100 per second. 49.8 microseconds per run. 0 % GC time.'
)

However, if the input is restricted to 0-9 characters by using

letters := $0 to: $9.

in the above benchmark, the results become

#(
 	'22,700 per second. 44.2 microseconds per run. 0 % GC time.'
 	'19,400 per second. 51.5 microseconds per run. 0 % GC time.'
 	'19,500 per second. 51.3 microseconds per run. 0 % GC time.'
)

with

letters := ($0 to: $9), ($a to: $f).

the results are

#(
 	'18,800 per second. 53.1 microseconds per run. 0 % GC time.'
 	'21,200 per second. 47.2 microseconds per run. 0 % GC time.'
 	'20,200 per second. 49.4 microseconds per run. 0 % GC time.'
)

and with

letters := $a to: $f.

#(
 	'22,600 per second. 44.3 microseconds per run. 0 % GC time.'
 	'20,500 per second. 48.9 microseconds per run. 0 % GC time.'
 	'20,000 per second. 50 microseconds per run. 0 % GC time.'
)

So, the more characters are from a single range (0-9 or a-f or A-F), as it 
was probably the case with the data you benchmarked with, the faster your 
variant becomes.


Levente

>
>> 
>> The second one uses a non-minimal perfect hash table with an extremely simple hash function:
>> 
>> readHexFrom: aStream
>>    "Initialize the receiver from a hexadecimal string representation"
>>
>>    | keys values |
>>    keys :=  #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0].
>>    values :=  #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0].
>>    1 to: self size do: [ :i |
>>        | c index v1 v2 |
>>        c := aStream next asInteger.
>>        index := (c bitAnd: 63) + 1.
>>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>>        v1 := values at: index.
>>        c := aStream next asInteger.
>>        index := (c bitAnd: 63) + 1.
>>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>>        v2 := values at: index.
>>        self at: i put: (v1 bitShift: 4) + v2 ]
>
> That’s neat, I’ll remember the technique.  But what we have is more straight forward.
>
>> Levente
>> 
>>> On Thu, 6 Aug 2020, commits at source.squeak.org wrote:
>>> 
>>> Eliot Miranda uploaded a new version of Collections to project The Trunk:
>>> http://source.squeak.org/trunk/Collections-eem.907.mcz
>>> 
>>> ==================== Summary ====================
>>> 
>>> Name: Collections-eem.907
>>> Author: eem
>>> Time: 6 August 2020, 12:51:46.481296 pm
>>> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
>>> Ancestors: Collections-mt.906
>>> 
>>> 2.7x faster ByteArray>>readHexFrom:
>>> 
>>> =============== Diff against Collections-mt.906 ===============
>>> 
>>> Item was changed:
>>> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
>>> readHexFrom: aStream
>>>    "Initialize the receiver from a hexadecimal string representation"
>>> + +    1 to: self size do:
>>> +        [:i| | n v1 v2 |
>>> +         n := aStream next asInteger.
>>> +         v1 := n > 57                        "$9 asInteger = 57"
>>> +                ifTrue:
>>> +                    [n > 96                    "$a asInteger 97"
>>> +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]] +                ifFalse: [n - 48].                "$0 asInteger = 48"
>>> +         (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> +         n := aStream next asInteger.
>>> +         v2 := n > 57                        "$9 asInteger = 57"
>>> +                ifTrue:
>>> +                    [n > 96                    "$a asInteger 97"
>>> +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]]
>>> +                ifFalse: [n - 48].                "$0 asInteger = 48"
>>> +        (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> +        self at: i put: (v1 bitShift: 4) + v2]
>>> + +    "Proof that our filter selects only hexadecimal characters:
>>> +    (0 to: 255)
>>> +        select:
>>> +            [:n|
>>> +            (n > 57
>>> +                ifTrue:
>>> +                    [n > 96 +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]]
>>> +                ifFalse: [n - 48]) between: 0 and: 15]
>>> +        thenCollect:
>>> +            [:n| Character value: n]"!
>>> -    | map v ch value |
>>> -    map := '0123456789abcdefABCDEF'.
>>> -    1 to: self size do:[:i|
>>> -        ch := aStream next.
>>> -        v := (map indexOf: ch) - 1.
>>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> -        value := v bitShift: 4.
>>> -        ch := aStream next.
>>> -        v := (map indexOf: ch) - 1.
>>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> -        value := value + v.
>>> -        self at: i put: value.
>>> -    ].
>>> - !
>>


More information about the Squeak-dev mailing list