[squeak-dev] The Trunk: Collections-eem.907.mcz
Levente Uzonyi
leves at caesar.elte.hu
Sun Aug 9 17:32:57 UTC 2020
Hi Eliot,
On Sat, 8 Aug 2020, Eliot Miranda wrote:
> Hi Levente,
>
>
>> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>>
>> Hi Eliot,
>>
>> If performance matters, I've got two alternative implementations.
>
> Performance isn’t that big a deal. But something told me that the indexOf: approach sux :-)
>
>> Both are about 1.2x faster than yours and are based on lookup tables.
>> The first one uses #digitValue which uses Character's "built-in" DigitValues table:
>>
>> readHexFrom: aStream
>> "Initialize the receiver from a hexadecimal string representation"
>>
>> 1 to: self size do: [ :i |
>> | c v1 v2 |
>> ((c := aStream next) asInteger > 102 "$f asInteger"
>> or: [ (v1 := c digitValue) < 0
>> or: [ v1 > 15
>> or: [ (c := aStream next) asInteger > 102 "$f asInteger"
>> or: [ (v2 := c digitValue) < 0>> or: [ v2 > 15 ] ] ] ] ])
>> ifTrue: [ ^self error: 'Hex digit expected' ].
>> self at: i put: (v1 bitShift: 4) + v2 ]
>
> When I tested this it was slower. I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros. So let’s not go this way.
That seemed really surprising first, but the "less than 50% zeroes" gave
it away. Your code takes different execution paths based on the input,
and that has a surpisingly significant effect on performance.
Here's the benchmark I used which generates uniformly distributed values
with mixed lower and upper case characters. It shows the performance
difference I wrote:
| b r letters s stream |
b := ByteArray new: 1000.
r := Random seed: 36rSqueak.
letters := ($0 to: $9), ($0 to: $9), ($a to: $f), ($A to: $F).
s := (1 to: b size * 2) collect: [ :e | letters atRandom: r ] as: String.
stream := s readStream.
{
[ stream reset. b readHexFrom: stream ] benchFor: 1 seconds.
[ stream reset. b readHexFromVariant1: stream ] benchFor: 1 seconds.
[ stream reset. b readHexFromVariant2: stream ] benchFor: 1 seconds }
#(
'16,700 per second. 59.7 microseconds per run. 0 % GC time.'
'20,200 per second. 49.4 microseconds per run. 0 % GC time.'
'20,100 per second. 49.8 microseconds per run. 0 % GC time.'
)
However, if the input is restricted to 0-9 characters by using
letters := $0 to: $9.
in the above benchmark, the results become
#(
'22,700 per second. 44.2 microseconds per run. 0 % GC time.'
'19,400 per second. 51.5 microseconds per run. 0 % GC time.'
'19,500 per second. 51.3 microseconds per run. 0 % GC time.'
)
with
letters := ($0 to: $9), ($a to: $f).
the results are
#(
'18,800 per second. 53.1 microseconds per run. 0 % GC time.'
'21,200 per second. 47.2 microseconds per run. 0 % GC time.'
'20,200 per second. 49.4 microseconds per run. 0 % GC time.'
)
and with
letters := $a to: $f.
#(
'22,600 per second. 44.3 microseconds per run. 0 % GC time.'
'20,500 per second. 48.9 microseconds per run. 0 % GC time.'
'20,000 per second. 50 microseconds per run. 0 % GC time.'
)
So, the more characters are from a single range (0-9 or a-f or A-F), as it
was probably the case with the data you benchmarked with, the faster your
variant becomes.
Levente
>
>>
>> The second one uses a non-minimal perfect hash table with an extremely simple hash function:
>>
>> readHexFrom: aStream
>> "Initialize the receiver from a hexadecimal string representation"
>>
>> | keys values |
>> keys := #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0].
>> values := #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0].
>> 1 to: self size do: [ :i |
>> | c index v1 v2 |
>> c := aStream next asInteger.
>> index := (c bitAnd: 63) + 1.
>> (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>> v1 := values at: index.
>> c := aStream next asInteger.
>> index := (c bitAnd: 63) + 1.
>> (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>> v2 := values at: index.
>> self at: i put: (v1 bitShift: 4) + v2 ]
>
> That’s neat, I’ll remember the technique. But what we have is more straight forward.
>
>> Levente
>>
>>> On Thu, 6 Aug 2020, commits at source.squeak.org wrote:
>>>
>>> Eliot Miranda uploaded a new version of Collections to project The Trunk:
>>> http://source.squeak.org/trunk/Collections-eem.907.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-eem.907
>>> Author: eem
>>> Time: 6 August 2020, 12:51:46.481296 pm
>>> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
>>> Ancestors: Collections-mt.906
>>>
>>> 2.7x faster ByteArray>>readHexFrom:
>>>
>>> =============== Diff against Collections-mt.906 ===============
>>>
>>> Item was changed:
>>> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
>>> readHexFrom: aStream
>>> "Initialize the receiver from a hexadecimal string representation"
>>> + + 1 to: self size do:
>>> + [:i| | n v1 v2 |
>>> + n := aStream next asInteger.
>>> + v1 := n > 57 "$9 asInteger = 57"
>>> + ifTrue:
>>> + [n > 96 "$a asInteger 97"
>>> + ifTrue: [n - 87]
>>> + ifFalse: [n > 64 "$A asInteger = 65"
>>> + ifTrue: [n - 55]
>>> + ifFalse: [-1]]] + ifFalse: [n - 48]. "$0 asInteger = 48"
>>> + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> + n := aStream next asInteger.
>>> + v2 := n > 57 "$9 asInteger = 57"
>>> + ifTrue:
>>> + [n > 96 "$a asInteger 97"
>>> + ifTrue: [n - 87]
>>> + ifFalse: [n > 64 "$A asInteger = 65"
>>> + ifTrue: [n - 55]
>>> + ifFalse: [-1]]]
>>> + ifFalse: [n - 48]. "$0 asInteger = 48"
>>> + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> + self at: i put: (v1 bitShift: 4) + v2]
>>> + + "Proof that our filter selects only hexadecimal characters:
>>> + (0 to: 255)
>>> + select:
>>> + [:n|
>>> + (n > 57
>>> + ifTrue:
>>> + [n > 96 + ifTrue: [n - 87]
>>> + ifFalse: [n > 64
>>> + ifTrue: [n - 55]
>>> + ifFalse: [-1]]]
>>> + ifFalse: [n - 48]) between: 0 and: 15]
>>> + thenCollect:
>>> + [:n| Character value: n]"!
>>> - | map v ch value |
>>> - map := '0123456789abcdefABCDEF'.
>>> - 1 to: self size do:[:i|
>>> - ch := aStream next.
>>> - v := (map indexOf: ch) - 1.
>>> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> - value := v bitShift: 4.
>>> - ch := aStream next.
>>> - v := (map indexOf: ch) - 1.
>>> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> - value := value + v.
>>> - self at: i put: value.
>>> - ].
>>> - !
>>
More information about the Squeak-dev
mailing list
|