[squeakdev] 32 vs 64 bits and large integer hash
Levente Uzonyi
leves at caesar.elte.hu
Thu Nov 22 09:59:23 UTC 2018
On Thu, 22 Nov 2018, Tobias Pape wrote:
>
>> On 22.11.2018, at 01:15, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>>
>> Why not do it the other way around and implement SmallInteger >> #hash as a primitive with the following fallback code?
>>
>> SmallInteger >> #hash
>>
>> <primitive: XXX>
>>  remainder hash 
>> self < 0
>> ifTrue: [
>> remainder := 0  self.
>> hash := LargeNegativeInteger hash ]
>> ifFalse: [
>> remainder := self.
>> hash := LargePositiveInteger hash ].
>> [ remainder > 0 ] whileTrue: [
>> hash := (hash + (remainder bitAnd: 16rFF)) hashMultiply.
>> remainder := remainder bitShift: 8 ].
>> ^hash
>>
>> The only problem to solve is the calculation of the initial hash value.
>> The VM has to know the initial hash value (e.g. it could be a constant based on the sign of the receiver) or it has to know how to calculate it (but that's currently done differently among forks) or the value has to be an argument of the primitive.
>>
>> Levente
>
> I'm a bit puzzled. I thought (small)Integers being their own hash is a good thing?
I would call it simple but not necessarily good.
The problem with it is that consecutive numbers generate long chains in
HashedCollections:
a := (1 to: 1000) asArray.
s := Set withAll: a.
[ 1 to: 1000000 do: [ :each  s includes: each ] ] timeToRun.
"==> 7014"
The solution in Squeak is to use PluggableSet instead of Set,
because it applies #hashMultiply on the hash value:
ps := PluggableSet integerSet.
ps addAll: a.
[ 1 to: 1000000 do: [ :each  ps includes: each ] ] timeToRun.
"==> 95"
IIRC in Pharo SmallInteger's hash is based on #hashMultiply to avoid the
long chains. That was probably the main reason for the push to make
#hashMultply a numbered primitive.
Levente
>
> Best regards
> Tobias
>
>>
>> P.S.: This is another case where Behavior >> #hash bites Squeak and causes additional slowdown.
>> P.P.S.: Float >> #hash should use #bitXor: and #hashMultiply instead of #+ and #bitShift:
>>
>> On Wed, 21 Nov 2018, Eliot Miranda wrote:
>>
>>> Hi All,
>>> right now we have the following definition of Large(Positive)Integer>>hash:
>>> hash
>>> ^ByteArray hashBytes: self startingWith: self species hash
>>> which means that for all integers outside of the 32bit SmallInteger range (2 ^ 30 to 2 ^ 30  1), the 32bit system and the 64bit system answer different values for hash.
>>> e.g. in 64 bits: (2 raisedTo: 30) hash 1073741824
>>> but in 32 bits: (2 raisedTo: 30) hash 230045764
>>> This is unsatisfactory. I propose changing Large(Positive)Integer>>hash to
>>> hash
>>> ^self digitLength <= 8
>>> ifTrue: [self]
>>> ifFalse: [ByteArray hashBytes: self startingWith: self species hash]
>>> P.S. Note that this will not break Float hash, which is defined as
>>> Float>>hash
>>> "Hash is reimplemented because = is implemented. Both words of the float are used. (The bitShift:'s ensure that the intermediate results do not become a large integer.) Care is taken to answer same
>>> hash as an equal Integer."
>>> (self isFinite and: [self fractionPart = 0.0]) ifTrue: [^self truncated hash].
>>> ^ ((self basicAt: 1) bitShift: 4) +
>>> ((self basicAt: 2) bitShift: 4)
>>> P.P.S. I *think* that "(self isFinite and: [self fractionPart = 0.0])" is equivalent to "self  self = self fractionPart" ;)
>>> _,,,^..^,,,_
>>> best, Eliot
>>
More information about the Squeakdev
mailing list
