[squeak-dev] Association >> #hash performance

Levente Uzonyi leves at caesar.elte.hu
Thu May 26 21:32:12 UTC 2022

Hi Christoph,

On Thu, 26 May 2022, christoph.thiede at student.hpi.uni-potsdam.de wrote:

> Hi all!
> while playing around with code simulation recently, I found out that the following dictionary lookup executes an extremely large number of bytecodes:
>     UserInterfaceTheme current get: #return for: SHTextStylerST80.
> Dictionary >> #scanFor: takes 59 (sic!) attempts to find the right item in the dictionary. In other words, almost 20% of the property table is traversed, which does not conform to my idea of a hashtable.
> The reason for this can be found in Association >> #hash: Since Collections-ar.304, it does not take the value of an association into consideration for the hash any longer. As UserInterfaceTheme uses Associations as keys in
> the Dictionary, we have a lot of "hash misses". Even though you might not notice this in practice, I certainly noticed it while debugging and simulating. :-)
> On the other hand, reverting Association >> #hash to honor the value does indeed slow down Unicode class compileAll significantly (60 ms vs. 2100 ms), as noted in Andreas' original comment:
>     Name: Collections-ar.304
>     Author: ar
>     Time: 11 February 2010, 11:52:58.959 pm
>     UUID: 9e97b360-adf1-f745-8f8c-562aa08d566e
>     Ancestors: Collections-ar.303
>     Fix an age-old bug in Association>>hash which causes extreme slowdowns in compiling Unicode methods. Association>>hash does not need to hash the value; it's slow and useless.
> So my question is: What would be the right fix for this situation? Is it
> a) Do not use Associations in Dictionaries when their value matters for the lookup; or
> b) Use Associations in Dictionaries only when their value matters for the lookup?

It's the user's (the developer who decided to use a hashed collection) 
responsibility to provide keys that have a good hash distribution.
If it's not possible with the default hash function, a PluggableDictionary 
should be used with a better one.
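
A minimal, untested sketch of that, assuming every key is an Association 
(the hash block and the example keys are only illustrative, they are not 
taken from UserInterfaceTheme):

    dict := PluggableDictionary new.
    "Mix the value into the hash; Association >> #hash only uses the key."
    dict hashBlock: [:assoc | assoc key hash bitXor: assoc value hash].
    dict at: #color -> #ClassA put: 1.
    dict at: #color -> #ClassB put: 2.
    dict at: #color -> #ClassB. "answers 2"

No equalBlock is needed here because Association >> #= already compares 
both key and value.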

In this case, however, I would use Arrays as tuples instead of 
Associations because not all the keys are Associations in that dictionary, 
and Array's #hash is good enough here.
Changing Association >> #hash would have too many side effects to justify 
fixing such a localized problem.
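
The difference can be checked in a workspace (a sketch; the literal keys 
are made up for illustration):

    "Only the key is hashed, so these two collide:"
    (#color -> 1) hash = (#color -> 2) hash. "true"
    "They are still different keys, so both are kept and must be probed:"
    (#color -> 1) = (#color -> 2). "false"
    "Array's #hash takes every element into account:"
    {#color. 1} hash = {#color. 2} hash. "expected to be false"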

> For a), the dictionary layout of UserInterfaceTheme needs to be changed, for which I am attaching a changeset that replaces Associations by Arrays there.* For b), the layout of litIndSet in Encoder needs to be changed, for
> instance by using Bindings instead of Associations for classPool entries. However, I'm not familiar enough with Encoder etc. to answer this question.
> *The changeset will automatically update all UserInterfaceTheme instances. Benchmark:
>     theme := UserInterfaceTheme current.
>     random := Random seed: 20220526.
>     properties := (1 to: 10000) collect: [:i | theme properties keys atRandom: random].
>     [properties collect: [:ea | theme get: ea]] bench.
> Old: 45 ms | New: 10 ms
> Looking forward to your opinion: How should we treat Associations in Dictionaries? Of course, this is not release-critical. :-)
> Best,
> Christoph
> =============== Postscript (accelerateUserInterfaceTheme.2.cs) ===============
> "Postscript: Update UserInterfaceTheme instances"
> UserInterfaceTheme allSubInstancesDo: [:theme |
>     | newProperties |
>     newProperties := theme properties copyEmpty.
>     theme properties keysAndValuesDo: [:key :value |
>         newProperties
>             at: ((key isKindOf: Association)
>                 ifTrue: [{key key. key value}]
>                 ifFalse: [key])
>             put: value].
>     theme instVarNamed: 'properties' put: newProperties].

There's a method named #atomicUpdate:. I'm not sure if it's needed here, 
but the name suggests that it's better to use that to update the 
properties. The rest of the changes look good to me. +1

