Image using high CPU in WeakIdentityKeyDictionary>>scanFor:

Tue Sep 20 16:57:07 UTC 2005

daniel can you an entry in mantis for this one...
Thanks

On 20 sept. 05, at 12:13, Daniel Vainsencher wrote:

> Hi Ross.
>
> There you go -
>
> Your scanFor: is indeed screwy. Since its an identity dictionary,  
> your hash values are 0-4095. Therefore once array size >= 4096,  
> each key goes directly to its own hash value, instead of being more  
> or less evenly spread out over the array.
> The effect is seen easily by inspection - in that image, inspect  
> the array, and you'll see half is very congested, and half is  
> completely empty.
>
> Look at the equivalent code in a modern version, you'll see some  
> simple arithmetic spreading the 4095 values over the size of the  
> array.
>
> Your arrays are of an under critical size - at >4096 entries you  
> would be effectively working with a slow linear search. At the  
> moment, you probably occasionally get some mileage from the hash...  
> but its still far slower than it should be.
>
> So you should definitely think about upgrading, either the image or  
> at least that method. Of course you should study the matter and  
> take the appropriate precautions, and test that it really solves  
> your problem, but strictly at the information level, don't forget  
> to rehash very quickly after any change in the hashing scheme of  
> any affected data structures.
>
> Sheesh, amazing how long lived this bug is... if someone has some  
> time and the inclination to finally put a stake through it, I have  
> some advice.
>
> Daniel
>
> Ross Boylan wrote:
>
>> Thank you both for your responses.  It will take me awhile to absorb
>> and react fully to them, but there are a couple things I can offer
>> now.
>> First,
>> WeakIdentityKeyDictionary allInstances collect: [:x | x size]
>>  #(2831 48 3588)
>> so clearly I have some above the pain threshhold mentioned below.
>> On Mon, Sep 19, 2005 at 09:47:43PM -0700, Daniel Vainsencher wrote:
>>
>>> Actually the scanfor of weakIDDicts in 3.9a is reasonable looking  
>>> (as opposed, for example to weakset, weakkeydict, Dictionary, Set  
>>> and so forth, which are quite terrible above ~3000 entries. I'm  
>>> talking specifically about the calculation of "start", which  
>>> really determines how well the hash values will be spread out.
>>>
>>> Ross, can you post WeakIDDict>>scanFor: as implemented in your  
>>> image? just to make sure.
>>>
>>>
>> 'From Squeak3.2 of 29 April 2002 [latest update: #4956] on 19  
>> September 2005 at 11:58:11 pm'!
>> !WeakIdentityKeyDictionary methodsFor: 'private' stamp: 'ar  
>> 2/11/2001 01:41'!
>> scanFor: anObject
>>     "ar 10/21/2000: The method has been copied to this location to  
>> indicate that whenever #scanFor: changes #scanForNil: must be  
>> changed in the receiver as well."
>>     "Scan the key array for the first slot containing either a nil  
>> (indicating an empty slot) or an element that matches anObject.  
>> Answer the index of that slot or zero if no slot is found. This  
>> method will be overridden in various subclasses that have  
>> different interpretations for matching elements."
>>     | element start finish |
>>     start _ (anObject identityHash \\ array size) + 1.
>>     finish _ array size.
>>     "Search from (hash mod size) to the end."
>>     start to: finish do:
>>         [:index | ((element _ array at: index) == nil or: [element  
>> key == anObject])
>>             ifTrue: [^ index ]].
>>     "Search from 1 to where we started."
>>     1 to: start-1 do:
>>         [:index | ((element _ array at: index) == nil or: [element  
>> key == anObject])
>>             ifTrue: [^ index ]].
>>     ^ 0  "No match AND no empty slot"! !
>> Is that OK, or should I send a gzipped version?  I remember vaguely
>> that may be useful because of line-ending differences.
>> More to come.
>> Thanks again.
>> Ross
>> One follow-up question:
>> ...
>>
>>> David T. Lewis wrote:
>>>
>> ...
>>
>>>> The problem I saw was that #scanFor: degraded horribly when there
>>>> were a lot of entries in the WeakRegistry. In normal Squeak usage,
>>>> this did not happen, but in my case I was accidentally accumulating
>>>> a lot of entries in the registry (*).
>>>>
>> ...
>>
>>>> (*) Specifically, I was using a subclass of StandardFileStream to
>>>> represent the end points of an operating system pipe; then I made
>>>> and destroyed large numbers of these in the course of making  
>>>> command
>>>> pipelines (see CommandShell on SqueakMap). After running a large
>>>> number of SUnit tests, these things would accumulate in the  
>>>> registry
>>>> to the point where performance degraded horribly. Once I found the
>>>> culprit, I just quit registering the pipe endpoints (which of  
>>>> course
>>>> are transient things that don't need to be registered), and my
>>>> "CPU problem" went away.
>>>>
>> Was this registration something you did explicitly, or did it happen
>> behind the scenes?  Or perhaps I'm not understanding what you mean by
>> register.  The "registration" that I'm doing, that I'm aware of, is
>> registering dependents.  I think I'm using a mix of the old and new
>> systems for doing so (when:send:to: and update/changed).  I suspect
>> you mean something else.
>> Ross
>>
>
>