Image using high CPU in WeakIdentityKeyDictionary>>scanFor:

Tue Sep 20 10:13:28 UTC 2005

Hi Ross.

There you go -

Your scanFor: is indeed screwy. Since its an identity dictionary, your 
hash values are 0-4095. Therefore once array size >= 4096, each key goes 
directly to its own hash value, instead of being more or less evenly 
spread out over the array.
The effect is seen easily by inspection - in that image, inspect the 
array, and you'll see half is very congested, and half is completely empty.

Look at the equivalent code in a modern version, you'll see some simple 
arithmetic spreading the 4095 values over the size of the array.

Your arrays are of an under critical size - at >4096 entries you would 
be effectively working with a slow linear search. At the moment, you 
probably occasionally get some mileage from the hash... but its still 
far slower than it should be.

So you should definitely think about upgrading, either the image or at 
least that method. Of course you should study the matter and take the 
appropriate precautions, and test that it really solves your problem, 
but strictly at the information level, don't forget to rehash very 
quickly after any change in the hashing scheme of any affected data 
structures.

Sheesh, amazing how long lived this bug is... if someone has some time 
and the inclination to finally put a stake through it, I have some advice.

Daniel

Ross Boylan wrote:
> Thank you both for your responses.  It will take me awhile to absorb
> and react fully to them, but there are a couple things I can offer
> now.
> 
> First,
> WeakIdentityKeyDictionary allInstances collect: [:x | x size]
>  #(2831 48 3588)
> so clearly I have some above the pain threshhold mentioned below.
> 
> On Mon, Sep 19, 2005 at 09:47:43PM -0700, Daniel Vainsencher wrote:
> 
>>Actually the scanfor of weakIDDicts in 3.9a is reasonable looking (as 
>>opposed, for example to weakset, weakkeydict, Dictionary, Set and so 
>>forth, which are quite terrible above ~3000 entries. I'm talking 
>>specifically about the calculation of "start", which really determines 
>>how well the hash values will be spread out.
>>
>>Ross, can you post WeakIDDict>>scanFor: as implemented in your image? 
>>just to make sure.
>>
> 
> 'From Squeak3.2 of 29 April 2002 [latest update: #4956] on 19 September 2005 at 11:58:11 pm'!
> 
> !WeakIdentityKeyDictionary methodsFor: 'private' stamp: 'ar 2/11/2001 01:41'!
> scanFor: anObject
> 	"ar 10/21/2000: The method has been copied to this location to indicate that whenever #scanFor: changes #scanForNil: must be changed in the receiver as well."
> 	"Scan the key array for the first slot containing either a nil (indicating an empty slot) or an element that matches anObject. Answer the index of that slot or zero if no slot is found. This method will be overridden in various subclasses that have different interpretations for matching elements."
> 	| element start finish |
> 	start _ (anObject identityHash \\ array size) + 1.
> 	finish _ array size.
> 
> 	"Search from (hash mod size) to the end."
> 	start to: finish do:
> 		[:index | ((element _ array at: index) == nil or: [element key == anObject])
> 			ifTrue: [^ index ]].
> 
> 	"Search from 1 to where we started."
> 	1 to: start-1 do:
> 		[:index | ((element _ array at: index) == nil or: [element key == anObject])
> 			ifTrue: [^ index ]].
> 
> 	^ 0  "No match AND no empty slot"! !
> 
> 
> Is that OK, or should I send a gzipped version?  I remember vaguely
> that may be useful because of line-ending differences.
> 
> More to come.
> 
> Thanks again.
> Ross
> 
> One follow-up question:
> ...
> 
>>David T. Lewis wrote:
> 
> ...
> 
>>>The problem I saw was that #scanFor: degraded horribly when there
>>>were a lot of entries in the WeakRegistry. In normal Squeak usage,
>>>this did not happen, but in my case I was accidentally accumulating
>>>a lot of entries in the registry (*).
> 
> ...
> 
>>>(*) Specifically, I was using a subclass of StandardFileStream to
>>>represent the end points of an operating system pipe; then I made
>>>and destroyed large numbers of these in the course of making command
>>>pipelines (see CommandShell on SqueakMap). After running a large
>>>number of SUnit tests, these things would accumulate in the registry
>>>to the point where performance degraded horribly. Once I found the
>>>culprit, I just quit registering the pipe endpoints (which of course
>>>are transient things that don't need to be registered), and my
>>>"CPU problem" went away.
> 
> 
> Was this registration something you did explicitly, or did it happen
> behind the scenes?  Or perhaps I'm not understanding what you mean by
> register.  The "registration" that I'm doing, that I'm aware of, is
> registering dependents.  I think I'm using a mix of the old and new
> systems for doing so (when:send:to: and update/changed).  I suspect
> you mean something else.
> 
> Ross
>