Image using high CPU in WeakIdentityKeyDictionary>>scanFor:
stéphane ducasse
ducasse at iam.unibe.ch
Tue Sep 20 16:57:07 UTC 2005
daniel can you an entry in mantis for this one...
Thanks
On 20 sept. 05, at 12:13, Daniel Vainsencher wrote:
> Hi Ross.
>
> There you go -
>
> Your scanFor: is indeed screwy. Since its an identity dictionary,
> your hash values are 0-4095. Therefore once array size >= 4096,
> each key goes directly to its own hash value, instead of being more
> or less evenly spread out over the array.
> The effect is seen easily by inspection - in that image, inspect
> the array, and you'll see half is very congested, and half is
> completely empty.
>
> Look at the equivalent code in a modern version, you'll see some
> simple arithmetic spreading the 4095 values over the size of the
> array.
>
> Your arrays are of an under critical size - at >4096 entries you
> would be effectively working with a slow linear search. At the
> moment, you probably occasionally get some mileage from the hash...
> but its still far slower than it should be.
>
> So you should definitely think about upgrading, either the image or
> at least that method. Of course you should study the matter and
> take the appropriate precautions, and test that it really solves
> your problem, but strictly at the information level, don't forget
> to rehash very quickly after any change in the hashing scheme of
> any affected data structures.
>
> Sheesh, amazing how long lived this bug is... if someone has some
> time and the inclination to finally put a stake through it, I have
> some advice.
>
> Daniel
>
> Ross Boylan wrote:
>
>> Thank you both for your responses. It will take me awhile to absorb
>> and react fully to them, but there are a couple things I can offer
>> now.
>> First,
>> WeakIdentityKeyDictionary allInstances collect: [:x | x size]
>> #(2831 48 3588)
>> so clearly I have some above the pain threshhold mentioned below.
>> On Mon, Sep 19, 2005 at 09:47:43PM -0700, Daniel Vainsencher wrote:
>>
>>> Actually the scanfor of weakIDDicts in 3.9a is reasonable looking
>>> (as opposed, for example to weakset, weakkeydict, Dictionary, Set
>>> and so forth, which are quite terrible above ~3000 entries. I'm
>>> talking specifically about the calculation of "start", which
>>> really determines how well the hash values will be spread out.
>>>
>>> Ross, can you post WeakIDDict>>scanFor: as implemented in your
>>> image? just to make sure.
>>>
>>>
>> 'From Squeak3.2 of 29 April 2002 [latest update: #4956] on 19
>> September 2005 at 11:58:11 pm'!
>> !WeakIdentityKeyDictionary methodsFor: 'private' stamp: 'ar
>> 2/11/2001 01:41'!
>> scanFor: anObject
>> "ar 10/21/2000: The method has been copied to this location to
>> indicate that whenever #scanFor: changes #scanForNil: must be
>> changed in the receiver as well."
>> "Scan the key array for the first slot containing either a nil
>> (indicating an empty slot) or an element that matches anObject.
>> Answer the index of that slot or zero if no slot is found. This
>> method will be overridden in various subclasses that have
>> different interpretations for matching elements."
>> | element start finish |
>> start _ (anObject identityHash \\ array size) + 1.
>> finish _ array size.
>> "Search from (hash mod size) to the end."
>> start to: finish do:
>> [:index | ((element _ array at: index) == nil or: [element
>> key == anObject])
>> ifTrue: [^ index ]].
>> "Search from 1 to where we started."
>> 1 to: start-1 do:
>> [:index | ((element _ array at: index) == nil or: [element
>> key == anObject])
>> ifTrue: [^ index ]].
>> ^ 0 "No match AND no empty slot"! !
>> Is that OK, or should I send a gzipped version? I remember vaguely
>> that may be useful because of line-ending differences.
>> More to come.
>> Thanks again.
>> Ross
>> One follow-up question:
>> ...
>>
>>> David T. Lewis wrote:
>>>
>> ...
>>
>>>> The problem I saw was that #scanFor: degraded horribly when there
>>>> were a lot of entries in the WeakRegistry. In normal Squeak usage,
>>>> this did not happen, but in my case I was accidentally accumulating
>>>> a lot of entries in the registry (*).
>>>>
>> ...
>>
>>>> (*) Specifically, I was using a subclass of StandardFileStream to
>>>> represent the end points of an operating system pipe; then I made
>>>> and destroyed large numbers of these in the course of making
>>>> command
>>>> pipelines (see CommandShell on SqueakMap). After running a large
>>>> number of SUnit tests, these things would accumulate in the
>>>> registry
>>>> to the point where performance degraded horribly. Once I found the
>>>> culprit, I just quit registering the pipe endpoints (which of
>>>> course
>>>> are transient things that don't need to be registered), and my
>>>> "CPU problem" went away.
>>>>
>> Was this registration something you did explicitly, or did it happen
>> behind the scenes? Or perhaps I'm not understanding what you mean by
>> register. The "registration" that I'm doing, that I'm aware of, is
>> registering dependents. I think I'm using a mix of the old and new
>> systems for doing so (when:send:to: and update/changed). I suspect
>> you mean something else.
>> Ross
>>
>
>
More information about the Squeak-dev
mailing list
|