I continue to have major problems with Squeak taking up all the CPU and becoming extraordinarily sluggish (e.g., 15 seconds to respond to a click, minutes to complete some operations).
A TimeProfile while the system was doing nothing showed virtually all (99.4%) the time going to WeakIdentityDictionary>>scanFor:. This time is roughly evenly split between primitives and WeakKeyAssociation>>key.
Highpoints of the stack (sorry, there doesn't seem to be a good way to copy or fileout the whole thing), with "..." indicating omissions:

PasteUpMorph>>doOneCycle
...
SystemWindow>>stepAt:
TimeProfileBrowser(Object)>>stepAt:in:
...
TimeProfileBrowser(Object)>>updateListsAndCodeIn:
...
TimeProfileBrowser(Object)>>canDiscardEdits
SystemWindow(Object)>>canDiscardEdits
...
SystemWindow(Object)>>myDependents
WeakIdentityKeyDictionary(Dictionary)>>at:ifAbsent:
WeakIdentityKeyDictionary(Set)>>findElementOrNil:
WeakIdentityKeyDictionary>>scanFor:
Memory activity and GC look minimal.
Does anyone have any idea what the deeper cause of this problem might be, or where to look? Even better if you have a solution :)
The problem with the VM taking up lots of CPU and not responding has been going on for months, though its form has been variable. Sometimes the squeak profiler shows it's using very little CPU, while the operating system tools disagree. Sometimes garbage collecting or saving the image helps. At the moment, they don't. Completely exiting the image and restarting grants a temporary respite.
Undoubtedly I need to redesign my app, but I'd like to survive until I get that done. My current plan: 1. migrate to 3.8 image 2. add database 3. see what problems remain and redesign as needed.
I think most DBs (I'm going to try Magma) require a current image, so 1 needs to come first. Also, DateAndTime is a major time sink, and 3.8 has a much zippier version.
I'm running the latest official Unix VM on Debian; I think that's basically the same as Ian's latest release. My image is roughly 3.6. Some stats:
Smalltalk vmVersion 'Squeak3.8gamma of ''24 November 2004'' [latest update: #6548]'
memory 104,027,704 bytes
	old 46,287,164 bytes (44.5%)
	young 335,604 bytes (0.3%)
	used 46,622,768 bytes (44.8%)
	free 57,404,936 bytes (55.2%)
GCs 6,982 (131ms between GCs)
	full 0 totalling 0ms (0.0% uptime)
	incr 6982 totalling 28,468ms (3.0% uptime), avg 4.0ms
	tenures 19 (avg 367 GCs/tenure)
Since last view -397,640
	uptime -116345.1s
	full -34 totalling -15,907ms (0.0% uptime), avg 468.0ms
	incr -397606 totalling -346,430ms (0.0% uptime), avg
	tenures -167 (avg 2380 GCs/tenure)
WeakKeyAssociation allInstances size 7456 (though this last statistic is while the system was behaving OK after a quit and restart).
Ross Boylan
On Mon, Sep 19, 2005 at 01:26:46PM -0700, Ross Boylan wrote:
I continue to have major problems with squeak taking up all the CPU and becoming extraordinarily sluggish (e.g., 15 seconds to respond to click, minutes to complete some operations).
A TimeProfile while the system was doing nothing showed virtually all (99.4%) the time going to WeakIdentityDictionary>>scanFor:. This time is roughly evenly split between primitives and WeakKeyAssociation>>key.
This sounds rather familiar all of a sudden. I think that I have seen similar behavior associated with the WeakRegistry in class StandardFileStream (and Socket, although in my case StandardFileStream was the issue).
The problem I saw was that #scanFor: degraded horribly when there were a lot of entries in the WeakRegistry. In normal Squeak usage, this did not happen, but in my case I was accidentally accumulating a lot of entries in the registry (*).
Take a look at the WeakIdentityDictionaries in your application. I'll bet you have one or two of them with a large number of entries, and your "CPU problem" may be happening when one of those critters gets too big and you need to insert more elements into it.
I don't recall the details, but I do remember that #scanFor: was the issue. If this turns out to be the problem you are seeing, perhaps someone with a knack for algorithms could take a look at it and propose a #scanFor: that degrades more gracefully with size.
Dave
(*) Specifically, I was using a subclass of StandardFileStream to represent the end points of an operating system pipe; then I made and destroyed large numbers of these in the course of making command pipelines (see CommandShell on SqueakMap). After running a large number of SUnit tests, these things would accumulate in the registry to the point where performance degraded horribly. Once I found the culprit, I just quit registering the pipe endpoints (which of course are transient things that don't need to be registered), and my "CPU problem" went away.
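The degradation Dave describes can be sketched numerically. The following toy model is mine, not Squeak code - the function name, table sizes, and entry counts are all illustrative. It mimics an open-addressing table whose hash values are capped at 4096 (the range of identityHash in images of that era) and compares the average probe length against a hash that covers the whole table:

```python
import random

def average_probes(n_entries, table_size, hash_range):
    """Insert n_entries keys with random hashes drawn from [0, hash_range)
    into an open-addressing table, then return the mean number of slots a
    linear scan must examine to find each key."""
    random.seed(42)  # deterministic, for illustration only
    table = [None] * table_size
    hashes = {k: random.randrange(hash_range) for k in range(n_entries)}
    for k, h in hashes.items():
        idx = h % table_size
        while table[idx] is not None:        # linear probing, as scanFor: does
            idx = (idx + 1) % table_size
        table[idx] = k
    def probes(k):
        idx, count = hashes[k] % table_size, 1
        while table[idx] != k:
            idx, count = (idx + 1) % table_size, count + 1
        return count
    return sum(probes(k) for k in range(n_entries)) / n_entries

# Hashes capped at 4096 crowd every key into the low half of an 8192-slot
# table (a nearly-full region), while a full-range hash spreads the same
# keys thinly over the whole table.
capped = average_probes(4000, 8192, 4096)
spread = average_probes(4000, 8192, 8192)
print(round(capped, 1), round(spread, 1))  # capped is far larger
```

This is only a model of the symptom, but it shows why a registry with a few thousand live entries can make every lookup crawl.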
Actually the #scanFor: of WeakIdentityKeyDictionary in 3.9a looks reasonable (as opposed, for example, to WeakSet, WeakKeyDictionary, Dictionary, Set and so forth, which are quite terrible above ~3000 entries). I'm talking specifically about the calculation of "start", which really determines how well the hash values will be spread out.
Ross, can you post WeakIdentityKeyDictionary>>scanFor: as implemented in your image? Just to make sure.
Ross, my advice is:
1. I notice your time in this dictionary stems from dependent notification. Consider a switch to the local implementation of dependencies. A little more detail: Object implements all dependents using one global dictionary, which therefore might get large. Other classes have local storage for those (see implementors of actionMap). This might not help if you have one object with thousands of dependents, but might solve the problem outright otherwise.
2. Try to make a simple TestCase that exhibits the same performance issues at the same scales, that other people can load and look at.
3. As Dave said, look at the instances in your image. How many such dictionaries do you have? How big are they? What percentage of the associations in them have nil keys? (Be careful about the code you use - associationsDo: ignores nil keys in some versions.)
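Daniel's point 1 can be pictured with a toy analogy (Python, mine - not Squeak code, and all names are invented for illustration): Object-style dependents funnel every model in the system through one shared weak table, while actionMap-style local storage keeps each object's dependents with the object.

```python
from weakref import WeakKeyDictionary

# The Object-style scheme: one global weak table shared by every model in
# the system. It grows with the whole image, so every dependents lookup
# walks an ever-larger hash table.
GLOBAL_DEPENDENTS = WeakKeyDictionary()

class Model:
    """A model whose dependents live in the global table."""

def add_dependent(model, dependent):
    GLOBAL_DEPENDENTS.setdefault(model, []).append(dependent)

# The local-storage style (compare Squeak's actionMap): each object carries
# its own small collection, so no single dictionary accumulates thousands
# of entries.
class LocalModel:
    def __init__(self):
        self.dependents = []
    def add_dependent(self, dependent):
        self.dependents.append(dependent)

m = Model()
add_dependent(m, "a view")
lm = LocalModel()
lm.add_dependent("a view")
print(len(GLOBAL_DEPENDENTS), lm.dependents)
```

The weak keys mean entries vanish when models are garbage collected, which is also why scans over such tables must tolerate nil keys, as Daniel warns in point 3.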
Daniel
David T. Lewis wrote:
...
Thank you both for your responses. It will take me awhile to absorb and react fully to them, but there are a couple things I can offer now.
First, WeakIdentityKeyDictionary allInstances collect: [:x | x size] gives #(2831 48 3588), so clearly I have some above the pain threshold mentioned below.
On Mon, Sep 19, 2005 at 09:47:43PM -0700, Daniel Vainsencher wrote:
Actually the #scanFor: of WeakIdentityKeyDictionary in 3.9a looks reasonable (as opposed, for example, to WeakSet, WeakKeyDictionary, Dictionary, Set and so forth, which are quite terrible above ~3000 entries). I'm talking specifically about the calculation of "start", which really determines how well the hash values will be spread out.
Ross, can you post WeakIDDict>>scanFor: as implemented in your image? just to make sure.
'From Squeak3.2 of 29 April 2002 [latest update: #4956] on 19 September 2005 at 11:58:11 pm'!
!WeakIdentityKeyDictionary methodsFor: 'private' stamp: 'ar 2/11/2001 01:41'!
scanFor: anObject
	"ar 10/21/2000: The method has been copied to this location to indicate that whenever #scanFor: changes #scanForNil: must be changed in the receiver as well."
	"Scan the key array for the first slot containing either a nil (indicating an empty slot) or an element that matches anObject. Answer the index of that slot or zero if no slot is found. This method will be overridden in various subclasses that have different interpretations for matching elements."
	| element start finish |
	start _ (anObject identityHash \\ array size) + 1.
	finish _ array size.

	"Search from (hash mod size) to the end."
	start to: finish do:
		[:index | ((element _ array at: index) == nil or: [element key == anObject])
			ifTrue: [^ index]].

	"Search from 1 to where we started."
	1 to: start - 1 do:
		[:index | ((element _ array at: index) == nil or: [element key == anObject])
			ifTrue: [^ index]].

	^ 0 "No match AND no empty slot"! !
Is that OK, or should I send a gzipped version? I remember vaguely that may be useful because of line-ending differences.
More to come.
Thanks again. Ross
One follow-up question: ...
David T. Lewis wrote:
...
The problem I saw was that #scanFor: degraded horribly when there were a lot of entries in the WeakRegistry. In normal Squeak usage, this did not happen, but in my case I was accidentally accumulating a lot of entries in the registry (*).
...
(*) Specifically, I was using a subclass of StandardFileStream to represent the end points of an operating system pipe; then I made and destroyed large numbers of these in the course of making command pipelines (see CommandShell on SqueakMap). After running a large number of SUnit tests, these things would accumulate in the registry to the point where performance degraded horribly. Once I found the culprit, I just quit registering the pipe endpoints (which of course are transient things that don't need to be registered), and my "CPU problem" went away.
Was this registration something you did explicitly, or did it happen behind the scenes? Or perhaps I'm not understanding what you mean by register. The "registration" that I'm doing, that I'm aware of, is registering dependents. I think I'm using a mix of the old and new systems for doing so (when:send:to: and update/changed). I suspect you mean something else.
Ross
Hi Ross.
There you go -
Your scanFor: is indeed screwy. Since it's an identity dictionary, your hash values are 0-4095. Therefore once array size >= 4096, each key goes directly to its own hash value, instead of being more or less evenly spread out over the array. The effect is easily seen by inspection - in that image, inspect the array, and you'll see half is very congested and half is completely empty.
Look at the equivalent code in a modern version; you'll see some simple arithmetic spreading the 4095 hash values over the size of the array.
Your arrays are under the critical size - at >4096 entries you would effectively be working with a slow linear search. At the moment, you probably occasionally get some mileage from the hash... but it's still far slower than it should be.
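The congested-half/empty-half effect is easy to check arithmetically. A sketch (Python, purely illustrative; the 12,288-slot size matches one of the array capacities Ross reports elsewhere in the thread):

```python
# With identityHash limited to 0-4095, every possible start slot in a
# 12,288-slot array falls in its first third; the rest of the array is
# only ever reached by linear overflow from collisions.
table_size = 12288
start_slots = {h % table_size for h in range(4096)}  # all possible hashes
print(max(start_slots) + 1, table_size)               # 4096 vs 12288
print(f"{len(start_slots) / table_size:.0%} of slots ever used as a start")
```

So growing the array past 4096 slots buys no extra spread at all under the old scheme - the new capacity sits idle while keys pile up at the front.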
So you should definitely think about upgrading - either the image, or at least that method. Of course you should study the matter, take the appropriate precautions, and test that it really solves your problem; but strictly at the information level, don't forget to rehash very quickly after any change in the hashing scheme of any affected data structures.
Sheesh, amazing how long lived this bug is... if someone has some time and the inclination to finally put a stake through it, I have some advice.
Daniel
Ross Boylan wrote:
...
Daniel, can you make an entry in Mantis for this one... Thanks
On 20 sept. 05, at 12:13, Daniel Vainsencher wrote:
...
On Tue, Sep 20, 2005 at 03:13:28AM -0700, Daniel Vainsencher wrote:
Hi Ross.
There you go -
Your scanFor: is indeed screwy. Since it's an identity dictionary, your hash values are 0-4095. Therefore once array size >= 4096, each key goes directly to its own hash value, instead of being more or less evenly spread out over the array.
My array size (though not the contents size) is already over the limit: WeakIdentityKeyDictionary allInstances collect: [:x | x capacity] #(24576 192 12288)
The effect is seen easily by inspection - in that image, inspect the array, and you'll see half is very congested, and half is completely empty.
When I tried to open an inspector on Object's class var EventsField it took so long I had to kill it--not sure if that leads to some kind of loop. But if hashes are < 4096 (I didn't realize they were so small), things are clearly in bad shape.
Look at the equivalent code in a modern version, you'll see some simple arithmetic spreading the 4095 values over the size of the array.
In 3.8 final there's this:

	finish _ array size.
	finish > 4096
		ifTrue: [hash _ anObject identityHash * (finish // 4096)]
		ifFalse: [hash _ anObject identityHash].
	start _ (hash \\ array size) + 1.
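A Python rendering of that fix (mine, for illustration - the function name is invented) shows how scaling the 12-bit hash by (finish // 4096) lets start positions span the whole array instead of its first 4096 slots:

```python
def start_slot(identity_hash, array_size):
    """Mimic the 3.8 start computation: spread the 0-4095 identityHash
    range over arrays bigger than 4096 slots (1-based, as in Smalltalk)."""
    if array_size > 4096:
        h = identity_hash * (array_size // 4096)
    else:
        h = identity_hash
    return (h % array_size) + 1

# For a 12,288-slot array the multiplier is 3, so the 4096 distinct hash
# values land on every third slot across the whole array.
starts = {start_slot(h, 12288) for h in range(4096)}
print(min(starts), max(starts))  # 1 12286: starts now reach the array's end
```

The spread is still coarse (only 4096 distinct start positions), but it keeps the load in any one region near the table's overall load factor instead of triple it.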
Your arrays are under the critical size - at >4096 entries you would effectively be working with a slow linear search. At the moment, you probably occasionally get some mileage from the hash... but it's still far slower than it should be.
So you should definitely think about upgrading, either the image or at
Upgrading the image is a big project, which I've started but will not soon finish :(.
least that method. Of course you should study the matter and take the appropriate precautions, and test that it really solves your problem, but strictly at the information level, don't forget to rehash very quickly after any change in the hashing scheme of any affected data structures.
Should changing the two methods be sufficient? This looks like a hazardous operation. I have a process which tickles the image every few seconds and causes updates which would use some of these dictionaries. I'll suspend that process, but the operation looks very thread-unsafe.
I'm sure this won't end my performance problems, but I hope it will help.
Sheesh, amazing how long lived this bug is... if someone has some time and the inclination to finally put a stake through it, I have some advice.
I think the bug is dead, so nothing needs to be filed about it. Recall that the image with the problem is basically at 3.6 final; 3.8 final has the improved code quoted above.
Ross Boylan wrote:
least that method. Of course you should study the matter and take the appropriate precautions, and test that it really solves your problem, but strictly at the information level, don't forget to rehash very quickly after any change in the hashing scheme of any affected data structures.
Should changing the two methods be sufficient?
I can't look into your case specifically enough to answer that. I'm just giving you a few pointers so you can shoot yourself in the foot fewer times, more quickly. Sorry :-)
But a protocol browser might be helpful. 3.9a has a #scanForNil: that should probably match.
This looks like a hazardous operation. I have a process which tickles the image every few seconds and cause updates which would use some of these dictionaries. I'll suspend that process, but the operation looks very thread-unsafe.
#valueUnpreemptively might be helpful.
Also, you can change the methods so that they work with dictionaries holding both new- and old-style hashes. To do this, search using the new start; if that fails, search again using the old start before returning.
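That fallback trick might look like the following sketch (Python, mine; a toy open-addressing scan under assumed semantics, not the real Smalltalk code). It probes from the new scaled start first and, on a miss, retries from the old unscaled start so entries placed under the old scheme are still found:

```python
def scan(array, key, start):
    """Return the first index at or after start holding key or None
    (i.e. a match or an empty slot), or -1 if the table is full."""
    size = len(array)
    for i in range(size):
        idx = (start + i) % size
        if array[idx] is None or array[idx] == key:
            return idx
    return -1

def scan_for_both(array, key, identity_hash):
    size = len(array)
    new_start = (identity_hash * (size // 4096) if size > 4096
                 else identity_hash) % size
    idx = scan(array, key, new_start)
    if idx >= 0 and array[idx] == key:
        return idx                       # found under the new scheme
    old_idx = scan(array, key, identity_hash % size)
    if old_idx >= 0 and array[old_idx] == key:
        return old_idx                   # legacy entry, old scheme
    return idx                           # empty slot: new-scheme insertion point

# An entry placed with the old, unscaled start (hash 100 -> slot 100) is
# still found even though the new scheme would begin probing at slot 200.
table = [None] * 8192
table[100] = "legacy"
print(scan_for_both(table, "legacy", 100))
```

Every lookup that misses pays for two scans under this scheme, so as Daniel says, rehashing promptly is still the real fix; the dual probe only bridges the transition.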
I'm sure this won't end my performance problems, but I hope it will help.
Maybe you need an experiment to determine whether this gives you enough improvement to be worth it before you decide if/how to do it safely...
Sheesh, amazing how long lived this bug is... if someone has some time and the inclination to finally put a stake through it, I have some advice.
I think the bug is dead, so nothing needs to be filed about it. Recall that the image with the problem is basically at 3.6 final; 3.8 final has the improved code quoted above.
Actually, many non-identity dictionaries (which therefore do not have the arithmetic for start) use identityHash (it is the default), so I would not consider it dead at all...
Maybe using a chained implementation as is being discussed is the real solution.
Daniel
On Tue, Sep 20, 2005 at 12:15:15AM -0700, Ross Boylan wrote:
One follow-up question: ...
Was this registration something you did explicitly, or did it happen behind the scenes? Or perhaps I'm not understanding what you mean by register. The "registration" that I'm doing, that I'm aware of, is registering dependents. I think I'm using a mix of the old and new systems for doing so (when:send:to: and update/changed). I suspect you mean something else.
It was explicit, as opposed to something done with when:send:to: or update/changed. This is just the normal behavior of StandardFileStream (StandardFileStream class>>register). In OSProcess and CommandShell it is easy to create and destroy lots of AttachableFileStreams as OS pipes are opened and closed, and these get registered in the WeakRegistry of class StandardFileStream.
This does not cause a noticeable performance problem most of the time, but if I recall right I had a bug in my code in which I was failing to close some pipe connections. After these started to accumulate in the registry, performance degraded badly. I don't remember for sure, but I think it took only a relatively small number of entries (perhaps hundreds, not thousands) for this to be a problem.
I'm telling you this from memory, and it happened a few years ago, so you should not trust the details of what I'm saying. But if you have some collections like this that contain thousands of entries, you may be encountering a similar issue.
Dave
squeak-dev@lists.squeakfoundation.org