[squeak-dev] Re: Process bug introduced in 3.10

bryce at kampjes.demon.co.uk bryce at kampjes.demon.co.uk
Sun Apr 27 22:18:50 UTC 2008


Andreas Raab writes:
 > bryce at kampjes.demon.co.uk wrote:
 > > I'm not convinced this is a new bug with 3.10. It feels similar to
 > > something I've encountered earlier. It is happening frequently in
 > > 3.10 but not in 3.9.
 > 
 > Can you at least describe the kind of problem you are seeing? It seems 
 > to me that we're all just fishing in the dark because you're just saying 
 > "there is a problem somewhere" but nobody knows what they're looking for 
 > so Igor's and my posts are all perfectly applicable to what we observe 
 > but may have nothing to do with what you're looking for.

Squeak stops responding, and Alt-. will not recover it. It's consuming
100% of the CPU but is spending a decent amount of time in the
idle loop. Alt-. may occasionally pop up a debugger, but normally
the image becomes completely unresponsive. printAllStacks has
shown over 80 profiling threads running, which indicates that
they are not getting terminated.

Success for that test is not crashing. 

Consuming 100% of a CPU's time may be due to a single profiling
thread. Even though the thread consumes very little CPU, it consumes a
little in each tick, which leads to misleading time reporting.

It locks up while running the line:
   processes do: [:each| each terminate].

This can be shown either by using Exupery's logging (which uses a
primitive to serialise and write to the file), or a simple
Transcript>>show: which is executing in the controlling UI thread,
not the forked threads.
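
For anyone without the TestProfiler code to hand, the pattern in
question has roughly the following shape. This is only an illustrative
sketch, not the TestProfiler code itself: the process count, the
priority, the delays and the loop body are all assumptions, and it is
not claimed that this exact snippet locks up an image.

   | processes |
   "Fork some high-priority threads that wake briefly on each tick"
   processes := (1 to: 10) collect: [:i |
      [[true] whileTrue: [(Delay forMilliseconds: 1) wait]]
         forkAt: Processor userInterruptPriority].
   "Let them run for a while from the controlling UI thread"
   (Delay forSeconds: 1) wait.
   "This is the step where the lock-up is observed"
   processes do: [:each | each terminate].
   Transcript show: 'all terminated'; cr.

If the final Transcript line never appears, the image stopped inside
the terminate loop.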

It may be helpful if someone else tries to reproduce it, given the
last version using TestProfiler.

The original bug was that Exupery's test suite started occasionally
locking up the image. The bug still reproduces after commenting out
Exupery's compilation. I then tried to reproduce in a vanilla 3.10
image, and messed up, which you and Igor caught. The TestProfiler
reproduction is Exupery's test with the tallying code and compilation
removed to reduce the size of the test. The "100 timesRepeat:" is
there to ensure that the lock-up happens; the actual test is only run
once but doesn't lock up every time.

The combination of an image locking up while consuming 100% of the CPU
while in the idle loop is something that I've seen before but not
recently. If I had demonstrated back then that the bug was not in
Exupery, I'd have deleted the test. Now, Exupery is much more
reliable, and the test has been running regularly since September
2007. The reason I mention that I may have seen this before is that
the bug may not be new to 3.10; it may just be triggered differently.

I'm going to keep digging to find out what's going on.

Bryce


