6293LowSpaceWatcherFix-dtl considered harmful

David T. Lewis lewis at mail.msen.com
Thu Mar 31 12:07:23 UTC 2005


Hi Andreas,

I am the source of this patch.

If you have a copy of BFAV available, the reviews of this issue
are under the "update" tab with the subject line:
 "[bug][FIX] stack overflow crashes Squeak. (one-line fix attached)"

In case you do not have a working BFAV handy, I am attaching a
copy of the change set that I used originally to debug the problem
(runs on Unix, sorry, but you'll get the idea). I'm also copying
some of the relevant discussion at the end of this message.

Note that the symptoms of this problem appear differently on
different VMs and different memory allocation settings. It also
did not occur until after the introduction of the event tickler
process into the image, so this appeared to be a pre-existing
bug that had been masked until the event tickler process was
added.

I'm short of time right now, but I'll take another look at this
as soon as I get some free time. Having said that, I'm no expert
on the process scheduler or memory manager, so a better informed
explanation and/or fix would be welcome.

On Thu, Mar 31, 2005 at 12:48:38AM -0800, Andreas Raab wrote:
> Hi Folks -
> 
> Trying to track a problem in Tweak I ran into the above mentioned 
> update. I won't comment on the process of how it got into the image (I 
> think you all know my opinion about eyeballing system-critical changes 
> and I am almost certain that the approval went something like "oh, what 
> possible harm could a one-line change do") but I am rather interested in 
> what problem this is trying to fix.

The low space interrupt failed to get through to the image. When running
with fixed memory allocation, the result was a VM crash.

> The comment of the change set states: "The low space watcher is 
> interrupted in the context of the wrong process when the eventTickler 
> process (or other high priority process) is running. This prevents low 
> space detection from functioning properly."
> 
> But this makes no sense whatsoever.

Poor wording and/or misunderstanding on my part. If I remember correctly,
the low space interrupt was "appearing" in the wrong process context
after the semaphore was signaled.
 
> Whatever this update is trying to address, it cannot possibly be what is 
> being claimed in the preamble. So what problem is this update trying to 
> solve?

Uninterruptible recursion followed by a hard VM crash on some VMs and
memory settings. Not a good thing for naive users who would be most
likely to make mistakes of this kind.

HTH,
Dave

-------------------------------------
Snippings from BFAV:

Background on how and when the bug first became evident:

> Hi Doug,
> 
> I have not reconstructed from old images, but my best guess is that
> the bug entered the image with the Project class>>interruptName:
> method, which is time stamped 9/5/2001. The bug was present at that
> point, but was not manifested until someone else added a high priority
> background process into the system, which just happened to be the
> otherwise blameless EventTickler process (EventSensor>>eventTickler),
> which runs all the time at lowIOPriority. This was introduced in
> update 5000 in April 2004, so I would expect that people started
> noticing the problem after that time.
> 
> The current thread in BFAV dates back to September 2003, which
> suggests that the symptoms of this problem were being seen before
> the EventTickler was introduced (or maybe I just did not follow
> the trail all the way back). So I'm not entirely sure how long
> people have been seeing symptoms of the problem. I'm reasonably
> sure (call it about 90% confidence level, gut feel) that the
> bug/fix that I posted addresses the underlying issue, although
> I would not be surprised if it turns out that there are other
> lurking buglets that might lead to similar symptoms.
> 
> Important: This one is timing-dependent, and you may see different
> symptoms depending on the VM and any memory settings you may have
> used on the command line. On my Linux system, if I force a limit
> on the amount of memory used (with "squeak -memory 10m"), I end
> up with a real VM crash, stack dump and all. If I don't limit the
> memory (which would be the normal mode of use), the image just
> becomes unresponsive when it gets into an infinite recursion, and
> cannot be interrupted. Presumably the VM is busy trying to allocate
> more memory from the OS, but does not actually crash while it's
> chewing away on this problem.
> 
> I thought that I had read an earlier report that (John's) Mac
> OS X does not exhibit the problem, apparently due to its use of
> a threaded VM that is more responsive to UI events. However, your
> description of the behavior on your OS X system sounds quite similar
> to what I see on Linux. 
> 
> RiscOS seems to behave similarly to Linux, and I don't know what
> Windows does.


How to reproduce the bug:

> Attached is a change set that I used to debug the stack overflow problem
> and confirm the fix. This only runs on a Unix VM with OSProcess loaded,
> but the overflow problem is a bit tricky to debug so I'm posting this in
> case someone wants to reproduce what I did.
> 
> Basically this just writes debug trace messages to standard output so I
> can keep track of what process is running what method in what order.
> Just some good ol' fashioned Fortran debugging, but what the heck, it
> worked.
> 
> From the preamble:
> 
> This is what I used to debug the stack overflow problem. Load OSProcess
> first, then load this change set.
> 
> Intended for use on Unix/Linux. Run the Squeak vm with a fixed memory
> allocation (squeak -memory 30m) in order to force the out-of-memory
> condition.
> 
> Open a ProcessBrowser, then evaluate 'Smalltalk createStackOverflow'. 
> You should see messages on stdout that confirm that the runaway
> recursion keeps going even after the low space semaphore has been
> signaled.
> 
> Now apply the LowSpaceWatcherFix change set, and evaluate 'Smalltalk
> createStackOverflow'. The low space watcher should catch the runaway
> method right away.

> Newer Linux VMs grow memory dynamically, and do not start with any
> explicit memory limit.
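
For readers without OSProcess at hand, the print-style tracing described
in the preamble above (recording which process runs which method, in what
order) can be sketched roughly as follows. This is only a Python analogy,
not the contents of the attached change set: threads stand in for Squeak
processes, and the names trace, runaway, watcher and TRACE_LOG are all
hypothetical.

```python
import sys
import threading

# Kept in memory as well as printed, so the ordering can be inspected later.
TRACE_LOG = []


def trace(method_name):
    """Record and print which thread (standing in for a process) ran what."""
    line = f"{threading.current_thread().name}: {method_name}"
    TRACE_LOG.append(line)
    print(line, file=sys.stdout, flush=True)
    return line


def runaway(depth, limit):
    """Bounded stand-in for the runaway recursion; traces every call."""
    trace(f"runaway depth={depth}")
    if depth < limit:
        runaway(depth + 1, limit)


def watcher():
    """Stand-in for the low space watcher waking up."""
    trace("low space watcher signaled")


# Run the two "processes" one after the other so the trace order is fixed.
t1 = threading.Thread(target=runaway, args=(0, 2), name="runaway")
t1.start()
t1.join()
t2 = threading.Thread(target=watcher, name="watcher")
t2.start()
t2.join()
```

Reading the trace top to bottom shows who ran when. In the real bug the
interesting case is trace lines from the runaway method that keep
appearing after the watcher line.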


Unix VM memory settings that affect whether or not the bug results
in a hard crash:

> There are two command-line options (with equivalent environment 
> variables) to control how memory is allocated on Unix:
> 
>    If no options are given then memory is allocated dynamically with the 
> limit set at 75% of the available virtual memory.
> 
>    If -memory N{mk} is given then memory is allocated statically; the 
> argument to the option defines a hard upper limit.
> 
>    If -mmap N{mk} is given then memory is allocated dynamically, with an 
> explicit upper limit to the amount of memory that will be allocated 
> (but the "75% of available virtual memory" limit still applies).
> 
> Ian
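
The three cases Ian lists can be summarised in a small sketch. It is
written in Python purely for illustration and is not taken from the VM
source. The function names, the reading of "k" and "m" as 1024 and
1024*1024 bytes, the treatment of a bare number as bytes, and the choice
to let -memory win when both options are given are all assumptions.

```python
def parse_size(text):
    """Parse a size such as '30m' or '512k' into bytes.

    Assumption: k = 1024 bytes, m = 1024 * 1024 bytes, bare digits = bytes.
    """
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def memory_policy(args, available_virtual):
    """Return (mode, limit_in_bytes) for the three cases described above."""
    # The "75% of available virtual memory" cap on dynamic allocation.
    default_limit = available_virtual * 3 // 4
    if "-memory" in args:
        # Static allocation: the argument is a hard upper limit.
        size = parse_size(args[args.index("-memory") + 1])
        return ("static", size)
    if "-mmap" in args:
        # Dynamic allocation with an explicit limit; the 75% cap still applies.
        size = parse_size(args[args.index("-mmap") + 1])
        return ("dynamic", min(size, default_limit))
    # No options: dynamic allocation limited only by the 75% cap.
    return ("dynamic", default_limit)


print(memory_policy(["-memory", "30m"], 10**9))
```

With "-memory 30m" the sketch reports a static allocation with a 30 MB
ceiling, which is the configuration where the bug ends in a hard crash.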

-------------- next part --------------
A non-text attachment was scrubbed...
Name: StackOverflowDebugging-dtl.1.cs.gz
Type: application/x-gunzip
Size: 1564 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20050331/0d1649fd/StackOverflowDebugging-dtl.1.cs.bin

