Low-space signals in production environments

Sun Feb 11 20:42:27 UTC 2007

On 11-Feb-07, at 12:15 PM, Andreas Raab wrote:

> David T. Lewis wrote:
>> On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
>>> In particular considering that the lowspace semaphore can't  
>>> really do anything because it doesn't even know which process got  
>>> interrupted!
>> Does your image have the fix from Mantis 1041?
>
> No, but that doesn't really matter. My point was that a low-priority
> process has no chance to ever interrupt a higher-priority process.
> And I doubt your fix changes that.

No, it simply does a somewhat better job of guessing which process  
might be the problem.

We could, as I'm pretty sure we have discussed, find some way to
include the oop of the process that caused the allocation problem
more directly with the lowspace semaphore signal. That would improve
things a touch more by avoiding race conditions when working out which
process was running. The real problem with identifying the *actually*
problematic process is that the allocation request that triggers a
lowspace signal may well not come from the actual space hog.
Suspending the wrong process while letting the others, possibly
including the monster, keep running simply leads to more trouble.

If the lowspace handler suspended all other processes it would
obviate some of the problems. If we wanted to interact with users as
part of the handler we might have to permit some other process to
start or resume, perhaps under some restrictions. If a gc alone
solved the space problem then we could simply allow everything else
to resume. And we should remove direct in-VM calls to gc wherever
possible so that in-image code can apply more flexible policies.
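
The policy above can be sketched as a toy Python model. This is an illustration under stated assumptions, not Squeak's scheduler API: `Proc`, `handle_low_space`, the priority numbers, and the `collect_garbage` callback (standing in for an in-image gc policy) are all hypothetical. It assumes the handler runs above every process it suspends, since a lower-priority process cannot preempt a higher-priority one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Proc:
    name: str
    priority: int
    suspended: bool = False


def handle_low_space(processes: List[Proc],
                     collect_garbage: Callable[[], int],
                     needed_words: int,
                     handler_priority: int,
                     interactive: Optional[str] = None) -> str:
    # A handler that does not outrank everything cannot reliably stop it.
    if any(p.priority >= handler_priority for p in processes):
        raise ValueError("handler must outrank every process it suspends")
    # Step 1: stop everyone, so the real hog cannot keep allocating.
    for p in processes:
        p.suspended = True
    # Step 2: try a gc chosen by in-image policy (hypothetical callback).
    reclaimed = collect_garbage()
    if reclaimed >= needed_words:
        # gc alone solved it: let everything resume.
        for p in processes:
            p.suspended = False
        return "resumed-all"
    # Step 3: still short; let only the interactive process run so the
    # user can decide what to kill.
    for p in processes:
        if p.name == interactive:
            p.suspended = False
    return "awaiting-user"
```

Resuming only the named interactive process is one possible reading of "perhaps under some restrictions"; a real handler might also cap that process's allocations.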

Having a very high priority process to handle low space conditions  
seems like a plausible idea to me.

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
A bug in the code is worth two in the documentation.