[Vm-dev] Interrupted system call?

Andreas Raab andreas.raab at gmx.de
Wed Feb 11 04:11:11 UTC 2009


Hi Ian -

There is obviously a lot I don't understand about interrupt handling on 
Unix, since your description (and the stuff I found while looking up the 
PC-losering problem) doesn't make much sense to me ;-)

If I understand you correctly, the program is in a system call, then an 
interrupt happens, and for some inexplicable reason that means the OS 
has to back out of the system call. Why is that? Wouldn't it be more 
sensible to just delay delivering the interrupt until the syscall 
returns? Yes, that doesn't guarantee real-time response, but there is 
probably more than one process running at any given time anyway, so I 
wouldn't expect interrupts to be delivered to user land in real time in 
the first place. And I *really* can't fathom the thought that any 
interrupt that happens for a process within a syscall somehow 
auto-magically leads to the kernel forgetting the state associated with 
the call ;-)

But regardless of the above, I guess the point here is that this is 
really a buggy library if it doesn't wrap each and every syscall in 
such a test, no? Is the Unix VM generally doing this? Are there 
mitigating factors where you can be pretty sure it won't happen, or 
circumstances that make it particularly bad (for example, syscalls 
that take several milliseconds combined with an itimer interrupt at 
1ms resolution or so)? Do you know if heavy network activity affects 
this behavior?

Thanks for all the info!

Cheers,
   - Andreas

Ian Piumarta wrote:
> Hi Andreas,
> 
> On Feb 10, 2009, at 5:48 PM, Andreas Raab wrote:
> 
>> I've seen references to that particular error in the Unix VM in other 
>> discussions. We have this problem right now in a slightly different 
>> context (an external library call reports that error) and I am 
>> wondering if someone can explain to me what causes this error
> 
> The process is in a kernel system call when a lack of resources in 
> blocking i/o (or a high-priority asynchronous event) causes the process 
> to be suspended.  A decent OS would save the process state, reflecting 
> its being halfway through the syscall, such that at resumption it would 
> continue the syscall transparently to the user.  Saving this state, in 
> kernel mode and intermediate between two valid user-mode states, is very 
> hard.  (Think about what would be needed if an asynchronous signal 
> arrived for a process suspended halfway through a syscall, for example.)  
> Unix, being particularly pragmatic but not particularly decent, chooses 
> instead to abort the syscall but to act as if it had completed (the user 
> process resumes after the point of the call, not at it), with a failure 
> code (EINTR) indicating that the call was aborted.  The caller (the 
> user's program) is expected to deal with this by restarting the syscall 
> explicitly, with (presumably) identical arguments.
> 
> Google for "pc losering problem" (with the quotes) if you need more on 
> this.
> 
>> and whether there is a way to "fix" the Unix VM not to cause it.
> 
> You might be able to drastically reduce the number of asynchronous 
> signals (and hence the likelihood of an interrupted syscall) by counting 
> milliseconds with gettimeofday() instead of with periodic timer 
> interrupts.  '-notimer' (SQUEAK_NOTIMER) is the option (environment 
> variable), IIRC.
> 
> The pragmatically correct way to deal with this is to wrap each and 
> every syscall in a Unix program in something like this:
> 
> do err= syscall(whatever, ...); while (err == -1 && errno == EINTR);
> if (err == -1) { deal with it }
> 
> The philosophically correct way to deal with it is to use an OS that 
> isn't Unix.
> 
>> Thanks for any insights!
> 
> I'm not sure that the above was insightful, but I hope it was explanatory.
> 
> Cheers,
> Ian
> 

