[Vm-dev] Interrupted system call?
Andreas Raab
andreas.raab at gmx.de
Wed Feb 11 04:11:11 UTC 2009
Hi Ian -
There is obviously a lot I don't understand about interrupt handling on
Unix, since your description (and the stuff I found while looking up the
PC-losering problem) doesn't make much sense to me ;-)
If I understand you correctly, the program is in a system call, an
interrupt happens, and for some inexplicable reason that means the OS
has to back out of the system call. Why is that? Wouldn't it be more
sensible to just delay delivering the interrupt until the syscall
returns? Yes, that doesn't guarantee real-time response, but there is
probably more than one process running at any given time anyway, so I
wouldn't expect interrupts to be delivered to user land in real time
anyway. And I *really* can't fathom the thought that any interrupt that
happens for a process within a syscall somehow auto-magically leads to
the kernel forgetting the state associated with the call ;-)
But regardless of the above, I guess the point here is that this is
really a buggy library if it doesn't wrap each and every syscall in
such a test, no? Is the Unix VM generally doing this? Are there
mitigating factors that make it unlikely to happen, or particularly bad
situations (for example, syscalls that take several milliseconds with an
itimer interrupt set to 1ms resolution or so)? Do you know if heavy
network activity affects this behavior?
Thanks for all the info!
Cheers,
- Andreas
Ian Piumarta wrote:
> Hi Andreas,
>
> On Feb 10, 2009, at 5:48 PM, Andreas Raab wrote:
>
>> I've seen references to that particular error in the Unix VM in other
>> discussions. We have this problem right now in a slightly different
>> context (an external library call reports that error) and I am
>> wondering if someone can explain to me what causes this error
>
> The process is in a kernel system call when a lack of resources in
> blocking i/o (or a high-priority asynchronous event) causes the process
> to be suspended. A decent OS would save the process state reflecting
> its being halfway through the syscall such that at resumption it would
> continue the syscall, transparently to the user. Saving this state, in
> kernel mode and intermediate between two valid user-mode states, is very
> hard. (Think about what would be needed if an asynchronous signal arrived
> for a process suspended halfway through a syscall, for example.) Unix,
> being particularly pragmatic but not particularly decent, chooses instead
> to abort the syscall but to act as if it were completed (the user process
> resumes after the point of the call, not at it), but with a failure code
> (EINTR) to indicate that the call was aborted. The caller (the user's
> program) is expected to deal with this by restarting the syscall
> explicitly, with (presumably) identical arguments.
>
> Google for "pc losering problem" (with the quotes) if you need more on
> this.
>
>> and whether there is a way to "fix" the Unix VM not to cause it.
>
> You might be able to drastically reduce the number of asychronous
> signals (and hence the likelihood of an interrupted syscall) by counting
> milliseconds with gettimeofday() instead of with periodic timer
> interrupts. '-notimer' (SQUEAK_NOTIMER) is the option (environment
> variable), IIRC.
>
> The pragmatically correct way to deal with this is to wrap each and
> every syscall in a Unix program in something like this (syscalls
> return -1 and report EINTR via errno, rather than returning it):
>
>     do { err = syscall(whatever, ...); } while (err == -1 && errno == EINTR);
>     if (err < 0) { deal with it }
>
> The philosophically correct way to deal with it is to use an OS that
> isn't Unix.
>
>> Thanks for any insights!
>
> I'm not sure that the above was insightful, but I hope it was explanatory.
>
> Cheers,
> Ian
>
More information about the Vm-dev
mailing list