[Vm-dev] Re: Socket clock rollover issues

Wed May 6 00:52:04 UTC 2009

Folks -

Just as a follow-up to this note I now have proof that we're losing 
semaphore signals occasionally. What I was able to detect was that when 
running forums over a period of 20 hours we lost 2 out of 421355 
signals. We'll have the follow-on discussion on vm-dev since I don't 
think most people here are interested in discussing the possibilities of 
how this could happen and what to do about it. Please send any 
follow-ups to vm-dev (and vm-dev only).

Cheers,
   - Andreas

Andreas Raab wrote:
> John M McIntosh wrote:
>> Er, so given we don't have a thread safe signalSemaphoreWithIndex code 
>> base (on purpose) I wonder how many signals per second are you doing 
>> and are you perhaps overflowing the semaphoresUseBufferA/B table? 
>> Assuming you are saying you do the signalSemaphoreWithIndex() and you 
>> never see that over in the image?
> 
> I cannot prove any of this because it's so unreliable but I don't think 
> that's the problem. An overflow like you are describing is only possible 
> if you overflow before the VM (not the image!) gets to the next 
> interrupt check. If that were the case (for example because we're 
> spending too much time in some primitive like BitBlt) I believe we'd be 
> seeing this problem more reliably than we do.
> 
> Also, the Windows VM actually replaces signalSemaphoreWithIndex with a 
> version that *is* thread-safe in the proxy interface since this used to 
> be an issue in the past. It is still possible to overflow the semaphore 
> buffer, but two threads can no longer compete when signaling (i.e., 
> overwrite each other's entries because they are executing on different cores).
> 
> Perhaps most importantly, the last place where I've seen this happen was 
> in a callback which means the signaling code was running from the main 
> thread. There is of course a possibility something else entirely goes 
> wrong (random corruption of the semaphore index for example) but I 
> haven't had the time to investigate this - I was more interested in 
> finding a suitable workaround for the release ;-)
> 
> Cheers,
>   - Andreas
> 
>
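
To make the distinction in the quoted discussion concrete (locking the signaling path stops threads from overwriting each other's entries, but does not stop the buffer from overflowing), here is a minimal sketch in C. It is not the VM's actual code: SignalBuffer, queue_signal, drain_signals, PENDING_CAPACITY and the dropped counter are all hypothetical names invented for illustration, the capacity is deliberately tiny, and a POSIX mutex stands in for whatever locking the Windows VM really uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical capacity, kept tiny for illustration; not the VM's real limit. */
#define PENDING_CAPACITY 4

/* Hypothetical model of a buffer of pending semaphore signals. */
typedef struct {
    pthread_mutex_t lock;                      /* serializes signaling threads */
    int             pending[PENDING_CAPACITY]; /* semaphore indices to deliver */
    size_t          count;                     /* entries currently queued */
    size_t          dropped;                   /* signals lost to overflow */
} SignalBuffer;

#define SIGNAL_BUFFER_INIT { PTHREAD_MUTEX_INITIALIZER, {0}, 0, 0 }

/* Queue one signal. Returns 1 if it was stored, 0 if the buffer was full.
 * The lock means two threads can never claim the same slot, but a full
 * buffer still loses the signal; here we at least count the loss. */
static int queue_signal(SignalBuffer *buf, int semaphoreIndex)
{
    int stored = 0;
    pthread_mutex_lock(&buf->lock);
    if (buf->count < PENDING_CAPACITY) {
        buf->pending[buf->count++] = semaphoreIndex;
        stored = 1;
    } else {
        buf->dropped++;
    }
    pthread_mutex_unlock(&buf->lock);
    return stored;
}

/* Stand-in for the interrupt check: copy the queued indices into `out`
 * (which must hold PENDING_CAPACITY ints), empty the buffer, and return
 * how many entries were copied. */
static size_t drain_signals(SignalBuffer *buf, int *out)
{
    size_t delivered;
    pthread_mutex_lock(&buf->lock);
    delivered = buf->count;
    for (size_t i = 0; i < delivered; i++) {
        out[i] = buf->pending[i];
    }
    buf->count = 0;
    pthread_mutex_unlock(&buf->lock);
    return delivered;
}
```

In this model, signals are only lost if more than PENDING_CAPACITY of them arrive between two calls to drain_signals, which matches the argument above that an overflow requires the producer to outrun the next interrupt check. A counter like dropped is one cheap way to tell that case apart from other causes of missing signals.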