Folks -
Just as a follow-up to this note: I now have proof that we're occasionally losing semaphore signals. When running forums over a period of 20 hours, we lost 2 out of 421355 signals. We'll have the follow-on discussion on vm-dev, since I don't think most people here are interested in discussing how this could happen and what to do about it. Please send any follow-ups to vm-dev (and vm-dev only).
Cheers, - Andreas
Andreas Raab wrote:
John M McIntosh wrote:
Er, so given that we don't have a thread-safe signalSemaphoreWithIndex code base (on purpose), I wonder how many signals per second you are doing, and whether you are perhaps overflowing the semaphoresUseBufferA/B table? I'm assuming you are saying that you call signalSemaphoreWithIndex() and never see the signal over in the image?
I cannot prove any of this because it's so unreliable but I don't think that's the problem. An overflow like you are describing is only possible if you overflow before the VM (not the image!) gets to the next interrupt check. If that were the case (for example because we're spending too much time in some primitive like BitBlt) I believe we'd be seeing this problem more reliably than we do.
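To make the overflow condition above concrete, here is a rough toy model in Python. It is only a sketch under assumptions: the class name, the `capacity` parameter, and the drop-on-full policy are all hypothetical and are not taken from the VM source. It shows that signals are lost only when more of them arrive between two interrupt checks than one buffer can hold.

```python
class SignalTable:
    """Toy model of a fixed-size, double-buffered pending-signal table.

    All names and the drop-on-full behavior are assumptions made for
    illustration; this is not the VM's implementation.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # assumed per-buffer limit
        self.buffers = ([], [])    # stand-ins for buffers A and B
        self.active = 0            # buffer that signalers currently fill
        self.dropped = 0           # signals lost to overflow

    def signal(self, index):
        """Record a pending signal; drop it if the active buffer is full."""
        buf = self.buffers[self.active]
        if len(buf) >= self.capacity:
            self.dropped += 1
            return False
        buf.append(index)
        return True

    def interrupt_check(self):
        """Switch buffers, then drain the one that was being filled."""
        filled = self.buffers[self.active]
        self.active = 1 - self.active
        delivered = list(filled)
        filled.clear()
        return delivered


def run(bursts, capacity):
    """Send each burst of signals, with one interrupt check after each."""
    table = SignalTable(capacity)
    delivered = 0
    for burst in bursts:
        for _ in range(burst):
            table.signal(1)
        delivered += len(table.interrupt_check())
    return delivered, table.dropped
```

In this model a loss needs a single burst larger than the buffer, which is why a slow primitive delaying the interrupt check would be expected to show up fairly reproducibly rather than twice in 20 hours.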
Also, the Windows VM actually replaces signalSemaphoreWithIndex with a version that *is* thread-safe in the proxy interface, since this used to be an issue in the past. It is still possible to overflow the semaphores, but two threads can no longer compete when signaling (i.e., overwrite each other's entries because they are executing on different cores).
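As an illustration of that distinction, here is a small Python sketch. It assumes a plain lock around the queue, which is just one way to serialize signalers and not necessarily what the Windows VM does; `LockedSignalQueue` and `hammer` are hypothetical names. The lock prevents concurrent signalers from clobbering each other's entries, but a full queue still drops signals.

```python
import threading


class LockedSignalQueue:
    """Bounded pending-signal queue guarded by a lock (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity       # assumed queue limit
        self.pending = []
        self.dropped = 0
        self._lock = threading.Lock()

    def signal(self, index):
        """Append under the lock so concurrent signalers cannot interleave."""
        with self._lock:
            if len(self.pending) >= self.capacity:
                self.dropped += 1      # overflow is still possible
                return False
            self.pending.append(index)
            return True

    def drain(self):
        """Take all pending signals, as an interrupt check would."""
        with self._lock:
            taken, self.pending = self.pending, []
            return taken


def hammer(queue, threads, per_thread):
    """Signal from several threads at once, then drain once at the end."""
    def worker():
        for _ in range(per_thread):
            queue.signal(1)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(queue.drain()), queue.dropped
```

With enough capacity every signal from every thread is accounted for; with a small capacity the total of delivered plus dropped signals still adds up, so nothing disappears silently through a race.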
Perhaps most importantly, the last place where I've seen this happen was in a callback, which means the signaling code was running from the main thread. There is of course a possibility that something else entirely is going wrong (random corruption of the semaphore index, for example), but I haven't had the time to investigate this - I was more interested in finding a suitable workaround for the release ;-)
Cheers,
- Andreas