Process>>terminate woes

Igor Stasenko siguctua at gmail.com
Tue Dec 4 22:54:57 UTC 2007


I vote for VM changes.
First, I think we should consider what makes Process (and everything
connected to it) so fragile with respect to concurrency, and redesign
them so that such problems simply cannot occur.
All these dirty hacks and extra guard code only show that the original
design has flaws, and the best way to improve it is to change it,
instead of patching it again and again and making things even more
fragile.

On 05/12/2007, Andreas Raab <andreas.raab at gmx.de> wrote:
> Hi -
>
> I had an eventful (which is euphemistic for @!^# up) morning caused by
> Process>>terminate. In our last round of delay and semaphore discussions
> I had noticed the possibility of a race condition in Process>>terminate,
> but dismissed it as an application problem (i.e., if you send #terminate,
> make sure there is only one place where you send it).
>
> This morning proved conclusively that this is a race condition which can
> affect *every* user of the system. It is caused by Process>>terminate
> which says:
>
>         myList remove: self ifAbsent: [].
>
> The reason this is so problematic is that the modification of myList is
> not atomic and that because of the non-atomic modification there is a
> possibility of the VM manipulating the very same list concurrently due
> to an external event (like a network interrupt). When this happens in
> "just the right way" the effect is that any number of processes at the
> same priority will "fall off" of the scheduled list. In the image that I
> was looking at earlier we had the following situation:
> * ~40 processes were not running
> * The processes had their myList be an empty linked list
> * The processes were internally linked (via nextLink)
> * The processes were all at the same priority
> Given that most of the processes were unrelated other than having the
> same priority I think the evidence is pretty clear.
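To make the failure mode concrete: here is a simplified sketch (not the
actual Squeak code, just the shape of it) of what a linked-list removal
has to do, and where the window is. If the VM relinks the very same list
between these steps, you get exactly the corpse described above:
processes with an empty myList but still chained through nextLink.

```smalltalk
remove: aLink ifAbsent: aBlock
	"Simplified sketch of LinkedList removal. The read-then-rewrite
	of the links is several bytecodes, not one atomic step."
	| prev curr |
	prev := nil.
	curr := firstLink.
	[curr isNil] whileFalse: [
		curr == aLink ifTrue: [
			"<-- if an interrupt fires here and the VM adds or
			removes a process on this same list, the links
			rewritten below are stale and processes fall off"
			prev isNil
				ifTrue: [firstLink := curr nextLink]
				ifFalse: [prev nextLink: curr nextLink].
			curr == lastLink ifTrue: [lastLink := prev].
			^aLink].
		prev := curr.
		curr := curr nextLink].
	^aBlock value
```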
>
> The question is now: How can we fix it? My proposal would be to simply
> change primitiveSuspend such that for a non-active process it will
> primitively take the process off its suspendingList. This makes suspend
> a little more general and (by returning the previous suspendingList) it
> will also guard us against any following cleanup (like the Semaphore
> situations earlier).
>
> Unfortunately, this *will* require VM changes but I don't think it can
> be helped at this point, since the VM manipulates these lists
> atomically anyway. The good news is that we can have reasonable
> fallback code for primitiveSuspend which does exactly what we do
> today.
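On the image side, I imagine it would look something like this sketch
(the selector and comments are my guesses at Andreas' proposal; 88 is
the existing primitiveSuspend, which the VM would extend as described,
and the fallback is deliberately today's non-atomic code):

```smalltalk
Process >> suspend
	"Atomically take the receiver off whatever list it is waiting
	on (a scheduler queue or a Semaphore) and answer that list, or
	nil if the receiver was not on any list. Done in the VM, which
	already manipulates these lists atomically."
	<primitive: 88>
	"Fallback for older VMs: today's NON-atomic behaviour, so the
	race remains until the VM is updated."
	| oldList |
	oldList := myList.
	oldList ifNotNil: [oldList remove: self ifAbsent: []].
	myList := nil.
	^oldList
```

Returning the old suspendingList is the important part: it lets
senders like #terminate do the follow-up cleanup (e.g. signaling a
Semaphore the process was waiting on) without re-reading state that
may have changed under them.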
>
> Any comments? Alternatives? Suggestions?
>
>
> Cheers,
>    - Andreas
>
>


-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list