Process>>terminate woes

Andreas Raab andreas.raab at gmx.de
Tue Dec 4 22:31:05 UTC 2007


Hi -

I had an eventful (which is a euphemism for @!^# up) morning caused by
Process>>terminate. In our last round of delay and semaphore discussions
I had noticed that there is a possible race condition in
Process>>terminate but dismissed it as an application problem
(e.g., if you send #terminate, make sure you have only one place where
you send it).

This morning proved conclusively that this is a race condition which can 
affect *every* user of the system. It is caused by Process>>terminate 
which says:

	myList remove: self ifAbsent: [].

The reason this is so problematic is that the modification of myList is
not atomic, so the VM may manipulate the very same list concurrently in
response to an external event (like a network interrupt). When this
happens in "just the right way", any number of processes at the same
priority will "fall off" the scheduled list. In the image that I was
looking at earlier we had the following situation:
* ~40 processes were not running
* Each process's myList was an empty linked list
* The processes were still linked to each other (via nextLink)
* The processes were all at the same priority
Given that most of the processes were unrelated other than having the
same priority, I think the evidence is pretty clear.

The question now is: how can we fix it? My proposal would be to simply
change primitiveSuspend such that for a non-active process it will
primitively take the process off its suspendingList. This makes suspend
a little more general and, by returning the previous suspendingList, it
will also let us handle any cleanup that follows safely (like the
Semaphore situations earlier).

Unfortunately, this *will* require VM changes, but I don't think that can
be helped at this point since the VM will be manipulating these lists
atomically anyway. The good news, though, is that we can have reasonable
fallback code for primitiveSuspend which does exactly what we do today.
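To make the proposal concrete, here is a rough sketch of what the image side might look like. It assumes (this is not existing VM behavior) that primitive 88, the current primitiveSuspend, is extended to atomically take a non-active receiver off its list and answer that list, or nil if it was on none. The code below the primitive is only the fallback for older VMs: it is the same non-atomic removal Process>>terminate does today, so it still has the race.

	suspend
		"Sketch only. Assumes a VM whose primitive 88 atomically removes a
		non-active receiver from its list and answers that list (or nil).
		If the primitive fails (e.g. an older VM), fall back to the current
		non-atomic removal, which is still subject to the race."
		| oldList |
		<primitive: 88>
		myList ifNil: [^nil]. "not on any list; allows repeated suspends"
		oldList := myList.
		myList := nil.
		oldList remove: self ifAbsent: [].
		^oldList

With something like this in place, Process>>terminate could send #suspend to a non-active process instead of doing "myList remove: self ifAbsent: []" itself, and use the answered list for any follow-up cleanup (for example, when that list is a Semaphore).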

Any comments? Alternatives? Suggestions?


Cheers,
   - Andreas


