[squeak-dev] Re: [Pharo-project] How about atomic value-swap bytecode?

Wed Oct 13 19:17:18 UTC 2010

On 13 October 2010 21:47, Tom Rushworth <tom_rushworth at mac.com> wrote:
> Hi Igor,
>
> I should mention as an aside before going on with the debate that there are several variations on CAS.  The one returning a bool is the one I like, but there is another very useful variant that returns the old value, regardless of whether or not it matched the old value you supplied.  In pseudo-code (since 'var' is supposed to be a variable, not a simple value):
>
> cas: &var oldVal: oldValue newVal: newValue
>   | prevValue |
>   prevValue := var.
>   (var == oldValue) ifTrue: [var := newValue]
>   ^prevValue
>
> If your interest is in the actual previous value, this is the better variant to use.  You can synthesize the boolean variant by comparing the returned prevValue to the oldValue you suppliled one the operation has completed.
>

I think this is not really important. In one case its more convenient
to receive boolean,
in other cases - old value. VM can provide just one variant. Another
one can be coded with ease.

> On 2010-Oct-13, at 10:23, Igor Stasenko wrote:
>
>> On 13 October 2010 19:41, Tom Rushworth <tom_rushworth at mac.com> wrote:
>>>
>
> [snip]
>>>
>>> I'm using "synchronization" in the sense of safe access to data in a multi-threaded context.  There is no point at all to atomic ops unless you are doing that kind of "synchronization".  In particular, your example was for implementing a lock, which I took to be a mutex style lock from looking at your code.  Your example is in fact implementing a CAS spinloop for a boolean value.  My counter example just used CAS explicitly :).  You can use an unconditional swap for boolean values because there are only two possible values, so that setting the 'locked' variable to true unconditionally doesn't hurt anything since either it was already true and you didn't change anything visible to another thread, or it was the desired previous value (false) and you did  change things visible to another thread is a planned, predictable way.  If the variable in question was a pointer this would not be a safe operation, because you have no way to predict what the value of temp should be.
>>
>> how many spin loops you seen where you need more than two values
>> (true/false) i.e. locked/unlocked ? :)
>
> Lots. How about a simple lock that uses 0 for unlocked and thread ID > 0 for locked?  This is very useful when trying to find a deadlock, because the lock tells you who owns it. A semaphore count is another (numeric value).  I'm sure there are many more.
>>
>>
> [snip]
>
>>>>>
>>>>> No temp needed, single block. well understood atomic op :).
>>>>
>>>> Yeah? Then try and implement a unconditional atomic swap using CAS.
>>>
>>> I'll look up the paper in a short while, but for now, how about:
>>>
>>>    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.
>>>
>>
>> see.. now in own turn i can say.. hey.. i can do it w/o temp and
>> without a loop.  :)
>
> Heh - yes you can.  But I get to reply that unless you are doing it on a boolean, you are changing the value of the var in a way that makes the var very difficult to use safely for the other threads involved.  Let's look at a simple increment of a numeric value:
>
>   [ | oldVal | oldVal := var. var ?:= (oldVal + 1) : oldVal ] whileFalse.
>
> The unconditional swap can easily end up getting one increment "erased" by another.
>
yes.

> [snip]
>>
>> Ok. Here is a simple example for you. A shared LIFO queue.
>
> There's a problem here...
>
>> Populating:
>>
>> | temp |
>> temp := newItem := QueueItem new.
>> queueHead :=: newItem.
> What if somebody else processes the queue at this point and finishes with the QueueItem instance in temp? You've lost the whole old queue.

yes. my bad. :)
Its always hard to make things right for a first time.

>> temp next: newItem.
>>
>>
>> Flushing and processing the queue:
>>
>> | oldHead |
>> oldHead := nil.
>> queueHead :=: oldHead.
>>
>> [ oldHead notNil ] whileTrue: [
>>  "process the queue item here"
>>
>>  oldHead := oldHead next.
>> ]
>
> I'm not saying CAS can do the LIFO way better, but the naive "push" operation works perfectly:
>
>    | oldHead newItem |
>    newItem := QueueItem new.
>    [ oldHead := queueHead. newItem next: oldHead. queueHead ?:= newItem : oldHead ] whileFalse.
>
> Your flush of the entire queue looks ok, but could be done the same way by CAS.
>
> The really difficult problem with a shared LIFO queue is extracting a _single_ item at a time (i.e. "pop").  There is almost an entire field of literature on this subject :).  Google for "Lock free" "ABA problem".  There are CAS based solutions, but they are a lot more elaborate than I want to try to type into an email.  I don't know of any atomic swap based solutions to the "pop" operation.
>>
>>
>>>
>>>> I simply need a guarantee from VM , that swap operation will be
>>>> atomic. No less, no more.
>>>> Compare-And-Swap and swap is different operations.
>>>
>>> I guess that depends on how you look at it.  I see CAS as a simple generalization of atomic swap that is _much_ more useful for implementing many different kinds of safe multi-threaded data access.  Unconditional swap works safely for booleans, but if you use it for something more complicated, like numbers, you end up in all sorts of complications.  The unconditional swap ends up "leaking" information into the variable that is visible to other threads in a way that is very difficult to cope with for the other threads.  Try doing a thread safe numeric increment with unconditional swap....
>>
>> If you tell me how you can atomically evaluate:
>> SmallInteger maxVal+1
>>
>> i'll try :)
>
> Red herring :).  Atomic increment and decrement are used all the time for a things like reference counting, worrying about overflow of whatever the HW/VM/whatever native word size is another issue.  Note however, that the CAS version of increment has to pick up the old value and look at it in order to compute the new value, and can make a check for maxVal before computing the new value, which provides the opportunity to handle the issue however you want to.

yes. once i tried to implement double-linked list using CAS and failed..  :)

>>
>>>>
>>>>
>>>>> I've used this pattern (expressed in C code) in a very heavily used parallel data processing application, so I can even claim to have tested it extensively :).
>>>>>
>>>>
>>>> Yes yes.. But C is not Smalltalk.
>>>
>>> Very true, as Eliot pointed out one of the implications of the difference above.  However, the algorithms for safe access to data in a multi-threaded environment are completely language and hardware independent.  You simply choose the algorithm you use to match what your language and HW have available for you to implement the algorithm.  In this context, if your (our) language provide CAS, you end up with a much broader choice of algorithms.
>>>
>>> The only reason I brought up the HW CAS instruction was to make it clear that the machine code or C code for the primitive would not be any more complicated for CAS no matter what the assumptions or guarantees are for the VM scheduler.
>>
>> I'm not arguing about userfulness of CAS.
>> As i said, if you think its cool to have as well, lets think how to
>> add it as well.
>> But i'm against 'lets add CAS instead of atomic swap'.
>
> Ah.  Well, I'm afraid I am arguing against atomic swap, because it is just too weak to be really useful.  Yes, it does booleans, but that's about all.  I've been working on multi-threaded C code for about 5 years now, and the single most important thing I've learned is that lock free multi-threaded algorithms are not nearly as simple as they look at first.  The classic example is an actual published paper on a lock-free double ended queue that contained a flaw where the last element could be popped from both ends at once by two different threads.  The paper was peer-reviewed, both in it's original conference version and in its later journal version, and the flaw wasn't discovered until after the journal version was published.

Okay, i must confess, that atomic swap seems inferior to CAS.
So, lets stick with CAS.

Thanks, Tom for providing arguments and proving your point.

-- 
Best regards,
Igor Stasenko AKA sig.