[squeak-dev] Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth tom_rushworth at mac.com
Wed Oct 13 16:41:23 UTC 2010


Hi Igor, Eliot,

Eliot, I'm not sure I understood your comment:

On 2010-Oct-13, at 08:55, Eliot Miranda wrote:

> Right.  And because Smalltak can't reify variables and CAS is an operation on a variable CAS can't be implemented as a primitive on
> variables.  There's no way to express "pass a variable to a primitive", only "pass an expression (which may be the value of a variable)" to a
> primitive".  One could do it with a primitive on an object, e.g. thisContext at: tempIndex compareWith: match andSetTo: expr, or anObject
> instVarAt: varIndex compareWith: match andSetTo: expr but that's soooo ugly.  Hence I think it is better done using a special assignment
> operator.

Doesn't this apply to Igor's proposed atomic swap as well?  The target in Igor's example was a variable "locked" which was clearly meant to be visible to other threads.

If that's the case, then they both need to be implemented with a special assignment operator, and Igor and I can get back to arguing the merits of the atomic swap versus the atomic compare and swap :).  If not, then Igor's swap has a clear advantage and I guess you can both ignore the rest of this email...

On 2010-Oct-13, at 08:33, Igor Stasenko wrote:

> On 13 October 2010 18:07, Tom Rushworth <tom_rushworth at mac.com> wrote:
>> Hi all,
>> 
>> Take a look through the literature for the "strength" of the various types of atomic operations with respect to doing various synchronization tasks.  CAS is "universal" in the sense that it can simulate all of the others.  (The simulation is wretched, O(N**3), so it isn't practical, but...)
>> I doubt that your atomic swap is as strong.
> 
> Who said i gonna use it for synchronizations tasks?
> We got a plenty of other stuff in smalltalk for synchronization.

I'm using "synchronization" in the sense of safe access to data in a multi-threaded context.  There is no point at all to atomic ops unless you are doing that kind of "synchronization".  In particular, your example was for implementing a lock, which I took to be a mutex style lock from looking at your code.  Your example is in fact implementing a CAS spinloop for a boolean value.  My counter example just used CAS explicitly :).  You can use an unconditional swap for boolean values because there are only two possible values, so that setting the 'locked' variable to true unconditionally doesn't hurt anything since either it was already true and you didn't change anything visible to another thread, or it was the desired previous value (false) and you did  change things visible to another thread is a planned, predictable way.  If the variable in question was a pointer this would not be a safe operation, because you have no way to predict what the value of temp should be.
> 
>> 
>> Pretty well every current HW architecture actually has the CAS instruction (or some minor variation) implemented as a single instruction, so making it a primitive should be relatively easy. It seems a terrible waste to pick a weak synchronization primitive when a much better one could be had for very little more work.
> 
> I feel that you are mixing hardware and language-side atomicity.
> I don't care about hardware , i care that at language side i have a
> simple atomic operation, which can't be interrupted by scheduler.
> Hardware CAS is completely irrelevant to this, since VM can support
> atomicity for language side without any specific requirements from the
> hardware it runs on.
> 
> 
>> 
>> On 2010-Oct-12, at 12:04, Igor Stasenko wrote:
>> 
>>> CAS? Perhaps. If you wish to add it too, no problem.
>>> 
>>> I want at minimum swap, because with it i can easily implement
>>> lock-free waiting:
>>> 
>>> [ | temp |
>>>  [ locked ] whileTrue.
>>>    temp := true.
>>>    locked :=: temp.
>>>    temp
>>> ] whileTrue.
>> 
>> I think this is simpler (using Eliot's notation):
>> 
>>  [locked ?:= true : false] whileFalse.
>> 
>>   ... locked here ...
>> 
>>   locked := false.
>> 
>> No temp needed, single block. well understood atomic op :).
> 
> Yeah? Then try and implement a unconditional atomic swap using CAS.

I'll look up the paper in a short while, but for now, how about:

    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.

Yes, this makes the retry loop explicit, but the retry loop exists in _any_ atomic op at some level.  It may be buried in the HW where the microcode does the retry if there is memory cache line contention or some such but it is always there.  "Atomic" implies blocking other threads, so that the other (blocked) threads have to retry.  In the case of the atomic op being a VM level primitive the retry loop is implicit in the scheduler, in the sense that the scheduler cannot schedule other threads for the duration of the primitive.
> 
> Again, who said, that i going to use atomic swap for loops like above?

Um, your example said it? :)  Sorry if I misunderstood, but email doesn't give much context, and the loop you gave was the only context I had, so I assumed that was what you wanted to do.

> I simply need a guarantee from VM , that swap operation will be
> atomic. No less, no more.
> Compare-And-Swap and swap is different operations.

I guess that depends on how you look at it.  I see CAS as a simple generalization of atomic swap that is _much_ more useful for implementing many different kinds of safe multi-threaded data access.  Unconditional swap works safely for booleans, but if you use it for something more complicated, like numbers, you end up in all sorts of complications.  The unconditional swap ends up "leaking" information into the variable that is visible to other threads in a way that is very difficult to cope with for the other threads.  Try doing a thread safe numeric increment with unconditional swap....
> 
> 
>> I've used this pattern (expressed in C code) in a very heavily used parallel data processing application, so I can even claim to have tested it extensively :).
>> 
> 
> Yes yes.. But C is not Smalltalk.

Very true, as Eliot pointed out one of the implications of the difference above.  However, the algorithms for safe access to data in a multi-threaded environment are completely language and hardware independent.  You simply choose the algorithm you use to match what your language and HW have available for you to implement the algorithm.  In this context, if your (our) language provide CAS, you end up with a much broader choice of algorithms.

The only reason I brought up the HW CAS instruction was to make it clear that the machine code or C code for the primitive would not be any more complicated for CAS no matter what the assumptions or guarantees are for the VM scheduler.
> 
>>> 
> 
> -- 
> Best regards,
> Igor Stasenko AKA sig.
> 


Regards,

--
Tom Rushworth






More information about the Squeak-dev mailing list