Some thoughts about native multithreading in smalltalk

Tue Jun 26 14:52:09 UTC 2007

Once again i asked myself, is it possible to run single squeak image
using multiple native threads without a need of excessive locking
anything or deeply rewriting a smalltalk code base.. and came to
following ideas (please, don't blame me if this was already discussed
and proven to be unusable... i hope this is original idea, right from
my mind :)

All my ideas below is around simple isolation strategy: only single
native thread can write to  object's memory at some point of time.
Basically, this means that all primitive operations on object, which
somehow altering its memory, like #basicAt:put:, or assigning ivar
must be synchronized using single thread.
There may be another issues, preventing from using multiple threads,
but this one is most fundamental i think.

Lets review the situation:
suppose we having 1000 native threads performing implementation cycle,
and now one of these threads came to point where it needs to access
object's memory.

Common approach would be to setup a write barrier for given memory
region, where object's state stored  (using mutex/semaphore or
critical section) and any thread which willing to access there must
take ownership on that particular barrier.
Things seem pretty straightforward, but thinking more in detail,
you'll see that this leads to nightmare - it will require to rewrite a
code base, putting locks here and there.. and in result you'll have a
common situation when 1 thread running while others waiting.

Another way is based on message passing and what we have here:
- any number of threads can run simultaneously under condition that
there's no two threads which currently interpreting a message with
same receiver. This condition is enough for making object's memory
consistent. Since all methods can directly access only receiver's
memory.

In detail this means, that each thread having an unique receiver, and
each object having a slot, which pointing to thread, which is
currently interpreting a message with given object as receiver.
To check, if we can send message to object without risk of getting
into race condition we need to check object's thread id, and if it's
nil, then we can safely send a message to it, by assigning object's
thread id slot to current thread id, and entering method.

Now, what will happen, when other thread needs to send message to
object, which is already used by other thread as receiver?

Things get complex when we using a stack for message passing, where we
need to push/pop arguments e.t.c, using stack frames and all other
things dependent on that.
But what if we use a message queues instead of stack?

The state of message evaluation can be represented by following
structure (method context):
- receiver
- arguments
- method
- thread id (responsible for evaluating a message)
- completion flag (indicating that message evaluation is complete)
- message result
- 'next queue object' (indicating that this message context are
waiting for complete evaluation of other message)
- bytecode position
- space for temps

Now,  interpretation cycle will look like:
A particular thread entering a method by receiving structure above and
starting implementing method bytecodes.
Each time, it needs to send message to some object, it must create a
new structure, fill it with proper values and append to message queue
of thread, which will be responsible for evaluation of sending
message.
After appending a message to queue it must wait for it's completion isn't? No!
Yes, it stops interpretation of current method, but instead of waiting
for complete evaluation of message required to continue evaluating
current method, thread checking its message queue to see, if there's
more messages awaiting for evaluation (having 'next queue object' =
nil), or if it is not nil, having complete flag set.
When finished evaluating a message, thread stores the message result
and sets completion flag to true. Then starting listening queue as
always.
When thread chooses to continue evaluation of method context which was
waiting for result of other context, it reads a result, then removes
completed method context from queue.

Now, since each object having a slot, indicating where it currently
used as receiver we can simply decide to which thread's queue we must
add a method context for evaluation of message send to prevent race
condition. If thread A evaluating some message and needs to send
message to object B, which is currently involved in message evaluation
as receiver in thread C, we simply appending new method context to
thread C queue and waiting for it's completion.

A message queuing described above can be combined with technique,
known as 'future refs' - we can continue evaluating a current message
by not waiting message send result, but using a temporary 'future
object' which is actually our method context described above.
We can continue up to the point, where result of an evaluation needs
to be stored in other object's memory or used in branching, any
additional message sends to 'future object' can be simply queued in
same thread which currently evaluating the future object.

Ok, seems I described a rough view of idea. Awaiting for your comments!

Also, consider that English is not my native language :)

Regards,
  sig