Time to think about parallel Smalltalk stuff

Wed Jan 19 15:18:27 UTC 2005

Matej Kosik wrote on Wed, 19 Jan 2005 15:38:57 +0100
> I find the idea of the computation based on asynchronous messages also 
> appealing. I would like to ask if objects in your system (TinySelf) were 
> able to change their state or they were immutable.

Note that these were not really asynchronous messages, but rather
synchronous ones where using the reply was decoupled from sending the
message. You don't get the kinds of programming errors with this that
you do with true asynchronous messages because we keep the semantics as
close to the original as possible (the goal is to run unmodified
Smalltalk/Self applications). See "Process Model" in
http://www.lsi.usp.br/~jecel/selfdiff.html
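
A rough Python sketch of "synchronous message, reply used later" may make the distinction clearer. This is not TinySelf code: the Account class, the send() helper and the single worker thread standing in for the receiver's thread are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

class Account:
    """Hypothetical receiver; stands in for any ordinary object."""
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

# One worker thread plays the role of the receiver's own thread, so
# messages are still processed one at a time, in the order sent.
receiver_thread = ThreadPoolExecutor(max_workers=1)

def send(receiver, selector, *args):
    """Send a message and hand back a placeholder for the reply at once."""
    return receiver_thread.submit(getattr(receiver, selector), *args)

account = Account(100)
first = send(account, "deposit", 50)    # the sender continues immediately
second = send(account, "deposit", 25)   # queued behind the first message

# The sender only blocks here, at the point where the reply is used.
first_reply = first.result()
second_reply = second.result()
receiver_thread.shutdown()
```

The sending code reads like ordinary sequential code; only the moment of waiting has moved from the send to the first use of the reply.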

TinySelf 1 objects were able to change state, but defining exactly what
"state" is has been the major problem I have faced so far. Each object
had its own thread and the state that was protected was only its local
"instance variables". That is not good enough (what about instance
variables in the objects it points to or that are passed as arguments?)
and at the same time it is too much (recursive methods will cause
deadlocks).

The Eiffel system I got some of these ideas from solved the first
problem by locking the receiver and all arguments before executing a
method, and also by having non-active objects that can't be shared among
the active ones. They ignored the second problem (since you were
supposed to program active objects differently from the normal style:
don't use recursion) and I solved it by detecting deadlocks and allowing
the new method to interrupt the old, blocked one. The idea is that this
hack wouldn't cause any errors that wouldn't also be present in a
sequential execution of the code, so this was acceptable.
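
The "lock the receiver and all arguments" rule can be sketched in Python roughly as follows. This is only an illustration, not how the Eiffel system is implemented: locked_send(), the Account class and the choice to take the locks in id() order (so that two opposite sends cannot deadlock each other) are assumptions of the sketch.

```python
import threading
from contextlib import ExitStack

class Account:
    """Hypothetical active object carrying its own lock."""
    def __init__(self, balance):
        self.lock = threading.RLock()
        self.balance = balance

    def transfer_to(self, other, amount):
        # Touches the state of the receiver and of an argument.
        self.balance -= amount
        other.balance += amount

def locked_send(receiver, selector, *args):
    """Lock the receiver and every lockable argument, then run the method."""
    involved = {id(obj): obj for obj in (receiver, *args) if hasattr(obj, "lock")}
    with ExitStack() as stack:
        # Assumption of this sketch: a fixed global order for taking locks.
        for key in sorted(involved):
            stack.enter_context(involved[key].lock)
        return getattr(receiver, selector)(*args)

a, b = Account(100), Account(100)

def shuffle(source, target):
    for _ in range(1000):
        locked_send(source, "transfer_to", target, 1)

# Two threads move money in opposite directions at the same time.
threads = [threading.Thread(target=shuffle, args=pair) for pair in ((a, b), (b, a))]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
total = a.balance + b.balance
```

The locks here are reentrant, so a recursive send from the same thread does not block; that sidesteps the recursion problem in this toy but says nothing about the one-thread-per-object case.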

My current idea is to divide the objects into "groups" and have one
thread per group. Then we can define an object's "state" to be the value
of the instance variables of every object in that group. This is a lot
like vats in E. I no longer think that migration among groups is
important, so when you create an object you can use #newLocal to have it
in the same group as the sender, #newRemote to have it be the first
object in an entirely new group or just #new if you don't care either
way (different kinds of objects will have this default to one of the
previous two).
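
A minimal Python sketch of the group idea, under several assumptions: the Group and Counter classes are invented for the example, new_remote/new_local only mimic #newRemote/#newLocal, messages coming from outside a group are queued and answered synchronously, and error handling is left out.

```python
import queue
import threading

class Group:
    """One thread plus the objects it owns (much like a vat in E)."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Only this thread ever touches the state of the group's objects.
        while True:
            job = self.inbox.get()
            if job is None:
                break
            method, args, reply = job
            reply.put(method(*args))   # error handling omitted

    def call(self, method, *args):
        """Queue a message from outside the group and wait for the reply."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((method, args, reply))
        return reply.get()

    def stop(self):
        self.inbox.put(None)
        self.thread.join()

class Counter:
    def __init__(self, group):
        self.group = group
        self.count = 0

    @classmethod
    def new_remote(cls):
        return cls(Group())            # first object of a brand-new group

    @classmethod
    def new_local(cls, sender):
        return cls(sender.group)       # same group as the sender

    def increment(self):
        self.count += 1
        return self.count

first = Counter.new_remote()
second = Counter.new_local(first)
third = Counter.new_remote()

replies = [
    first.group.call(first.increment),
    first.group.call(first.increment),
    first.group.call(second.increment),
    third.group.call(third.increment),
]
for group in (first.group, third.group):
    group.stop()
```

Inside a group, objects would simply call each other directly; only messages that cross a group boundary go through the queue.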

Michael Latta wrote:
> If we approach collections using future proxies we can get a lot of 
> concurrency.  For example collect/select/detect would result in a proxy 
> that binds the input collection to the operator (block).  This use of 
> proxies will also require copy on modify semantics or a radical head 
> rewiring for the programmer.

It is certainly worth looking at what people have already done in this
area. There were two interesting projects, both called
ConcurrentSmalltalk:

http://www.mt.cs.keio.ac.jp/groups.japanese/oops/cst.html
http://www.worldscibooks.com/compsci/1016.html

This has a lot in common with E and with the various Actor languages and
is well worth studying for its mailboxes and secretaries and the ^^
"return result but continue" trick.

http://cva.stanford.edu/j-machine/cva_j_machine.html
ftp://cva.stanford.edu/pub/publications/jm_retro.pdf

This is the place to look if you want to see Smalltalk (with Lisp
syntax... nobody's perfect) running on a 1024-processor machine. I don't
see any papers specifically about "concurrent aggregates", which are
what you are talking about, but they are a major factor in getting such
a high level of parallelism from normal-looking code. The idea is that
when you create some ConcurrentArray with 4096 elements, four are stored
in each node. When you send #collect: to this array, each processor gets
a copy of the message and only has to execute the block for its 4 local
elements. The result is a new ConcurrentArray.
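
The 4096-element example can be sketched in Python along these lines. It is only meant to show the shape of the idea: the class below is not the J-Machine implementation, distribute() and to_list() are made-up helpers, and a small thread pool stands in for the 1024 processors (so no real speedup should be expected from it).

```python
from concurrent.futures import ThreadPoolExecutor

class ConcurrentArray:
    """Toy aggregate: one logical array stored as one shard per node."""
    def __init__(self, shards):
        self.shards = shards

    @classmethod
    def distribute(cls, elements, nodes):
        # Ceiling division, so every element lands on some node.
        per_node = max(1, -(-len(elements) // nodes))
        return cls([list(elements[i:i + per_node])
                    for i in range(0, len(elements), per_node)])

    def collect(self, block):
        """Every node applies the block to its own local elements only."""
        def run_on_node(shard):
            return [block(element) for element in shard]
        # Assumption: 8 worker threads simulate the nodes.
        with ThreadPoolExecutor(max_workers=8) as pool:
            # map() keeps shard order, so the result has the same layout.
            return ConcurrentArray(list(pool.map(run_on_node, self.shards)))

    def to_list(self):
        return [element for shard in self.shards for element in shard]

array = ConcurrentArray.distribute(range(4096), nodes=1024)
squares = array.collect(lambda x: x * x)
```

The caller sends a single collect to what looks like one object; the split into four-element shards stays hidden inside the aggregate.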

It is interesting to contrast this with Croquet. In that case each node
would have a TArray with all 4096 elements and when receiving the
#collect: message there would be no speedup at all (which isn't the
idea there, after all). But both have this in common: they allow an
object to
be addressed from any node as if it were local, even though it is made
up from pieces present in all nodes.

A common framework that would allow Croquet, concurrent aggregates and
other stuff (remember LindaTalk?) would be very nice to have.

-- Jecel