Concurrent Futures

Jecel Assumpcao Jr jecel at merlintec.com
Tue Oct 30 23:24:59 UTC 2007


I would like to mention some of my previous work in this area:

- tinySelf 1 (1996)
http://www.lsi.usp.br/~jecel/tiny.html#rel1

This was a Self interpreter written in Self which implemented the one
thread per object model. All messages were future messages, but since
sending a message to an unresolved future would block, any recursion
(direct or indirect) would deadlock. This problem was solved by
detecting the cycles and preempting the blocked message with the one
it depends on. This results in interleaved execution, but since the
semantics are exactly the same as in a sequential execution of the
recursive code, any bugs that appear won't be due to concurrency.
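
The cycle-detection idea can be sketched as a walk over a wait-for
graph (this is an assumed simplification for illustration, not the
tinySelf 1 code): each object-thread that blocks on an unresolved
future records which thread it is waiting on, and a cycle in that
graph is exactly the recursion case that must be preempted and
interleaved.

```python
# Hypothetical sketch: detect a cycle in a wait-for graph of threads
# blocked on each other's futures. A cycle means the blocked message
# must be preempted (run interleaved) instead of waiting forever.

def find_cycle(waits_on, start):
    """Follow wait edges from `start`; return the cycle, or None."""
    seen = []
    node = start
    while node in waits_on:
        if node in seen:
            return seen[seen.index(node):]  # the threads in the cycle
        seen.append(node)
        node = waits_on[node]
    return None

# A waits on B's future and B waits on A's: direct recursion, deadlock
assert find_cycle({"A": "B", "B": "A"}, "A") == ["A", "B"]

# A -> B -> C with C still running: no cycle, plain blocking is fine
assert find_cycle({"A": "B", "B": "C"}, "A") is None
```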

I was able to test simple expressions and was very happy with how much
parallelism I was able to extract from seemingly sequential code, but I
made the mistake of introducing a significant optimization (tail send
elimination) that made debugging so much harder that I was unable to
finish in the two weeks that I was able to dedicate to this project.

- 64 node Smalltalk machine (1992)
http://www.lsi.usp.br/~jecel/ms8702.html

The most interesting result in this project was the notion that most
objects in the system are immutable at any given time and that a
security system might be used to detect this. For example, just because
you can edit some font today doesn't mean that you will do it. And if
you and everyone currently logged on the local system only have read
permission for that font then it is effectively immutable. Only when the
font's owner logs in is this assumption invalid.

The advantage of knowing that an object is immutable is that you can
replicate it and you can allow multiple threads to access it at the same
time.
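
The permission check described above amounts to a simple set test (a
hedged sketch with hypothetical names, not the actual security
system): an object is effectively immutable while no currently
logged-in principal holds write permission on it.

```python
# Hypothetical sketch: an object is "effectively immutable" while the
# set of users with write permission is disjoint from the set of users
# currently logged in.

def effectively_immutable(writers, logged_in):
    """writers: users with write permission; logged_in: active users."""
    return writers.isdisjoint(logged_in)

# the font's owner may edit it, but is not logged in right now, so the
# font can safely be replicated and shared across threads
assert effectively_immutable({"font_owner"}, {"alice", "bob"}) is True

# once the owner logs in, the immutability assumption is invalidated
assert effectively_immutable({"font_owner"}, {"alice", "font_owner"}) is False
```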

The only paper in English from this project describes how adaptive
compilation could be used to trim away excessive concurrency by
transforming future message passing into sequential message passing (the
semantics allow this) and then inlining them away. So if a machine has
64 processors and the application initially starts out with 10 thousand
threads, the compiler will eventually change this into code with 200 or
so threads (some are blocked at any given instant, so going down to 64
threads would not be good).
http://www.lsi.usp.br/~jecel/jabs1.html
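
The "200 or so" figure follows from simple arithmetic (the 70% blocked
fraction below is an assumed illustrative number, not from the paper):
if some fraction of threads is blocked at any instant, you need
proportionally more threads than processors to keep every core busy.

```python
# Illustrative arithmetic with an assumed blocked fraction: to keep
# `processors` cores busy when a fraction `blocked` of threads is
# waiting at any instant, you need about processors / (1 - blocked)
# threads in total.

def threads_needed(processors, blocked):
    return processors / (1.0 - blocked)

# with ~70% of threads blocked, 64 cores want roughly 213 threads,
# on the order of the "200 or so" mentioned above
print(round(threads_needed(64, 0.7)))  # -> 213
```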

- operating system in an Objective-C like language (1988)
http://www.lsi.usp.br/~jecel/atos.html (this page has download links but
the text still hasn't been written)

This operating system for 286 machines used that hardware's virtual
memory to isolate groups of objects, with one thread per group. This
would be similar to the vat/island model. All messages were sent in
exactly the same way: if the receiver was a local object, the send was
just a fancy subroutine call, but for remote objects you got a "segment
not present" fault and the message was packed up and sent to the other
task (possibly over the network). All messages were synchronous since I
was not aware of futures at that time.
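
The effect of the fault handler can be sketched as a proxy that makes
local and remote sends look identical (a hypothetical illustration in
Python, not the original 286 mechanism): the handler's job was to pack
the selector and arguments and ship them to the owning task.

```python
# Hypothetical sketch: a proxy whose method calls play the role of the
# "segment not present" fault handler, packing the send into a message
# and handing it to a transport callback.

class RemoteProxy:
    def __init__(self, target_id, send):
        self._target_id = target_id
        self._send = send  # transport callback, e.g. a network send

    def __getattr__(self, selector):
        def method(*args):
            # pack up the message and ship it to the other task
            return self._send({"to": self._target_id,
                               "selector": selector,
                               "args": args})
        return method

outbox = []
proxy = RemoteProxy("font-42", lambda msg: outbox.append(msg) or msg)
proxy.bold(True)  # looks exactly like a local send
assert outbox == [{"to": "font-42", "selector": "bold", "args": (True,)}]
```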

-- current model --

I moved back to the one thread per object group model since I feel that
makes it easier for programmers to control things without having to
worry too much about details most of the time. Since my target is
children, this is particularly important. An alternative that I
experimented with was having a separation between active and passive
objects. A passive object could be known only to a single active one,
but it is just too hard to program without ever accidentally letting
references to passive objects "leak". With the group/vat/island model
there is just one kind of object and things are simpler for the
programmer (but more complicated for the implementor). I have a
limitation that you can only create new objects in your own group or in
an entirely new group - I think forcing some other random group to
create an object for you is rude, though of course you can always ask
an object there to please do it.
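
The creation rule can be stated compactly (a hedged sketch with
invented names, under the assumption that each group tracks its own
objects): a group creates locally, spawns a brand-new group, or asks
another group to create on its behalf - it never injects objects into
an existing foreign group directly.

```python
# Hypothetical sketch of the creation rule: local creation, spawning a
# fresh group, or politely asking another group to do the creation.

class Group:
    def __init__(self, name):
        self.name = name
        self.objects = []

    def create_local(self, obj):
        self.objects.append(obj)        # allowed: your own group
        return obj

    def spawn_group(self, name, first_obj):
        g = Group(name)                 # allowed: an entirely new group
        g.objects.append(first_obj)
        return g

    def request_create(self, other, obj):
        # not allowed to reach into `other` ourselves; ask it instead
        return other.create_local(obj)

home = Group("home")
home.create_local("point")
workers = home.spawn_group("workers", "worker-1")
assert workers.objects == ["worker-1"]
```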

Some of the loaded groups are read/write but many are read-only. The
latter don't actually have their own threads but instead their code
executes in the thread of the calling group. I have hardware support for
this.

Speaking of hardware, I would like to stress how fantastically slow
(relatively speaking) main memory is these days. If I have a good
network connecting processor cores in a single chip then I can probably
send a message from one to another, get a reply, send a second message
and get another reply in the time that it takes to read a byte from
external RAM. So we should start thinking of DDR SDRAM as a really fast
disk to swap objects to/from and not as a shared memory. We should start
to take message passing seriously.
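
A back-of-the-envelope check of that claim (the latency figures below
are assumed, round 2007-era numbers, not measurements): two on-chip
request/reply round trips fit comfortably within one external memory
access.

```python
# Assumed illustrative latencies: external SDRAM access ~60 ns, one-way
# on-chip network message ~15 ns.
dram_access_ns = 60
onchip_one_way_ns = 15
round_trip_ns = 2 * onchip_one_way_ns

# two message round trips (send, reply, send, reply) vs one RAM access
print(2 * round_trip_ns <= dram_access_ns)  # -> True
```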

-- Jecel



More information about the Squeak-dev mailing list