Concurrent Futures

Tue Oct 30 04:31:04 UTC 2007

Igor Stasenko wrote:
> How would you define a boundaries of these entities in same image?

It is defined implicitly by the island in which a message executes. All 
objects created by the execution of a message are part of the island the 
computation occurs in.

To create an object in another island you need to artificially "move the 
computation" there. That's why islands implement the #new: message, so 
that you can create an object in another island by moving the 
computation, for example:

   space := island future new: TSpace.

This will create an instance of TSpace in the target island. Once we 
have created the "root object" further messages that create objects will 
be inside that island, too. For example, take this method:

TSpace>>makeNewCube
   "Create a new cube in this space"
   cube := TCube new.
   self addChild: cube.
   ^cube

and then:

   cube := space future makeNewCube.

Both, cube and space will be in the same island.

> Could you illustrate by some simple examples, or strategy which can be
> used for using them for concurrent execution within single VM?

I'm confused about your use of the term "concurrent". Earlier you wrote 
"There is a BIG difference between concurrency (parallel execution with 
shared memory) and distributed computing." which seems to imply that you 
discount all means of concurrency that do not use shared memory. If that 
is really what you mean (which is clearly different from the usual 
meaning of the term concurrent) then indeed, there is no way for it to 
be "concurrent" because there simply is no shared mutable state between 
islands.

> I'm very interested in practical usage of futures myself.
> What will you do, or how you would avoid the situation , when
> sometimes a two different islands containing a reference to the same
> object in VM will send direct messages to it, causing racing
> condition?

The implementation of future message sending uses locks and mutexes. You 
might say "aha! so it *is* using locks and mutexes" but just as with 
automatic garbage collection (which uses pointers and pointer arithmetic 
and explicit freeing) it is simply a means to implement the higher-level 
semantics. And since no mutual/nested locks are required 
deadlock-freeness can again be proven.

> Yes. But this example is significant one. Sometimes i want these
> messages run in parallel, sometimes i don't. Even for single 'island'.

In the island model, this is not an option. The unit of concurrency is 
an island, period. If want to run computations in parallel that share 
data you either make the data immutable (which can enable sharing in 
some limited cases) or you copy the needed data to "worker islands". 
Basic load balancing.

> Then, for general solution we need these islands be a very small (a
> smaller one is a single object) or contain big number of objects. The
> question is, how to give control of their sizes to developer. How
> developer can define a boundaries of island within single image?

By sending messages. See above.

> I will not accept any solutions like 'multiple images' because this
> drives us into distributed computing domain, which is _NOT_ concurrent
> computing anymore, simple because its not using shared memory, and in
> fact there is no sharing at all, only a glimpse of it.

Again, you have a strange definition of the term concurrency. It does 
not (neither in general english nor in CS) require use of shared memory. 
  There are two main classes of concurrent systems, namely those relying 
on (mutable) shared memory and those relying on message passing 
(sometimes utilizing immutable shared memory for optimization purposes 
because it's indistinguishable from copying). Erlang and E (and Croquet 
as long as you use it "correctly") all fall into the latter category.

>> This may be the outcome for an interim period. The good thing here is
>> that you can *prove* that your program is deadlock-free simply by not
>> using waits. And ain't that a nice property to have.
>>
> you mean waits like this (consider following two lines of code run in parallel):
> 
> [ a isUnlocked ] whileFalse: [ ]. b unlock.
> 
> and
> 
> [ b isUnlocked] whileFalse: []. a unlock.

Just like in your previous example, this code is meaningless in Croquet. 
You are assuming that a and b can be sent synchronous messages to and 
that they resolve while being in the busy-loop. As I have pointed out 
earlier this simply doesn't happen. Think of it that way: Results are 
itself communicated using future messages, e.g.,

Island>>invokeMessage: aMessage
   "Invoke the message and post the result back to the sender island"
   result := aMessage value. "compute result of the message"
   aMessage promise future value: result. "resolve associated promise"

so you cannot possibly wait for the response to a message you just 
scheduled. It is simply not possible, neither actively nor passively.

> And how could you guarantee, that any bit of code in current ST image
> does not contain such hidden locks - like a loops or recursive loops
> which will never return until some external entity will change the
> state of some object(s)?

No more than I can or have to guarantee that any particular bit of the 
Squeak library is free of infinite loops. All we need to guarantee is 
that we don't introduce new dependencies, which thanks to future 
messages and promises we can guarantee. So if the subsystem is deadlock 
free before it will stay so in our usage of it. If it's not then, well, 
broken code is broken code no matter how you look at it.

>>> I pointed that futures as an 'automatic lock-free' approach is not
>>> quite parallel to 'automatic memory management by GC'.
>> The similarity is striking. Both in terms of tradeoffs (trade low-level
>> control for better productivity) as well as the style of arguments made
>> against it ;-) Not that I mind by the way, I find these discussions
>> necessary.
>>
> The striking is, that introducing GC does good things - removing a
> necessity to care about memory, which helps a lot in developing and
> makes code more clear and smaller. But i can't see how futures does
> same. There are still lot things to consider for developer even by
> using futures.

The main advantages are increased robustness and productivity. We worry 
a *lot* about deadlocks since some our usage of Croquet shows exactly 
the kind of "mixed usage" that you pointed out. But never, not once, 
have we had a deadlock or even have had to worry about it, in places 
where we used event-loop concurrency consistently. (interesting aside: 
Just today we had a very complex deadlock on one of our servers and my 
knee-jerk reaction was to try to convert it to event-loop concurrency 
because although we got stack traces we may not be able to completely 
figure out how the system ended up in that deadlock :-)

We've gradually continued to move to event-loop concurrency more and 
more in many areas of our code because the knowledge that this code will 
be deadlock-free allows us to concentrate on solving the problem at hand 
instead of figuring out the most unlikely occurences that can cause 
deadlock - I suspect that I'll be faster rewriting the code from today 
as event-loops than figuring out what caused and how to avoid that deadlock.

And that is in my understanding the most important part - how many hours 
have you spent thinking about how exactly a highly concurrent system 
could possibly deadlock? What if you could spend this time on improving 
the system instead, knowing that deadlock *simply cannot happen*?

Cheers,
   - Andreas