Concurrent Futures

Tue Oct 30 16:54:52 UTC 2007

On 30/10/2007, Andreas Raab <andreas.raab at gmx.de> wrote:
> Igor Stasenko wrote:
> > How would you define a boundaries of these entities in same image?
>
> It is defined implicitly by the island in which a message executes. All
> objects created by the execution of a message are part of the island the
> computation occurs in.
>
> To create an object in another island you need to artificially "move the
> computation" there. That's why islands implement the #new: message, so
> that you can create an object in another island by moving the
> computation, for example:
>
>    space := island future new: TSpace.
>
> This will create an instance of TSpace in the target island. Once we
> have created the "root object" further messages that create objects will
> be inside that island, too. For example, take this method:
>
> TSpace>>makeNewCube
>    "Create a new cube in this space"
>    cube := TCube new.
>    self addChild: cube.
>    ^cube
>
> and then:
>
>    cube := space future makeNewCube.
>
> Both, cube and space will be in the same island.
>
When i talking about boundaries, i talking about how you could prevent
a particular object from slipping through these boundaries and then
used in multiple islands by sending direct messages to it.

An islands provide an isolation layer for some computation. Its just a
tool with which we can  obtain a data results of some computation. Now
suppose that you returned a result (OrderedCollection for instance)
which is then used in other islands for further processing.
But at same time, a reference to this object are kept in original
island. Now there is a probability that we can send messages to
mutable object from different islands, and there is no protection. Or
it is?
I talking about current implementation. Is it possible such situation
to happen? And if not, what mechanisms you provide to prevent that?

> > Could you illustrate by some simple examples, or strategy which can be
> > used for using them for concurrent execution within single VM?
>
> I'm confused about your use of the term "concurrent". Earlier you wrote
> "There is a BIG difference between concurrency (parallel execution with
> shared memory) and distributed computing." which seems to imply that you
> discount all means of concurrency that do not use shared memory. If that
> is really what you mean (which is clearly different from the usual
> meaning of the term concurrent) then indeed, there is no way for it to
> be "concurrent" because there simply is no shared mutable state between
> islands.
>

Sorry if my definition was misleading or incorrect. I just see the
difference between running two islands on different images and running
them on same image in separate native threads.
Even if we suppose the we don't have shared state (to some extent)
between islands, we definitely will have shared state in same image
(on VM side at least), and we should carefully deal with it.

> > I'm very interested in practical usage of futures myself.
> > What will you do, or how you would avoid the situation , when
> > sometimes a two different islands containing a reference to the same
> > object in VM will send direct messages to it, causing racing
> > condition?
>
> The implementation of future message sending uses locks and mutexes. You
> might say "aha! so it *is* using locks and mutexes" but just as with
> automatic garbage collection (which uses pointers and pointer arithmetic
> and explicit freeing) it is simply a means to implement the higher-level
> semantics. And since no mutual/nested locks are required
> deadlock-freeness can again be proven.
>
No, i don't wanna say 'a-ha' :) The implementation details not really
matter here because they can be improved later. What is more important
is the proof of concept.
For instance, we could add an 'atomic update' primitive which could be
used for implementing a non-locking queues and/or lists.

> > Yes. But this example is significant one. Sometimes i want these
> > messages run in parallel, sometimes i don't. Even for single 'island'.
>
> In the island model, this is not an option. The unit of concurrency is
> an island, period. If want to run computations in parallel that share
> data you either make the data immutable (which can enable sharing in
> some limited cases) or you copy the needed data to "worker islands".
> Basic load balancing.

Got it.

>
> > Then, for general solution we need these islands be a very small (a
> > smaller one is a single object) or contain big number of objects. The
> > question is, how to give control of their sizes to developer. How
> > developer can define a boundaries of island within single image?
>
> By sending messages. See above.
>
> > I will not accept any solutions like 'multiple images' because this
> > drives us into distributed computing domain, which is _NOT_ concurrent
> > computing anymore, simple because its not using shared memory, and in
> > fact there is no sharing at all, only a glimpse of it.
>
> Again, you have a strange definition of the term concurrency. It does
> not (neither in general english nor in CS) require use of shared memory.
>   There are two main classes of concurrent systems, namely those relying
> on (mutable) shared memory and those relying on message passing
> (sometimes utilizing immutable shared memory for optimization purposes
> because it's indistinguishable from copying). Erlang and E (and Croquet
> as long as you use it "correctly") all fall into the latter category.
>
Correctly. Exactly!
But i'm interesting in making 'use incorrectly' impossible.

> >> This may be the outcome for an interim period. The good thing here is
> >> that you can *prove* that your program is deadlock-free simply by not
> >> using waits. And ain't that a nice property to have.
> >>
> > you mean waits like this (consider following two lines of code run in parallel):
> >
> > [ a isUnlocked ] whileFalse: [ ]. b unlock.
> >
> > and
> >
> > [ b isUnlocked] whileFalse: []. a unlock.
>
> Just like in your previous example, this code is meaningless in Croquet.
> You are assuming that a and b can be sent synchronous messages to and
> that they resolve while being in the busy-loop. As I have pointed out
> earlier this simply doesn't happen. Think of it that way: Results are
> itself communicated using future messages, e.g.,
>
> Island>>invokeMessage: aMessage
>    "Invoke the message and post the result back to the sender island"
>    result := aMessage value. "compute result of the message"
>    aMessage promise future value: result. "resolve associated promise"
>
> so you cannot possibly wait for the response to a message you just
> scheduled. It is simply not possible, neither actively nor passively.
>
> > And how could you guarantee, that any bit of code in current ST image
> > does not contain such hidden locks - like a loops or recursive loops
> > which will never return until some external entity will change the
> > state of some object(s)?
>
> No more than I can or have to guarantee that any particular bit of the
> Squeak library is free of infinite loops. All we need to guarantee is
> that we don't introduce new dependencies, which thanks to future
> messages and promises we can guarantee. So if the subsystem is deadlock
> free before it will stay so in our usage of it. If it's not then, well,
> broken code is broken code no matter how you look at it.
>
> >>> I pointed that futures as an 'automatic lock-free' approach is not
> >>> quite parallel to 'automatic memory management by GC'.
> >> The similarity is striking. Both in terms of tradeoffs (trade low-level
> >> control for better productivity) as well as the style of arguments made
> >> against it ;-) Not that I mind by the way, I find these discussions
> >> necessary.
> >>
> > The striking is, that introducing GC does good things - removing a
> > necessity to care about memory, which helps a lot in developing and
> > makes code more clear and smaller. But i can't see how futures does
> > same. There are still lot things to consider for developer even by
> > using futures.
>
> The main advantages are increased robustness and productivity. We worry
> a *lot* about deadlocks since some our usage of Croquet shows exactly
> the kind of "mixed usage" that you pointed out. But never, not once,
> have we had a deadlock or even have had to worry about it, in places
> where we used event-loop concurrency consistently. (interesting aside:
> Just today we had a very complex deadlock on one of our servers and my
> knee-jerk reaction was to try to convert it to event-loop concurrency
> because although we got stack traces we may not be able to completely
> figure out how the system ended up in that deadlock :-)
>
> We've gradually continued to move to event-loop concurrency more and
> more in many areas of our code because the knowledge that this code will
> be deadlock-free allows us to concentrate on solving the problem at hand
> instead of figuring out the most unlikely occurences that can cause
> deadlock - I suspect that I'll be faster rewriting the code from today
> as event-loops than figuring out what caused and how to avoid that deadlock.
>
> And that is in my understanding the most important part - how many hours
> have you spent thinking about how exactly a highly concurrent system
> could possibly deadlock? What if you could spend this time on improving
> the system instead, knowing that deadlock *simply cannot happen*?
>
> Cheers,
>    - Andreas
>
>

-- 
Best regards,
Igor Stasenko AKA sig.