Lots of concurrency

David Chase chase at world.std.com
Sun Oct 28 01:39:50 UTC 2001


[ a near-rant follows ]

At 10:01 AM 10/25/2001 -0700, Ken Kahn wrote:
>Thinking sequentially about problems that are inherently concurrent is
>suboptimal.

I disagree.  My favorite examples, not necessarily relevant
to Squeak, are:

1) network (web) servers.

The task is inherently parallel; there can be many
connections in progress at one time.  However, the
task is greatly simplified by organizing it as a
bunch of sequential threads.

2) finite element models (game of life, cellular automata).

The problem is *described* from the point-of-view of a single
cell, and the parallelism is then added.  The differential
equation itself describes the behavior *of a point*.  There
are cases where it is more efficient to think of the aggregate
acting in parallel (for instance, the physical properties of
an I-beam) but often the most interesting problems are those
where we *do not know the aggregate behavior* -- there, we
must work from first principles.  (It depends, very much,
on what we are using the computer/program for -- designing
a house, we work with standard aggregate-behavior-understood
materials, but designing a scramjet or a high-performance
propeller, we must work from finite elements.)

And, if you consider how these are usually programmed, the
time step and grid size are manipulated so that they really
are very sequential and step-at-a-time.  Just for example,
the time step is chosen small enough (usually) so that the
rate-of-change in the neighbors can be ignored (and in the
cases with which I am vaguely familiar, when the rate of
change in neighbors is considered, the time step is still
chosen to be small enough that the "2nd derivative" is zero).
One of the problems I recall (part of Thom Potempa's
dissertation, I think) was dealing with cases where the
change was inherently discontinuous -- in the case of
oil reservoir steam injection, boiling and condensation
of water at critical pressure/temperature combinations.

The use of (synchronized) time steps also helps avoid the
non-uniform progress problem (which is the one that most
often catches me when I write parallel code).
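A minimal Java sketch of that discipline (the 4-cell automaton, its update rule, and all names here are invented for illustration): each worker computes its cell's next value from the *old* state, then waits at a barrier, so no cell ever reads a neighbor from a different time step.

```java
import java.util.concurrent.CyclicBarrier;

// Toy 1-D automaton: each worker thread owns one cell and advances it
// one synchronized time step at a time.
public class LockStep {
    static final int CELLS = 4;
    static int[] cur  = {0, 1, 0, 1};       // state at time t
    static int[] next = new int[CELLS];     // state at time t+1

    // Invented rule: new value is the parity of the two neighbors.
    static int rule(int i) {
        return (cur[(i + CELLS - 1) % CELLS] + cur[(i + 1) % CELLS]) % 2;
    }

    public static int[] run(int ticks) throws InterruptedException {
        // The barrier action runs once per tick, after every cell has
        // written its new value and before anyone starts the next tick.
        CyclicBarrier barrier = new CyclicBarrier(CELLS, () -> {
            int[] tmp = cur; cur = next; next = tmp;
        });
        Thread[] workers = new Thread[CELLS];
        for (int i = 0; i < CELLS; i++) {
            final int cell = i;
            workers[i] = new Thread(() -> {
                try {
                    for (int t = 0; t < ticks; t++) {
                        next[cell] = rule(cell);  // read old state only
                        barrier.await();          // the "tiny time step"
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return cur;
    }
}
```

The barrier both orders the computation and (via its happens-before guarantees) makes each tick's writes visible before the next tick begins -- which is exactly the non-uniform-progress problem being dodged.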

In particular, I suspect that T.I.M. (The Incredible Machine) uses tiny time steps,
and treats each object as a sequential process.  It may
not be coded that way (for efficiency) but I am almost
certain that is how it was modeled.

In contrast, working on parallel programs, I have seen/heard of
programmers across a wide range of skill levels (everyone from
the more academic members of the Modula-3 design team, to a
tenured professor at MIT, to Los Alamos rocket scientists, to
just ordinary programmers)  get wedged numerous times with
mistaken assumptions about atomicity, progress of computing,
progress of communication, and equivalence of different
concurrent programs.  Things like:

  WHILE boolean_flag DO (* busy wait *) ;

which may never, ever see an updated value of boolean_flag (there
are efficiency-related reasons for this to be so); or
the possibility of a compiler transforming

  while ... {
     synchronized (foo) {
        single_character_output
     }
  }

into 

  synchronized (foo) {
    while ... {
       single_character_output
    }
  }

(This is not equivalent -- anyone else wanting to synchronize
on foo cannot do so until the loop exits.)
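In Java terms, the busy-wait above fails because nothing forces the writer's store to become visible to the spinning reader. A sketch of the standard fix (the class and method names here are my own) is to declare the flag volatile:

```java
// Without 'volatile', the JIT may hoist the flag read out of the loop
// and the spinner can run forever; 'volatile' forbids that hoisting and
// makes the writer's store visible to the reader.
public class BusyWait {
    static volatile boolean flag = false;

    // Returns true if the spinner actually observed the flag change.
    public static boolean demo() {
        Thread spinner = new Thread(() -> {
            while (!flag) { /* busy wait */ }
        });
        spinner.start();
        try {
            Thread.sleep(10);        // let the spinner actually spin a bit
            flag = true;             // guaranteed visible to the spinner
            spinner.join(5000);      // bounded, in case something is wrong
        } catch (InterruptedException e) {
            return false;
        }
        return !spinner.isAlive();
    }
}
```

Even when fixed, a busy wait burns a CPU; a guarded wait or a semaphore is usually the better tool.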

The Los Alamos guys apparently spent something like two months
trying to find a data dependence bug in a Fortran loop; I
remember this because it was The Demo That Worked by my
advisor, long ago (as in, he foolishly offered to try a
research tool on a real program, and it actually worked!)

Even something as simple as producers-and-consumers is easy
to get wrong, if the wrong thread does the updates.
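A minimal correct bounded buffer in Java (class and method names invented): only producers add, only consumers remove, and every wait sits in a while loop so the condition is re-checked after each wakeup.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Bounded buffer: the classic mistakes are using 'if' where 'while' is
// needed, or letting the wrong thread update the shared state.
public class Bounded {
    private final Queue<Integer> q = new ArrayDeque<>();
    private final int capacity;

    public Bounded(int capacity) { this.capacity = capacity; }

    public synchronized void put(int x) throws InterruptedException {
        while (q.size() == capacity)  // 'while', not 'if': recheck on wakeup
            wait();
        q.add(x);                     // only producers add
        notifyAll();
    }

    public synchronized int take() throws InterruptedException {
        while (q.isEmpty())
            wait();
        int x = q.remove();           // only consumers remove
        notifyAll();
        return x;
    }
}
```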

Now, on the one hand, this is all simple stuff, much lower level
than the sort of concurrency that we'd like to be thinking about;
on the other hand, this is the sort of thing that skilled people
should surely be able to get right, and they don't always.

>I think her problem is that her model of programming is sequential as a
>result of her year of Logo programming. And the world isn't sequential.
>Think of sports teams. Think of traffic. Think of the internal concurrency
>in walking. In games. Think about bank account transfers. The Incredible
>Machine. Cooking. An orchestra. And so on.

Think, in general, of how we massage these things to make them
more sequential.  Traffic -- we have lanes, we have rules, we
have signals.  In places like Massachusetts this is less true, but
I find that driving in Massachusetts is less efficient and more
stressful than in other places (and, judging by the dents and dings
on the cars I see, more error-inducing).  If you look at how an 
orchestra is organized, it is designed to present the problem
as a collection of sequential tasks; only the conductor is
responsible for the "big picture", and the most difficult
music is that which lacks familiar timing cues (like a regular
beat -- tiny time steps, again).  In addition, the musicians
all generally have their own copies of the music (replication -- a
common trick for reducing interdependence in concurrent problems)
even when playing the same part.  Or, consider a marching band --
with certain exceptions (Stanford, Rice) it's time-step city,
with each actor running his own program, taking cues from
his neighbor (he turns left, two ticks, I turn left).  I've
seen buggy marching bands, too :-).

>You need to redesign from scratch to deal well with concurrency.

It is certainly worth a try, because there are a fair number of
problems that are inherently concurrent and are not dealt with
well, but I would not bet on success.  It might be an excellent
place for a somewhat domain-specific language.

>"asynchronous - the synchronous messages used in CSP couple the
>communication of information with the communication of events. With
>asynchronous messages the two are separated (communication of event is then
>done with semaphores and similar structures) allowing great flexibility in
>system design. This also means there are great opportunities for mistakes,
>so understanding and debugging asynchronous systems is not a trivial task. "
>
>These problems with asynchronous messages don't match my experience. Maybe
>these problems are completely a consequence of mixing asynchrony and
>sequential programming.

My working definition for "asynchronous" is "at any time" -- in particular,
an "asynchronous interrupt" (or "asynchronous method invocation") could
occur at any time during the execution of any other method.  This is
incredibly problematic from the POV of data structure consistency, even
with locks/critical sections.  The worst parts of Java can occur
"at any time" -- Thread.stop() delivers an exception (in theory) at
any point in another thread's execution, and Thread.suspend() can
stop a thread from making progress no matter what it is executing,
no matter what locks it holds, etc.
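A hedged sketch of the usual alternative (the names are my own): instead of Thread.stop(), the worker polls a volatile flag only at points where its data structures are consistent, so cancellation can never land mid-update.

```java
// Cooperative cancellation: the asynchronous event (the cancel request)
// is only *observed* at well-defined points chosen by the worker.
public class Worker implements Runnable {
    private volatile boolean cancelled = false;
    volatile long unitsDone = 0;

    public void cancel() { cancelled = true; }  // callable from any thread

    @Override public void run() {
        while (!cancelled) {
            unitsDone++;  // stands in for one consistent unit of work
        }
        // We stop only *between* units of work, never inside one --
        // unlike Thread.stop(), which can fire anywhere.
    }
}
```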

Asynchrony presents a particular problem if you wish to test a
program for the presence of bugs (never mind proving the damn
thing correct in that sort of a world); in practice, you can
never know that you have tested for every possible asynchronous
interrupt/invocation that might occur.  In a (compiled) Java system,
we dealt with this problem (for garbage collection) by defining
compiler-inserted safe points where threads are preempted,
including by another thread's need for GC.  By reducing (drastically)
the number of places where the asynchronous event (GC) could occur,
we made it actually possible to test that the correct GC
bookkeeping was maintained for every single place where a GC
could occur.  Testing was still tedious and slow, but orders of
magnitude less so than testing at every instruction.
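That safe-point scheme can be caricatured in a few lines of Java (all names invented; a real system emits the poll in compiled code, not source): workers poll at loop back-edges, and the "GC" thread waits until every worker is parked at a poll before touching anything.

```java
// Caricature of compiler-inserted safe points: the common case is one
// cheap volatile read; workers block only when a pause is requested.
public class Safepoint {
    private static volatile boolean pauseRequested = false;
    private static final Object lock = new Object();
    private static int parked = 0;

    // "Inserted" at every loop back-edge.
    static void poll() {
        if (!pauseRequested) return;         // fast path
        synchronized (lock) {
            parked++;
            lock.notifyAll();                // tell the pauser we stopped
            while (pauseRequested) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
            }
            parked--;
        }
    }

    // Called by the "GC" thread: stop the world, do the work, resume.
    static void stopTheWorld(int workers, Runnable work) {
        synchronized (lock) {
            pauseRequested = true;
            while (parked < workers) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
            }
            work.run();                      // everyone is at a known-safe point
            pauseRequested = false;
            lock.notifyAll();
        }
    }
}
```

Because the pause can only be observed at poll(), the set of program points whose bookkeeping must be correct -- and testable -- shrinks from "every instruction" to "every poll."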

There is a further problem in all of this if you are the least
bit worried about performance.  Communication is expensive.
Synchronization is expensive.  Adding layers of abstraction can
become incredibly expensive if each layer adds its own separate
layer of locking -- no longer is it a matter of "just dynamically
inline it to make it fast" -- the synchronization itself (on
a multiprocessor) can cost an order of magnitude more than
the procedure call/return that was inlined.  It's interesting
to look at how unsynchronized versions of various library
classes were added between Java 1.0 and Java 1.1; even with
the fastest possible implementations, synchronization is too
slow to make it the default.
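The cost difference is easy to glimpse with a toy measurement (this micro-benchmark and its names are my own, and it proves nothing rigorous -- JIT warm-up, contention, and hardware all matter):

```java
// Uncontended synchronized vs. plain method calls on one thread.
// The absolute numbers are meaningless; the ratio is the point.
public class SyncCost {
    long plain = 0;
    long locked = 0;

    void bumpPlain() { plain++; }
    synchronized void bumpLocked() { locked++; }

    public static void main(String[] args) {
        SyncCost c = new SyncCost();
        final int N = 5_000_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) c.bumpPlain();
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) c.bumpLocked();
        long t2 = System.nanoTime();
        System.out.printf("plain: %d us, synchronized: %d us%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000);
    }
}
```

Modern JVMs have made uncontended locking far cheaper than it was then, but on a multiprocessor a lock that is actually shared still costs real memory traffic.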

I realize that I am talking about this at a completely different
level, but as a practical matter, we have so far made very
little connection between human methods of "computation" and
machine methods of "computation".  Even if there were a
connection, I do not see that just because our brains work
in a certain way, it should therefore be good for us to
write programs that worked in that same way -- we've got
no clear evidence that brains are easy to program, after
all, nor is it clear that brain-programs are especially
good at solving anything but "human" tasks.

I am much more pragmatic in my goals -- writing parallel
programs is generally a very hard thing, especially when
those parallel programs are solving large problems,
especially when they are supposed to run quickly (for
instance, finishing the prediction for tomorrow's weather
before tomorrow arrives).  There are fundamental performance
problems at the machine level (communication/synchronization is
expensive) and people do crazy things to get around those
fundamental performance problems.  A 1000-processor pocket
computer does me no good, if the cost of communication remains
high, and (and here is where the language research would be
interesting) we do not find a good (clear, natural, intuitive)
way to write programs that "communicate" only when they
actually need to.  If message-sending is always communication,
and communication is always expensive, that's a problem.

And, as a more Squeak-like aside, I would love to play with
finite-elements.  I am not entirely sure, but I think that
FE analysis of a kite is still very hard (the fabric moves,
sometimes rapidly).  But it would be lots of fun.

David Chase
chase at naturalbridge.com
drchase at alumni.rice.edu
chase at world.std.com

More information about the Squeak-dev mailing list