Lots of concurrency

David Chase chase at world.std.com
Wed Oct 24 20:00:20 UTC 2001


[ I run the risk of going on too long.  Oh well. ]

At 04:47 PM 10/23/2001 +1300, you wrote:
>I would very much like to agree with him, but can only do so by
>interpreting "study what Java has" in the sense "learn what to avoid".
>
>In Java:
> - Sun implemented a number of methods which they then had to ask
>   people not to use

I meant this only in terms of concurrency.  There are several things
in Java that I would not recommend to someone starting to write a
new language in, say, 1990 :-).

> - EVERY object can be synchronised, which means that you have to have
>   extremely smart, not to say tricky, implementation techniques or
>   else make every object pay the full price of having a lock.

There is a worse problem, which is that because the locking is exposed,
there is much more room for client-induced deadlocks, or (in an us-vs-them
world) denial of service attacks.  Some of the first hacks for
breaking into Sun's JVM involved clever use of locks on system-level
objects to disrupt the normal order of class and object initialization.
In practice, the locks cannot be used as originally (and expensively)
intended.  
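
To make that failure mode concrete, here is a minimal, purely
hypothetical sketch (no real system objects involved): because any
client can synchronize on any object it can reach, two clients that
take the same pair of locks in opposite orders can deadlock each other:

class DeadlockSketch {
  static final Object a = new Object();   // stand-in for some shared/system object
  static final Object b = new Object();

  public static void main(String[] args) {
    new Thread() {
      public void run() {
        synchronized (a) { pause(); synchronized (b) { } }
      }
    }.start();
    new Thread() {
      public void run() {
        synchronized (b) { pause(); synchronized (a) { } }
      }
    }.start();
    // With the pause, each thread takes its first lock and then blocks
    // forever waiting for the other's -- neither thread "owns" a or b,
    // but the language lets them lock both anyway.
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException e) { }
  }
}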

I was originally quite unhappy with the expense of Java locks, which
are both ubiquitous and "recursive" (meaning that a thread can acquire
the same lock multiple times), but I've changed my mind on lock recursion,
and the expense now is mostly intellectual; at the moment, we (NaturalBridge)
implement recursive locks with a good cost profile in only about 17-20
bits per object, plus some constant per-thread overhead.  We could reduce
the number of bits, depending on other constraints (I am intentionally not
revealing details of our implementation).  Other people (Sun, IBM) have
different ways of doing locks with similar cost profiles, though I think
ours is the best for multiprocessor use.
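
For what it's worth, the semantics (not any real implementation, and
certainly not a cheap bit-packed one) of a recursive lock amount to an
owner thread plus a hold count; a naive sketch:

class RecursiveLock {
  private Thread owner = null;
  private int holdCount = 0;

  public synchronized void lock() throws InterruptedException {
    Thread me = Thread.currentThread();
    while (owner != null && owner != me) {
      wait();                  // some other thread holds the lock
    }
    owner = me;
    holdCount++;               // re-entry by the owner just bumps the count
  }

  public synchronized void unlock() {
    if (owner != Thread.currentThread()) {
      throw new IllegalMonitorStateException();
    }
    if (--holdCount == 0) {
      owner = null;
      notify();                // hand the lock to a waiting thread
    }
  }
}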

>In order to deal reliably with things like Vectors, you find yourself
>needing to put new synchronised methods into Vector, but you aren't
>allowed to do that.

This is what changed my mind, sort of, on recursive locks.  Assume
that we're ignoring the problem of exposed lockability -- then, your
example becomes

synchronized (vec) {      // outer lock makes the two calls one atomic step
  i = vec.indexOf(x);     // indexOf itself locks vec -- a recursive acquire
  y = vec.elementAt(i);   // as does elementAt
}

The inner lock acquisitions (inside indexOf and elementAt) are
"recursive" -- in Modula-3, that would deadlock or be an error.  The
Java recursive locks let you get the bigger critical
section that you need here, though their exposure creates the
possibility of other problems.  Of course, the other answer to this
problem is simply to declare that the level of abstraction is wrong;
a Vector is a Vector, and it only provides a certain level of data
consistency, and if you want something else (e.g., maintaining
an invariant among the elements as you change them) then you need
to wrap the Vector in some other type.
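
A sketch of what "wrap the Vector in some other type" looks like
(hypothetical class, pre-generics Java): the wrapper keeps the Vector
private and makes the compound operation a single method under its own
lock:

import java.util.Vector;

class SearchableList {
  private final Vector vec = new Vector();

  public synchronized void add(Object x) {
    vec.addElement(x);
  }

  // The indexOf/elementAt pair from above, done under one lock, so no
  // other thread can shrink or reorder the Vector between the two calls
  // and leave the index stale.
  public synchronized Object lookUp(Object x) {
    int i = vec.indexOf(x);
    return (i >= 0) ? vec.elementAt(i) : null;
  }
}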

Either way, what both languages (Java and Modula-3) did right is that they provided
syntactic (and in some cases, checked) sugar to make it easy
to partition things into serial, single-threaded subtasks.
If you are writing a server, for example, it is easiest to
treat each connection as a thread (or two), with

  try { ... } finally { ... }

to express what must happen (files must be closed) no matter
what, and

  try { ... } catch (SomeException e) { ... }

to express error handling (with compiler-enforced checking that
you handle the errors, or declare that you do not), and

  synchronized (somelock) { ... }

to express those portions of code where you do access or update
shared state.  Outside of synchronized blocks, it isn't shared.
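
Put together, a thread-per-connection server is mostly sequential code;
a rough sketch, with hypothetical names and the error handling reduced
to a comment:

import java.io.*;
import java.net.*;

class PerConnectionServer {
  private final Object statsLock = new Object();
  private int requestsServed = 0;

  public void serve(int port) throws IOException {
    ServerSocket listener = new ServerSocket(port);
    while (true) {
      final Socket s = listener.accept();
      new Thread() {
        public void run() { handle(s); }
      }.start();                       // one (cheap) thread per connection
    }
  }

  void handle(Socket s) {
    try {
      try {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(s.getInputStream()));
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        out.println("echo: " + in.readLine());
        synchronized (statsLock) {     // the only touch of shared state
          requestsServed++;
        }
      } finally {                      // must happen no matter what
        s.close();
      }
    } catch (IOException e) {
      // error handling for this connection only; the server keeps going
    }
  }
}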

The one thing to note here is that this assumes an abundance of
threads -- if you don't have that, then you are forced (by resource
constraints) to restructure your application into "transaction
objects", and perhaps even to write your own "scheduler", if your
transactions are long-lived.  This is a double pain -- not only must
you do the work, but the syntactically enforced restrictions
become less applicable.  Native OS threads are inappropriate
for programming-as-if-threads-were-cheap; the VM designer should
create a thread package on top of native threads to ensure that
threads in fact are cheap.  (Easier said than done on Windows,
though it has been done.)
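
A rough sketch of that restructuring, with entirely hypothetical names:
each connection becomes an object advanced a step at a time by a
hand-written scheduler, and the cleanup that try/finally guaranteed
becomes something you must remember to call yourself:

interface Transaction {
  boolean step();     // do a little work; return true when finished
  void cleanup();     // what "finally" used to guarantee
}

class Scheduler {
  private final java.util.Vector pending = new java.util.Vector();  // of Transaction

  public void add(Transaction t) { pending.addElement(t); }

  public void runAll() {
    while (!pending.isEmpty()) {
      Transaction t = (Transaction) pending.firstElement();
      pending.removeElementAt(0);
      boolean done;
      try {
        done = t.step();
      } catch (RuntimeException e) {
        t.cleanup();          // nothing in the language makes us remember this
        continue;
      }
      if (done) {
        t.cleanup();
      } else {
        pending.addElement(t);   // round-robin; long transactions need more policy
      }
    }
  }
}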

I don't know if this has much relevance to Squeak, but in my
experience programmers, and I think people in general, are much
happier thinking sequentially.  The sort of bugs that occur with
unstructured parallelism are just too Martian for most people to
contemplate.  On the other hand, if every method is "synchronized"
(to use the Java phrase), it is generally too expensive, and also
not enough (to refer back to someone else's transaction example --
books must balance).
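
A small illustration of the "not enough" point, with hypothetical
Account objects: even if every method is synchronized, an invariant
that spans two objects (the books balancing) is not protected.

class Account {
  private long balance;
  Account(long initial) { balance = initial; }
  public synchronized void deposit(long amount)  { balance += amount; }
  public synchronized void withdraw(long amount) { balance -= amount; }
  public synchronized long getBalance()          { return balance; }
}

class Transfer {
  // Each call below is atomic on its own, but between the withdraw and
  // the deposit another thread summing the two balances sees money
  // missing: the books do not balance, despite "synchronized" everywhere.
  static void transfer(Account from, Account to, long amount) {
    from.withdraw(amount);
    to.deposit(amount);
  }
}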

Another approach to parallelism is data-parallelism (I was exposed
to this in the world of super-duper computers, not-OO), and it
seems to be a hard sell, at least to the high-performance crowd.
They appear to often enjoy doing it the hard way, by hand, as they
descend into the 7 circles of MIMD hell.  It works best when
the "data" (objects) are mostly-independent, and in particular,
when operations are clearly identified as either "independent" ("local")
or "dependent" ("global", or "communication").  The canonical
model for this is finite-elements computation (whether Conway's
game of life, or heat flow, air flow, reservoir simulation,
bomb simulation, or weather simulation).  The "objects" are
grid points (or grid sectors) and the "local" step is
recalculation of the next state of each object, given
information about its own previous state and its neighbors'
previous states.  The "communication" step occurs 
when each object communicates with its neighbors.  I actually
think that this is something that might be better suited to OO,
since the grid points are clearly objects, and there are clearly
"messages" being sent.  The only hard part is making the Squeak-style
message sending anywhere near fast enough to be interesting; or
alternatively, to choose a finite-elements-style problem where the
grid points are simultaneously few enough (low communication
costs) and interesting enough (ill-suited to Fortran, more time
spent in local computation) to make it a good match for Squeak.
(The Incredible Machine is not a bad thing to consider here.)
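
A sketch of the grid-point-as-object view, using Conway's Life as the
example (hypothetical Cell class): the "communication" step is each
cell telling its neighbors its state, and the "local" step is each cell
computing its next state independently of all the others.

class Cell {
  boolean alive;
  private boolean nextAlive;
  private int liveNeighborCount;
  java.util.Vector neighbors = new java.util.Vector();   // of Cell

  // "communication" step: tell my neighbors about my current state
  void announce() {
    for (int i = 0; i < neighbors.size(); i++) {
      Cell c = (Cell) neighbors.elementAt(i);
      if (alive) c.liveNeighborCount++;
    }
  }

  // "local" step: next state depends only on what I heard; these can
  // run in parallel across all cells
  void computeNext() {
    nextAlive = liveNeighborCount == 3 || (alive && liveNeighborCount == 2);
    liveNeighborCount = 0;
  }

  void commit() { alive = nextAlive; }

  static void generation(Cell[] cells) {
    for (int i = 0; i < cells.length; i++) cells[i].announce();     // communicate
    for (int i = 0; i < cells.length; i++) cells[i].computeNext();  // local work
    for (int i = 0; i < cells.length; i++) cells[i].commit();
  }
}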

I also think that truly asynchronous communication is a dubious
idea.  The worst parts of Java (the deprecated bits) are more
or less asynchronous (Thread.stop,suspend,resume).  We ended
up (in our Java system) hiding almost all asynchrony; threads
yield at well-defined points, garbage collection occurs at
well-defined points, etc., all enforced by the compiler/VM (in
the safe language, there are no infinite loops, period).
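
For contrast with the deprecated Thread.stop, a sketch of the
cooperative alternative (hypothetical class): cancellation is a
request, and the worker acts on it only at points of its own choosing,
where its invariants hold.

class CancellableWorker implements Runnable {
  private volatile boolean cancelled = false;

  public void cancel() { cancelled = true; }   // a request, not a preemption

  public void run() {
    while (!cancelled) {        // the well-defined check point
      doOneUnitOfWork();        // invariants hold whenever the flag is tested
    }
    // clean shutdown: no lock held, no data structure half-updated
  }

  private void doOneUnitOfWork() { /* ... */ }
}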

I am afraid that I am perhaps focusing on this at too low a
level for this crowd, but the subject of the thread is "lots
of concurrency", and I am not sure what that is really intended
to mean.  Normally, I think it implies performance.

>Another OO language where concurrency _did_ receive serious thought
>is Dylan.

Any good references on this?  I have some "interim" or "draft" manual
on it, but I never got around to studying how they did concurrency.

David Chase
chase at naturalbridge.com
drchase at alumni.rice.edu
chase at world.std.com




