CORBA for Squeak

Fri Oct 20 18:54:28 UTC 2000

Lex Spoon wrote:

> For a longer discussion of this topic check out:
> 
>         http://www.sun.com/research/technical-reports/1994/abstract-29.html
> 
> (I was so happy to run into this writeup, since it's an argument I
> find myself alone in a lot.)

I read the article. Some of what it describes as problems -- for instance,
the memory space differences -- don't exist in the same way in Squeak.
We only pass around object references, never pointers to raw address space.
And I don't agree with their argument that concurrency in distributed systems
is fundamentally different from concurrency in a multi-threaded program in
general (that is, once you've taken care of the partial failure problem).

But it made some good points.

I've worked for the last 5 years or so with a distributed Smalltalk system
that implemented a simple RPC scheme using remote proxies. We used this
in systems that typically had 3-5 CPUs on a LAN. The systems ran semiconductor
fab tools (see http://www.adventact.com/cw/intro.html for info on
the ControlWorks framework that we were using).

In this kind of environment, we had to wrap high-level operations
in exception handlers that would handle a remote CPU going offline. I
can't say that the whole thing was transparent -- we ended up having
some code in the proxies that could report whether the remote CPU was
available (not offline or in maintenance mode), and this obviously wasn't
just an RPC. That is, we added support for "quality of reference" queries.
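
To make the idea concrete, here is a minimal sketch of a proxy that forwards
calls but also answers an availability query itself. It is written in Python
rather than Smalltalk, it is not the ControlWorks code, and every name in it
(FakeRemoteNode, RemoteProxy, is_available, run_operation) is made up for
illustration. The "remote" node is simulated in-process; a real proxy would
presumably learn about availability from a heartbeat or the transport layer.

```python
class RemoteUnavailable(Exception):
    """Raised when the (simulated) remote CPU cannot be reached."""


class FakeRemoteNode:
    """In-process stand-in for a remote CPU (assumption: no real network)."""

    def __init__(self):
        self.online = True
        self.in_maintenance = False

    def handle(self, selector, *args):
        # A dead node cannot answer, so the call fails partway.
        if not self.online:
            raise RemoteUnavailable(selector)
        return f"{selector} done"


class RemoteProxy:
    """Forwards unknown messages to the node, like a simple RPC proxy."""

    def __init__(self, node):
        self._node = node

    def is_available(self):
        # "Quality of reference" query: answered by the proxy itself,
        # not forwarded as a remote call.
        return self._node.online and not self._node.in_maintenance

    def __getattr__(self, selector):
        # Only reached for names not defined on the proxy.
        def forward(*args):
            return self._node.handle(selector, *args)
        return forward


def run_operation(proxy, selector):
    """Wrap a high-level operation so a vanished CPU is handled explicitly."""
    if not proxy.is_available():
        return "skipped: module unavailable"
    try:
        return getattr(proxy, selector)()
    except RemoteUnavailable:
        # The node can still die between the check and the call.
        return "aborted: module went offline mid-call"
```

The availability check does not replace the exception handler: the remote
CPU can go away after the check succeeds, so both are needed.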

Some of the trickier issues were, indeed, dealing with graceful failure
modes when a remote machine went down. It took quite a bit of time
and some very senior people to sort out all the possible places where
exception handling had to be put in (and define proper system behavior
in those cases).

The system model separated each module of the "cluster tool" into a separate
machine with its own CPU. There were some interesting problems around
graceful failure; for instance, what do you do when the robot in one module
is transferring a wafer into another and the other machine suddenly goes
down and so cannot tell you about its state? One solution was to add
switches and sensors so that either module could verify the critical state
for itself even if the other one went down.
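
A rough sketch of that fallback, again in Python and with hypothetical names
throughout (WaferLocation, locate_wafer, and the two sensor flags are not from
the real system): ask the other module first, and if it is unreachable, decide
from sensors the local module can read on its own.

```python
from enum import Enum


class WaferLocation(Enum):
    ON_ROBOT = "on robot"
    IN_PEER_SLOT = "in peer slot"
    UNKNOWN = "unknown"


def locate_wafer(ask_peer, robot_sensor_sees_wafer, slot_sensor_sees_wafer):
    """Find the wafer, falling back to local sensors if the peer is down.

    ask_peer is a callable that returns a WaferLocation or raises
    ConnectionError (assumption: that is how a dead peer shows up here).
    The two booleans are readings from locally wired sensors.
    """
    try:
        return ask_peer()
    except ConnectionError:
        pass  # Peer cannot report its state; rely on our own hardware.

    if robot_sensor_sees_wafer and not slot_sensor_sees_wafer:
        return WaferLocation.ON_ROBOT
    if slot_sensor_sees_wafer and not robot_sensor_sees_wafer:
        return WaferLocation.IN_PEER_SLOT
    # Sensors disagree or see nothing: stop and ask an operator.
    return WaferLocation.UNKNOWN
```

Returning UNKNOWN for contradictory readings is a design choice in this
sketch; the point is that the ambiguous case gets an explicit, defined
behavior rather than a guess.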

However, we found that the system was pretty easy to program for. We did
find that many programmers haven't a clue about distributed programming,
handling physical objects with software, or concurrency. One mistake was
to hire people based on their familiarity with Smalltalk. This got us
a bunch of people who'd done GUIs for banking systems. We might have been
better off getting people who'd done distributed or embedded programming,
regardless of which language they'd used, and then trained them on
objects and Smalltalk.

-- 
Ned Konz
currently: Stanwood, WA
email:     ned at bike-nomad.com
homepage:  http://bike-nomad.com