Multi-core CPUs

Thu Oct 18 06:12:42 UTC 2007

Steve Wart wrote:
> I don't know if mapping Smalltalk processes to native threads is the
> way to go, given the pain I've seen in the Java and C# space.
>
> What might be interesting is to develop low-level primitives (along
> the lines of the famed map/reduce operations) that provide parallel
> processing versions of commonly used collection functions.
>
> No idea how easy this would be to do, but on the surface seems more
> promising than trying to do process/thread jiggery pokery.
This would only make things more complicated, since the primitives
would then have to start parallel native threads working on the same
object memory.

The problem with native threads is that the current object memory is not
designed to work with multiple independent mutator threads. There are GC
algorithms which work with parallel threads, but AFAIK they all carry
considerable overhead relative to the single-threaded case.

IMO, a combination of native threads and green threads would be the best
(although it still has the problem of parallel GC):
The VM runs a small fixed number of native threads (default: number of
available cores, but could be a little more to efficiently handle
blocking calls to external functions) which compete for the runnable
Smalltalk processes. That way, a number of processes could be active at
any one time instead of just one. The synchronization overhead in the
process-switching primitives should be negligible compared to the
overhead needed for GC synchronization.
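
To make the scheduling idea concrete, here is a minimal sketch in Python
rather than in VM code. Everything in it is an assumption made for
illustration: generators stand in for Smalltalk processes, a
queue.Queue stands in for the VM's list of runnable processes, and the
names green_process, worker and run_all are hypothetical. It shows only
the structure (a fixed pool of native threads competing for runnable
processes), not real parallelism, since standard CPython serializes
bytecode execution with its global interpreter lock.

```python
import os
import queue
import threading


def green_process(name, steps, log):
    """Stand-in for a Smalltalk process; each yield is a process switch."""
    for i in range(steps):
        log.append((name, i, threading.current_thread().name))
        yield  # give the native thread back to the scheduler


def worker(run_queue):
    """One native thread: take a runnable process, run it for one slice."""
    while True:
        proc = run_queue.get()
        if proc is None:  # shutdown sentinel
            return
        try:
            next(proc)
        except StopIteration:
            pass  # process terminated, do not requeue it
        else:
            run_queue.put(proc)  # still runnable: requeue before task_done
        finally:
            run_queue.task_done()


def run_all(processes, n_threads=None):
    """Run all green processes on a fixed pool of native threads."""
    n_threads = n_threads or os.cpu_count() or 1
    run_queue = queue.Queue()
    for proc in processes:
        run_queue.put(proc)
    threads = [
        threading.Thread(target=worker, args=(run_queue,), name=f"native-{i}")
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    run_queue.join()  # returns once every process has terminated
    for _ in threads:
        run_queue.put(None)
    for t in threads:
        t.join()
```

A process sits in the queue only while no thread is running it, so no
process is ever executed by two native threads at once. The only shared
structure the threads contend for is the run queue itself, which is the
cheap part; the sketch deliberately leaves out the object memory, where
the expensive synchronization would be.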

The simple yet efficient ObjectMemory of current Squeak cannot be used
with parallel threads (at least not without significant synchronization
overhead). AFAIK, efficient algorithms require every thread to have its
own object allocation area to avoid contention on object allocations.
Tenuring (making young objects old) and storing new objects into old
objects (remembered table) require synchronization. In other words,
grafting a thread-safe object memory onto Squeak would be a major project.
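
The per-thread allocation area can be sketched as follows. This is a toy
model under stated assumptions, not Squeak's ObjectMemory: addresses are
plain integers, the class names SharedHeap and LocalAllocator are
hypothetical, and garbage collection, tenuring and the remembered table
are left out entirely. The point is only that a thread takes the shared
lock when it needs a fresh chunk, and allocates inside its own chunk
without any locking.

```python
import threading


class SharedHeap:
    """Toy heap: hands out fixed-size chunks of an integer address range."""

    def __init__(self, size, chunk_size):
        self.size = size
        self.chunk_size = chunk_size
        self.next_free = 0
        self.lock = threading.Lock()
        self.chunk_requests = 0  # how often the shared lock was taken

    def take_chunk(self):
        """Reserve one chunk; the only operation that needs the lock."""
        with self.lock:
            if self.next_free + self.chunk_size > self.size:
                # A real VM would trigger a collection here.
                raise MemoryError("toy heap exhausted")
            start = self.next_free
            self.next_free += self.chunk_size
            self.chunk_requests += 1
            return start, start + self.chunk_size


class LocalAllocator:
    """Bump allocator owned by exactly one thread; no locking inside."""

    def __init__(self, heap):
        self.heap = heap
        self.top = 0
        self.limit = 0  # top == limit means "no chunk yet"

    def allocate(self, n_bytes):
        if n_bytes > self.heap.chunk_size:
            raise ValueError("large objects are not handled in this sketch")
        if self.top + n_bytes > self.limit:
            # Current chunk is full: fetch a new one (takes the lock).
            self.top, self.limit = self.heap.take_chunk()
        addr = self.top
        self.top += n_bytes
        return addr
```

With 1024-byte chunks and 16-byte objects, a thread takes the lock once
per 64 allocations instead of once per allocation. What this sketch does
not show is exactly the hard part named above: tenuring and stores of
new objects into old ones still need synchronization.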

In contrast, for a significant subset of applications (servers) it is
orders of magnitude simpler to run several images in parallel. Those
images don't stomp on each other's object memory, so there is absolutely
no synchronization overhead. For stateful sessions, a front end can
route each request to the image which currently holds that session's
state; stateless requests can be handled by any image.
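
The routing rule for such a front end fits in a few lines. The sketch
below is an illustration under assumptions, not a description of any
existing front end: the class name FrontEnd, the method route and the
image names are all hypothetical, and a real setup would also have to
deal with images going down and sessions expiring.

```python
import itertools


class FrontEnd:
    """Routes requests to one of several independently running images."""

    def __init__(self, images):
        self.images = list(images)
        self._round_robin = itertools.cycle(self.images)
        self.session_owner = {}  # session id -> image holding its state

    def route(self, session_id=None):
        """Return the image that should handle this request."""
        if session_id is None:
            # Stateless request: any image will do.
            return next(self._round_robin)
        if session_id not in self.session_owner:
            # First request of a session: pick an image and remember it.
            self.session_owner[session_id] = next(self._round_robin)
        return self.session_owner[session_id]
```

For example, with FrontEnd(["image-1", "image-2", "image-3"]), every
call to route("alice") returns the same image no matter how many other
requests arrive in between, while route() with no session id simply
rotates through the images.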

Cheers,
Hans-Martin