Questions on Squeak's threading architecture -- why can't Squeak do SMP yet?

Wed Aug 4 18:21:30 UTC 2004

Ed Boyce <edboyce at bu.edu> wrote:
 to be more Squeak VM related.
> 
>      What is stopping Squeak from doing SMP?  (or Async MP for that matter?)
I don't think it's quite as simple as it might seem.
> 
>      The Squeak VM classes seem to be pretty finely threaded, and 
> increasingly modularized.
Actually, very little of the class tree is thread-safe. Consider one
thread adding items to a collection whilst another removes them - you'd
need interlocks/monitors/semaphores/whatever. We don't have them
installed everywhere and likely never will. Unless of course someone
really feels the urge to completely rewrite a huge amount of code to
support such fine-grained threading safely. In practice you can get away with
a huge amount, as always. Just don't sell the code to a nuclear
powerplant operator, please!
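
To make the add/remove case concrete, here is a minimal sketch in Python
rather than Squeak. The names (GuardedQueue, run_demo) are made up for
illustration and are not from any Squeak class. It shows the kind of
interlock every shared collection would need: each mutation takes one
lock, and the "is it empty?" test and the removal sit inside the same
critical section.

```python
import threading
from collections import deque


class GuardedQueue:
    """Hypothetical collection whose every access happens under one lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def remove_first_or_none(self):
        with self._lock:
            # The emptiness test and the removal share one critical section,
            # so another thread cannot slip in between them.
            if self._items:
                return self._items.popleft()
            return None

    def size(self):
        with self._lock:
            return len(self._items)


def run_demo(n_items=1000):
    """One thread adds n_items while another removes them concurrently."""
    q = GuardedQueue()
    removed = []

    def producer():
        for i in range(n_items):
            q.add(i)

    def consumer():
        # Busy-polls for simplicity; a real design would block on a
        # semaphore or condition variable instead.
        while len(removed) < n_items:
            item = q.remove_first_or_none()
            if item is not None:
                removed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return removed, q.size()
```

Multiply that wrapper by every collection class in the image and you get
a feel for the size of the rewrite.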

> Why can't one make a copy of the OS/platform specific guts of the Squeak
> bytecode interpreter or JIT compiler object for each physical
> processor (maybe  make it live in the processor cache since it isn't
> THAT big) and have  the scheduler serve threads (or whole processes
> if they're independent  enough) to available processors using a
> priority scheme of one's choice  (which, this being Squeak, should be
> able to be changed out or altered  on the fly if available hardware
> or loads change during execution).
First problem is that in general processor caches are not under our
control; the cpu has hardware that caches lines of some size, and as
process switches occur different cache lines get filled from many
places; some systems flush some lines as part of a context switch and
some don't.... basically the idea that "hey our little interpreter can
fit in the cache" is not real. Except, interestingly enough, in my
favourite, the ARM. The latest ARM architecture allows for a sort-of
cache that IS under application control and WOULD allow for the VM to
be loaded along with crucial data and kept there. Up to 4Mb of it,
which is certainly enough for a lot of useful stuff. Of course, there
are then issues of which application gets control of this TCM etc. And
it's a bit tricky to actually go out and buy an ARM v6 cpu right now.

Next problem is sharing the object memory efficiently and reliably.
Address spaces? Garbage collection? Referential integrity? Is there a
single object space or many? Do all cpus think of themselves as
sharing the same actual memory or do they have separate memory?

Even if you could have multiple execution units sharing exactly the
same memory space (hmm, another ARM, the MPCore springs to mind) I
think it would be a goodly bit of work.

To some extent you could easily benefit from 'normal' multithreading in
the VM (the OS X & Windows VMs certainly do some) to handle user input,
socket signals, stuff like that. Perhaps dedicating a cpu to tracking
memory usage and modifying GC policy, watching code usage and
asynchronously heavily optimising some chunks of translated code, even
perhaps doing things like cleaning up memory left behind by compaction
(so object allocation could avoid having to scan the area).
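
As a rough illustration of that 'normal' multithreading, here is a small
Python sketch, not the code of any actual Squeak VM. A helper thread
gathers events (standing in for user input or socket signals) and hands
them over through a thread-safe queue, while a single interpreter thread
stays the only one touching "object memory" and drains the queue between
steps. All names here (io_helper, interpreter_loop, run) are assumptions
made up for the example.

```python
import queue
import threading


def io_helper(events, n_events):
    """Helper thread: stands in for OS-level input or socket handling."""
    for i in range(n_events):
        events.put(("input", i))
    events.put(None)  # sentinel: no more events will arrive


def interpreter_loop(events):
    """Single interpreter thread: the only code that touches 'the image'."""
    handled = []
    done = False
    while not done:
        # ... one interpreter step (e.g. a bytecode) would execute here ...
        # Between steps, drain whatever the helper has delivered so far.
        while True:
            try:
                ev = events.get_nowait()
            except queue.Empty:
                break
            if ev is None:
                done = True
                break
            handled.append(ev)
    return handled


def run(n_events=100):
    events = queue.Queue()
    helper = threading.Thread(target=io_helper, args=(events, n_events))
    helper.start()
    handled = interpreter_loop(events)
    helper.join()
    return handled
```

The point of the design is that only the queue needs to be thread-safe;
nothing in the class tree has to change because the interpreter never
runs on two threads at once.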

HP used to sell a distributed Smalltalk (I think they passed it back to
Cincom but I'm not sure) but that was more a case of multiple Smalltalks
talking to each other via something like CORBA.

Of course, if you have monies available to work on this, do let us know
:-)

tim
--
Tim Rowledge, tim at sumeru.stanford.edu, http://sumeru.stanford.edu/tim
Quality assurance: A way to ensure you never deliver shoddy goods accidentally.