Questions on Squeak's threading architecture -- why can't Squeak do SMP yet?

Wed Aug 4 12:43:53 UTC 2004

/delurk
Hi Squeak devs:

     I have a few questions about Squeak VM architecture, and I 
apologize for the newbie-sounding nature of some of them.  I have spent a 
fair amount of time browsing the Swiki and the System Browser, and tried 
searching the dev list with Google, only to find small snippets relating 
to my questions about multithreading in the VM.  My motivation has to do 
with wanting to share physical simulations and visualizations over 
distributed hardware with Croquet when available, but my questions about 
scalability seem to be more Squeak VM related.

     What is stopping Squeak from doing SMP?  (or Async MP for that matter?)

     The Squeak VM classes seem to be pretty finely threaded, and 
increasingly modularized.  I understand some of the potential risks to 
stability when you tell a computer to try to walk and chew gum at the same 
time, but some threads (and processes for that matter) should be 
parallelizable when they have no data interdependencies.  Why can't one 
make a copy of the OS/platform-specific guts of the Squeak bytecode 
interpreter or JIT compiler object for each physical processor (maybe 
make it live in the processor cache since it isn't THAT big) and have 
the scheduler serve threads (or whole processes if they're independent 
enough) to available processors using a priority scheme of one's choice 
(which, this being Squeak, should be able to be changed out or altered 
on the fly if available hardware or loads change during execution)?
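
     To make the idea concrete, here is a minimal sketch in Python (not 
Squeak code, and not how the Squeak scheduler works) of a scheduler 
serving independent units of work to a fixed pool of workers in priority 
order.  The name run_by_priority, the worker count, and the rule that a 
lower number is served first are all assumptions made for illustration.  
Python threads here only show the dispatch logic; they are not a claim 
about real parallel speedup.

```python
import itertools
import queue
import threading


def run_by_priority(tasks, n_workers=4):
    """Run (priority, callable) pairs on n_workers threads.

    Assumption for this sketch: a lower priority number is served first.
    Returns a list of (worker_id, priority, result) tuples.
    """
    ready = queue.PriorityQueue()
    order = itertools.count()  # tie-breaker so callables are never compared
    for priority, work in tasks:
        ready.put((priority, next(order), work))

    results = []
    results_lock = threading.Lock()  # guards the shared results list

    def worker(worker_id):
        while True:
            try:
                priority, _, work = ready.get_nowait()
            except queue.Empty:
                return  # nothing left to serve
            value = work()  # tasks are assumed to share no data
            with results_lock:
                results.append((worker_id, priority, value))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

     Swapping the priority scheme on the fly would amount to replacing 
the queue ordering, which is the part the question above is really about.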

     Of course, locks are necessary around structures whose shared 
memory components may be in flux when another object tries to 
interrogate them.  But Squeak's message-passed requests are more polite 
than conventional function calls to object methods, and this can 
facilitate a more polite "be with you in a few microseconds" response if 
something is locked, rather than a simple failure.  Of course, not all 
classes, methods, and Squeak VM threads are suitable for forking into OS 
and processor threads.  Each one could be tagged with a single bit 
(maybe call it "isSMPThreadable") to mark whether it is.
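
     As a rough illustration of the "be with you in a few microseconds" 
idea, the Python sketch below waits briefly for a lock and then answers 
"busy" instead of failing outright.  The class SharedStructure, the 
method try_update, and the default wait time are hypothetical names and 
values chosen for this example; nothing here is taken from the Squeak VM.

```python
import threading


class SharedStructure:
    """A shared value guarded by a lock (hypothetical example class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def try_update(self, delta, wait_seconds=0.001):
        """Add delta to the value, waiting at most wait_seconds for the lock.

        Returns ("done", new_value) on success, or ("busy", None) if the
        structure was still locked when the wait ran out, so the caller
        can decide to retry later rather than treat it as an error.
        """
        if not self._lock.acquire(timeout=wait_seconds):
            return ("busy", None)
        try:
            self._value += delta
            return ("done", self._value)
        finally:
            self._lock.release()
```

     A caller that receives "busy" can requeue the request, which is the 
polite alternative to a hard failure described above.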

    Has this been tried and failed?  Another question: Has anyone tried 
to port Squeak to IBM Power5 processors (under Linux or anything else)?  
What are the broad engineering challenges in the way if I want to run a 
computationally intensive simulation object (e.g. a large-N molecular 
dynamics simulation with Squeak object hooks) on, say, 32 processors of 
my IBM p690, and let that object communicate the gist of its state (by 
passing messages over IP) to clients which do the final (not so 
computationally intensive) rendering?  I know that the 
client-server bit is exactly what the Croquet team is working on.  But 
one can't involve high performance computing in the mix if threads can't 
be distributed over multiple processors.  And that part seems to rest with 
the Squeak VM architecture.  (Please point out any misconceptions in any 
of the above.)

    Anyone who has used a pervasively threaded microkernel operating 
system, such as BeOS or QNX, understands how much power multithreading 
on multiprocessors, done right, gives users and their programs.  There 
are clearly lots of parallels to be drawn here between 
Squeak and various OS architectures.

    I have more questions, but that is a start.  Thank you in advance to 
anyone who can shed light on these topics and/or point me to some 
definitive reading on them.

            Best Regards,

                 Ed Boyce

---------------------------------------------------
Ed Boyce
Boston University Center for Computational Science
EOT-PACI Program -- http://www.eot.org
(413) 245-3997
edboyce at bu.edu