Multi-core CPUs

Thu Oct 18 17:28:13 UTC 2007

Hi,

What Ralph and the others have said is on target in many respects.
The Erlang and Smalltalk models of messaging have a lot to offer and,
when combined, may provide a powerful and compelling computing platform.

However, having just worked on a very large multi-threaded commercial
application in live production, it is very clear that even Smalltalk
applications running on a single native thread have very nasty
concurrency problems. Our team of seven was able to solve many of the
worst of these concurrency problems in a year and a half and improved
the reliability of this important production application.

It's important that concurrency be taken into account at all levels of 
an application's design, from the lowest levels of the virtual machine 
through the end-user experience (which is where concurrency on multiple
cores can really make a significant, paradigm-adjusting difference if
done well).

One of the lessons learned from this complex real-world application is
that almost ALL of the concurrency problems you have with multiple cores
running N native threads you also have with a single core running one
native thread. The implication is that the proposed solution of running
multiple images with one native thread each won't really save you from
concurrency problems, as each image on its own can have serious
concurrency issues.
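
To make that concrete, here is a minimal sketch in Python (not
Smalltalk) of a lost update on ONE native thread. Generators stand in
for green threads, and each `yield` marks a point where a scheduler
could switch processes. The names `deposit`, `run_round_robin` and
`lost_update_demo` are made up for illustration and are not taken from
any Smalltalk VM.

```python
# Green threads are modeled as generators on a single native thread.
# Assumption: `yield` stands in for any point where the scheduler may
# switch to another green thread (I/O wait, timer, higher priority).

def deposit(account, amount):
    balance = account["balance"]            # read shared state
    yield                                   # another green thread runs here
    account["balance"] = balance + amount   # write based on a stale read


def run_round_robin(tasks):
    """Hypothetical scheduler: step each task in turn until all finish."""
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)


def lost_update_demo():
    account = {"balance": 0}
    run_round_robin([deposit(account, 10), deposit(account, 10)])
    # Two deposits of 10 were made, but both read the balance before
    # either wrote it, so one update is lost.
    return account["balance"]
```

Both deposits complete, yet the final balance is 10 rather than 20. No
second core or second native thread was needed to get the wrong answer.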

When you have a single native thread running, say, ten to twenty
Smalltalk green threads (aka Smalltalk processes), the concurrency
problems can be a real nightmare to contemplate. Understanding what is
happening is made even harder by the limited debugging information
captured in runtime crash dumps.

Diagnosing the real-world concurrency problems in a live production
application revealed that it's not an easy problem even with one native
thread running! Additional native threads really wouldn't have changed
much about the concurrency problems we were dealing with (assuming that
the VM can properly handle GC and other issues, as is done in Smalltalk
MT). This includes all the nasty problems with the standard class
library collection classes.

It is for the above reasons that I support implementing many approaches
so that we can find out the best one(s) for various application
domains.

It's unlikely that there is a one-solution-fits-all-needs paradigm.

1) With existing Smalltalks (and other languages) it's relatively easy
to support one image per native "process" (aka task), each with its own
separate memory space. This seems to be trivial for Squeak. The main
thing that is needed is an effective and appropriate distributed
object-messaging system via TCP/IP. This also has the advantage of
easily distributing the image-nodes across multiple server nodes on a
network.

I propose that any distributed object messaging system that is developed 
for inter-image communication meet a wide range of criteria and 
application needs before being considered as a part of the next
Smalltalk Standard. These criteria would need to be drawn from the
literature and the needs of members of the Smalltalk community and their 
clients.
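
As a rough sketch of what the wire side of such inter-image messaging in
1) might look like, the Python below frames a message send (receiver,
selector, arguments) as length-prefixed JSON and dispatches it on the
receiving side. Everything here is an assumption for illustration: the
JSON format, the names `send_message`, `recv_message` and `dispatch`,
and the use of `socket.socketpair()` as a local stand-in for a real
TCP/IP connection between two images. It is not an existing Squeak
protocol.

```python
import json
import socket
import struct


def send_message(sock, receiver, selector, args):
    """Send one message as a 4-byte big-endian length plus JSON body."""
    payload = json.dumps(
        {"receiver": receiver, "selector": selector, "args": args}
    ).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes, since recv() may return fewer."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def recv_message(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def dispatch(objects, message):
    """Look up the receiver by name and perform the selector on it.

    Illustration only: a real system must restrict which selectors a
    remote peer is allowed to invoke.
    """
    target = objects[message["receiver"]]
    return getattr(target, message["selector"])(*message["args"])


class Counter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


def demo():
    # socketpair() stands in for a TCP connection between two images.
    sender, receiver = socket.socketpair()
    try:
        send_message(sender, "counter", "add", [5])
        return dispatch({"counter": Counter()}, recv_message(receiver))
    finally:
        sender.close()
        receiver.close()
```

A real design would also need replies, error propagation, object
identity across images, and security, which is exactly why the criteria
above matter.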

2) It's been mentioned that it would be straightforward to have Squeak
start up multiple copies of the image (or even multiple different
images) in one process (task) memory space, with each image having its
own native thread and keeping its object table and memory separate
within the larger memory space. This sounds like a very nice approach.
This is very likely practical for multi-core CPUs such as the N-core
(where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.
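
The shape of approach 2) can be sketched in Python as several "images"
in one process, each with its own native thread, its own private object
memory, and an inbox as the only way in. This is a structural
illustration under stated assumptions, not how any Squeak VM is built:
the `Image` class, its `heap` dictionary and the message tuple are
hypothetical, and CPython threads do not actually run bytecode in
parallel.

```python
import queue
import threading


class Image(threading.Thread):
    """Hypothetical stand-in for one image inside a shared process."""

    def __init__(self, name):
        super().__init__(name=name)
        self.inbox = queue.Queue()  # the only entry point for other images
        self.heap = {}              # private memory, touched by this thread only

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:     # shutdown sentinel
                break
            key, amount, reply = message
            self.heap[key] = self.heap.get(key, 0) + amount
            reply.put(self.heap[key])


def demo(image_count=4, sends_per_image=100):
    images = [Image(f"image-{i}") for i in range(image_count)]
    replies = [queue.Queue() for _ in images]
    for image in images:
        image.start()
    for image, reply in zip(images, replies):
        for _ in range(sends_per_image):
            image.inbox.put(("counter", 1, reply))
        image.inbox.put(None)
    for image in images:
        image.join()
    # After join() every reply has been posted; keep the last one per image.
    totals = []
    for reply in replies:
        last = None
        while not reply.empty():
            last = reply.get()
        totals.append(last)
    return totals
```

Because no image ever touches another image's `heap`, there is no shared
mutable state between the native threads. Concurrency problems inside
each image, among its own green threads, remain untouched by this
arrangement.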

3) A single image running on N cores with M native threads (M may be
larger than N) is, of course, the full generalization.

This may be the best way to take advantage of paradigm-shaking chips
such as the Tile64 processor from Tilera.

However, we may need to rethink the entire notion of the Smalltalk
virtual machine architecture, since the Tile64 chip has capabilities
that radically alter the paradigm. Passing a message between processor
nodes takes less time than writing the same amount of data into memory.
Think about that. It offers a new paradigm unavailable on other N-core
processors (at this time).

I believe that we, the Smalltalk community, need to have Smalltalk
capable of being deployed into the fully generalized scenario: running
on N cores with M native threads and with O images in one memory space,
able to communicate with P other nodes. It is we who need to do the
hard work of providing systems that work correctly in the face of the
multi-core, multi-threaded reality that is now upon us. If we run away
from the hard work, the competitors who tackle it and provide workable
solutions will prevail.

Food for thought.

All the best,

Peter William Lount
Smalltalk.org Editor
peter at smalltalk.org