Multi-core CPUs

Peter William Lount peter at smalltalk.org
Sun Oct 21 03:24:41 UTC 2007


Hi,

>> I propose that any distributed object messaging system that is developed
>> for inter-image communication meet a wide range of criteria and
>> application needs before being considered as a part of the upcoming next
>> Smalltalk Standard. These criteria would need to be elucidated from the
>> literature and the needs of members of the Smalltalk community and their
>> clients.
>>
>> 2) It's been mentioned that it would be straightforward to have Squeak
>> start up multiple copies of the image (or even multiple different
>> images) in one process (task) memory space, with each image having its
>> own native thread and keeping its object table and memory separate
>> within the larger memory space. This sounds like a very nice approach.
>>     
> I am not so sure. Squeak VM is a processor hog. Threads within VM will need 
> processor for bytecode interpretation. So a VM process can only scale to a 
> few threads before it starves for processor. 

It's not the bytecodes that cause a lot of CPU usage; it's the number of
processor instructions being executed. If you run lots of code, then you
can expect higher CPU usage. The more densely capability is packed into
the language's library of objects, the more processor instructions may
be executed. To find out what Squeak is doing while it's consuming the
~12% CPU you mentioned elsewhere, you'd have to trace the code; then
you'd see exactly what's going on. Tracing at two levels would help:
first at the Smalltalk level, and then at the level of the VM's
bytecodes and primitives. The bytecodes may be fine while the image
you've deployed is doing many things that your particular application
doesn't need.
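
As a rough, runnable analogue of that two-level tracing, here is a
Python sketch. In Squeak itself the usual tool would be MessageTally
(e.g. `MessageTally spyOn: [...]`). In the sketch, `cProfile` plays the
part of the Smalltalk-level profiler and `dis` stands in for looking at
the bytecode level. The functions `parse_records`, `unneeded_feature`
and `handle_request` are hypothetical; `unneeded_feature` represents
work a deployed image does that the application never needed.

```python
import cProfile
import dis
import pstats


def parse_records(lines):
    # The work the application actually needs (hypothetical).
    return [line.split(",") for line in lines]


def unneeded_feature(rows):
    # Hypothetical extra work the deployed system does but this application never uses.
    return sorted(rows * 3)


def handle_request(lines):
    rows = parse_records(lines)
    unneeded_feature(rows)
    return len(rows)


def profile_call(func, *args):
    """Level 1: which functions took the time (cf. a Smalltalk-level profiler)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # get_stats_profile() (Python 3.9+) maps function names to timing records.
    called = set(stats.get_stats_profile().func_profiles)
    return result, stats, called


def bytecode_count(func):
    """Level 2: how many bytecode instructions a function compiles to."""
    return len(list(dis.get_instructions(func)))


if __name__ == "__main__":
    sample = [f"{i},item{i}" for i in range(10_000)]
    result, stats, called = profile_call(handle_request, sample)
    stats.sort_stats("cumulative").print_stats(5)
    print("bytecodes in handle_request:", bytecode_count(handle_request))
```

The profile shows `unneeded_feature` taking time even though each
function's bytecode is unremarkable, which is the situation described
above: the interpreter is fine, but the code it is asked to run is not.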

> On the downside, coding errors 
> could trash object memory across threads making testing and debugging 
> difficult. 

Yes. The point I'm making is that these errors can happen even with
so-called simple concurrency models. Basically, there is no such thing
as hassle-free, simple concurrency when it comes to computers!!!
Simple concurrency is a myth and a lie. Don't fall for it.
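
To make that concrete, here is a small Python sketch of a cooperative,
green-thread-style scheduler that runs every "process" on one native
thread, loosely in the spirit of Smalltalk processes sharing a single
VM thread. The names (`make_worker`, `make_safe_worker`,
`run_round_robin`, the `account` dictionary) are hypothetical. Even
with no second native thread, a read-modify-write that yields between
the read and the write loses updates.

```python
def make_worker(account, amount, steps):
    """Unsafe process: yields between reading and writing the shared balance."""
    def worker():
        for _ in range(steps):
            balance = account["balance"]           # read shared state
            yield                                  # scheduler may switch processes here
            account["balance"] = balance + amount  # write back a possibly stale value
    return worker()


def make_safe_worker(account, amount, steps):
    """Safe only under this scheduler: no yield inside the read-modify-write."""
    def worker():
        for _ in range(steps):
            account["balance"] += amount
            yield
    return worker()


def run_round_robin(processes):
    """Step each generator 'process' in turn on a single native thread until all finish."""
    ready = list(processes)
    while ready:
        for proc in list(ready):
            try:
                next(proc)
            except StopIteration:
                ready.remove(proc)


if __name__ == "__main__":
    shared = {"balance": 0}
    run_round_robin([make_worker(shared, 1, 1000), make_worker(shared, 1, 1000)])
    print("unsafe:", shared["balance"])  # 1000: half the updates were lost

    shared = {"balance": 0}
    run_round_robin([make_safe_worker(shared, 1, 1000), make_safe_worker(shared, 1, 1000)])
    print("safe:", shared["balance"])    # 2000
```

The "safe" version only works because this toy scheduler switches
solely at `yield`; as soon as a scheduler can switch somewhere you did
not expect, you need real mutual exclusion (in Squeak, for example, a
`Semaphore`).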


> Will the juice be worth the squeeze?
>   

That depends on what you are using your computer for. If it's an
application that benefits from massive parallelism, then yes, it's
worth the squeeze. If you have a very serial application, like a series
of complex dependent computations, then it might not be worth the
squeeze at all.

If you have a complex, highly threaded business application (running,
say, ten to twenty Smalltalk processes) on a single native thread, then
it might be worth the squeeze: if the users can work noticeably faster
without incurring concurrency nightmares, then yes, it's worth it.
Otherwise, no, it's not worth it, since users get very frustrated.
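
One standard way to put numbers on "worth the squeeze" is Amdahl's law
(not named in the post, swapped in here as an illustration): if a
fraction p of the run time can be spread over N cores and the rest
stays serial, the best-case speedup is 1 / ((1 - p) + p / N). A minimal
Python sketch, ignoring synchronization and communication costs, which
only make things worse:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` processors when `parallel_fraction` of the run time parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.10, 0.50, 0.95, 0.99):
        print(f"p = {p:.2f}: {amdahl_speedup(p, 64):.1f}x on 64 cores")
    # p = 0.10: 1.1x, p = 0.50: 2.0x, p = 0.95: 15.4x, p = 0.99: 39.3x
```

Even on 64 cores, a program that is 95% parallel tops out around 15x,
and no number of cores can take it past 20x (1 / 0.05); a mostly serial
chain of dependent computations barely moves.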


>> 3) A single image running on N cores with M native threads (M may be
>> larger than N) is, of course, the full generalization.
>> This may be the best way to take advantage of paradigm-shaking chips
>> such as the Tile64 processor from Tilera.
>>     
> With single or few processors, we tend to "serialize" logic ourselves and 
> create huge linear programs. When processors are aplenty, we are free to 
> exploit inherent parallelism and create many small co-ordinating programs. So 
> the N-cores are a problem only for small N (around 8).
>   

Eh? Why only "small N (around 8)"? Please elaborate.


>> However, we may need to rethink the entire architecture of the Smalltalk
>> virtual machine, since the Tile64 chip has capabilities that radically
>> alter the paradigm. Messages between processor nodes take less time to
>> pass between nodes than the same amount of data takes to be written
>> into memory. Think about that. It offers a new paradigm unavailable to
>> other N-core processors (at the current time).
>>     
> True. Squeak's VM could virtualize display/sensors and spawn each project in 
> its own background process bound to a specific processor. The high-speed, low 
> latency paths are well-suited for UI events. Imagine running different 
> projects on each face of a rotating hexecontahedron :-)

That would be cool.

The power of the Tile64 processor from Tilera is that processors can
form arbitrary "compute streams" on the fly, where data is computed in
one processor and passed along to another without ever touching RAM.
Oh, WOW! This means, for example, that the six typical stages of
rendering could be implemented on six processors, or a multiple of six,
of a Tile-N chip (where N = 36, 64, 128, 512, 1024, 4096 or more
processors). WOW! Now how would you have the Smalltalk system generate
objects and messaging binary code from Smalltalk source code to model
and program that? How? Let's do it! This requires a shift in paradigm.
This requires a shift in your thinking. This requires a shift in my
thinking. Think it through. What solutions can you come up with?
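
The "compute stream" idea can be sketched as a pipeline where each
stage runs on its own worker and hands its results straight to the next
stage. This Python version is only a structural analogue under stated
assumptions: the bounded `queue.Queue` objects live in ordinary shared
memory, so it does not model the Tile64's on-chip network, and the six
stage functions are hypothetical placeholders rather than real
rendering stages.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply `func` to every item from `inbox` and pass the result on to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, items):
    # One bounded queue between each pair of stages, so a slow stage applies back-pressure.
    queues = [queue.Queue(maxsize=4) for _ in range(len(stage_funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]), daemon=True)
        for i, f in enumerate(stage_funcs)
    ]

    def feed():
        # Feeding from a separate thread avoids deadlock when the bounded queues fill up.
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed, daemon=True)
    for w in workers:
        w.start()
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    feeder.join()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Six hypothetical placeholder stages, one worker each.
    stages = [
        lambda x: x + 1,
        lambda x: x * 2,
        lambda x: x - 3,
        lambda x: x * x,
        lambda x: x % 1000,
        lambda x: -x,
    ]
    print(run_pipeline(stages, range(5)))
```

Because each stage is a single FIFO worker, items leave the pipeline in
the order they entered; throughput comes from all six stages working on
different items at the same time.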

All the best,

Peter




More information about the Squeak-dev mailing list