Multi-core CPUs

Thu Oct 18 17:01:53 UTC 2007

On Oct 18, 2007, at 9:06 AM, Robert Withers wrote:

>
> On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:
>
>> This would only make things more complicated since then the primitives
>> would have to start parallel native threads working on the same object
>> memory.
>> The problem with native threads is that the current object memory is not
>> designed to work with multiple independent mutator threads. There are GC
>> algorithms which work with parallel threads, but AFAIK they all have
>> quite some overhead relative to the single-thread situation.
>>
>> IMO, a combination of native threads and green threads would be the best
>> (although it still has the problem of parallel GC):
>> The VM runs a small fixed number of native threads (default: number of
>> available cores, but could be a little more to efficiently handle
>> blocking calls to external functions) which compete for the runnable
>> Smalltalk processes. That way, a number of processes could be active at
>> any one time instead of just one. The synchronization overhead in the
>> process-switching primitives should be negligible compared to the
>> overhead needed for GC synchronization.
>
> This is exactly what I have started work on.  I want to use the  
> foundations of SqueakElib as a msg passing mechanism between  
> objects assigned to different native threads.  There would be one  
> native thread per core.  I am currently trying to understand what  
> to do with all of the global variables used in the interp loop, so  
> I can have multiple threads running that code.  I have given very  
> little thought to what would need to be protected in the object  
> memory or in the primitives.  I take this very much as a learning  
> project.  Just think, I'll be able to see how the interpreter  
> works, the object memory, bytecode dispatch, primitives....all of  
> it in fact.  If I can come out with a working system that does msg  
> passing, even at the cost of poorly performing object memory, et  
> al., then it will be a major success for me.
>
> It is going to be slower, anyway, because I have to intercept each  
> msg send as a possible non-local send.

Isn't this a show-stopper for a practical system?  Or is this a  
stepping-stone?  If so, how do you envision resolving this in the  
future?

FWIW, Croquet was at one time envisioned to work in the way that you  
describe.  The architects weren't able to produce a satisfactory  
design/implementation within the necessary time frame, and instead  
developed the current "Islands" mechanism.  This has worked out very  
well in practice, and there is no pressing need to try to implement  
the original idea.

In my understanding, Croquet islands and E vats are quite similar in  
that regard (and the latter informed the design of the former)...  
both use an explicit far-ref proxy to an object in another island/vat.  
What is the motivation for the approach you have chosen, other  
than it being a fun learning process (which may certainly be a good  
enough reason on its own)?
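
To make the far-ref idea concrete, here is a minimal Python sketch. It is  
not Croquet or E code; the names Island, FarRef, send and Counter are all  
made up for illustration. Each island owns its objects and runs a single  
thread that drains an inbox, and a far-ref never calls its target directly:  
it only enqueues a message and hands back a reply queue.

```python
import queue
import threading


class Island:
    """Hypothetical island/vat: one thread drains an inbox of messages."""

    def __init__(self, name):
        self.name = name
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Only this thread ever touches the objects owned by the island,
        # so the objects themselves need no locking.
        while True:
            target, selector, args, reply = self._inbox.get()
            try:
                reply.put((True, getattr(target, selector)(*args)))
            except Exception as exc:
                reply.put((False, exc))

    def far_ref(self, obj):
        """Wrap an object owned by this island in an explicit proxy."""
        return FarRef(self, obj)


class FarRef:
    """Hypothetical explicit proxy to an object living in another island."""

    def __init__(self, island, target):
        self._island = island
        self._target = target

    def send(self, selector, *args):
        # Asynchronous send: enqueue the message and return a one-slot
        # reply queue that will receive (ok, value_or_exception).
        reply = queue.Queue(maxsize=1)
        self._island._inbox.put((self._target, selector, args, reply))
        return reply


class Counter:
    """Plain object with no thread-safety of its own."""

    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value


def demo():
    island = Island("a")
    counter = island.far_ref(Counter())
    replies = [counter.send("increment", 2) for _ in range(3)]
    return [r.get(timeout=5) for r in replies]
```

The point of the explicit proxy is that only sends through a FarRef pay  
the queuing cost; ordinary sends inside an island stay ordinary sends,  
so nothing has to intercept every message send.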

Cheers,
Josh

> To this end, the Macro Transforms had to be disabled so I could  
> intercept them.  The system slowed considerably.  I hope to speed  
> them up with runtime info: is the receiver in the same thread  
> that's running?
>
> I do appreciate your comments and know that I may be wasting my  
> time.  :)
>
>>
>> The simple yet efficient ObjectMemory of current Squeak cannot be used
>> with parallel threads (at least not without significant synchronization
>> overhead). AFAIK, efficient algorithms require every thread to have its
>> own object allocation area to avoid contention on object allocations.
>> Tenuring (making young objects old) and storing new objects into old
>> objects (remembered table) require synchronization. In other words,
>> grafting a threadsafe object memory onto Squeak would be a major project.
>>
>> In contrast, for a significant subset of applications (servers) it is
>> orders of magnitude simpler to run several images in parallel. Those
>> images don't stomp on each other's object memory, so there is absolutely
>> no synchronization overhead. For stateful sessions, a front end can
>> handle routing requests to the image which currently holds a session's
>> state; stateless requests can be handled by any image.
>>
>> Cheers,
>> Hans-Martin
>>
>
>
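
For reference, the scheduling scheme in Hans-Martin's quoted text (a small  
fixed number of native threads competing for the runnable Smalltalk  
processes) can be sketched roughly as below. This is a toy model in Python,  
not VM code: green_process, run_workers and demo are hypothetical names,  
generators stand in for Smalltalk processes, and each `yield` stands in for  
a process-switch point. CPython's global interpreter lock means the workers  
do not actually execute in parallel, so the sketch only shows the  
scheduling structure and says nothing about object memory or GC.

```python
import os
import queue
import threading


def green_process(name, steps, log, lock):
    """Stand-in for a Smalltalk process; each yield is a switch point."""
    for i in range(steps):
        with lock:  # the shared log plays the role of shared state
            log.append((name, i))
        yield


def run_workers(processes, n_workers=None):
    """Run green processes on a fixed pool of native threads."""
    # Default: one native thread per available core, as in the quoted text.
    n_workers = n_workers or os.cpu_count() or 1
    runnable = queue.Queue()
    for proc in processes:
        runnable.put(proc)

    def worker():
        while True:
            try:
                proc = runnable.get_nowait()  # compete for a runnable process
            except queue.Empty:
                return  # nothing left for this worker to pick up
            try:
                next(proc)  # run one slice, up to the next switch point
            except StopIteration:
                continue  # process terminated; do not requeue it
            runnable.put(proc)  # still runnable; any worker may resume it

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo(n_workers=2):
    log, lock = [], threading.Lock()
    procs = [green_process(name, 4, log, lock) for name in ("a", "b", "c")]
    run_workers(procs, n_workers=n_workers)
    return log
```

A process is held by at most one worker at a time because it is removed  
from the queue while it runs, so the only synchronization in the scheduler  
itself is the queue, which matches the claim that process-switching  
overhead is small next to what GC would need.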