[squeak-dev] The "correct" approach to multi-core systems.
Joshua Gargus
schwa at fastmail.us
Sat Feb 23 09:03:31 UTC 2008
On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:
> On 2/23/08, Joshua Gargus <schwa at fastmail.us> wrote:
> On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
>
>> this makes sharing objects and synchronising access while still
>> getting good performance more difficult. I can't back up my claims
>> yet; we'll see how Hydra VM works out.
>>
>> In the long term, the goal should be a VM that can run its green
>> threads (aka Process) on multiple OS threads (aka pthreads).
>
> This is debatable. Why are you convinced that fine-grained
> concurrency will not involve a large performance hit due to CPU
> cache invalidations? I haven't heard a compelling argument that
> this won't be a problem (and increasingly so, as the number of cores
> grows). We can't pretend that it takes zero time to make an object
> available for processing on a different core. As I've said before,
> I'm willing to be convinced otherwise.
>
> Equally so, why would any other concurrent implementation, such as
> HydraVM, not have exactly the same problem?
Because within HydraVM, each VM has its own ObjectMemory in a single,
contiguous chunk of memory.
Below, you mention processor-affinity. This is certainly necessary,
but is orthogonal to the issue. Let's simplify the discussion by
assuming that the number of VMs is <= the number of cores, and that
each VM is pinned to a different core.
CPU caches maintain coherence at the granularity of cache lines
(typically 64 bytes; the 4KB figure is the virtual-memory page size),
and quite a few small objects fit in a single line. The problem is
that if processor A and processor B are operating in the same
ObjectMemory, they don't even have to touch the same object to cause
cache contention... they merely have to touch objects that share a
cache line (so-called "false sharing"). Can you provide a formal
characterization of worst-case and average-case performance under a
variety of application profiles? I wouldn't know where to start.
Happily, HydraVM doesn't have to worry about this, because each thread
operates on a separate ObjectMemory.
> Or why would any other concurrent application not have this problem?
They can, depending on the memory access patterns of the application.
>
>
> Real operating systems implement some form of processor affinity[1]
> to keep cache on a single processor. The same could be done for the
> Squeak scheduler. I'm sure that the scheduling algorithm could be
> tuned to minimize cache invalidations.
As I described above, the problem is not simply ensuring that each
thread tends to run on the same processor. I believe that you're
overlooking a crucial aspect of real-world processor-affinity schemes:
when a Real Operating System pins a process to a particular
processor, the memory for that process is only touched by that
processor.
I haven't had a chance to take more than a glance at it, but Ulrich
Drepper of Red Hat has written a paper named "What Every Programmer
Should Know About Memory". It's dauntingly comprehensive.
It might help to think of a multi-core chip as a set of separate
computers connected by a network (I don't have the reference off-hand,
but I've seen an Intel whitepaper that explicitly takes this
viewpoint). It's expensive and slow to send messages over the network
to ensure that my cached version of an object isn't stale. In
general, it's better to structure our computation so that we know
exactly when memory needs to be touched by multiple processors.
Cheers,
Josh
>
>
> [1] http://en.wikipedia.org/wiki/Processor_affinity
>
> Gulik.
>
>
> --
> http://people.squeakfoundation.org/person/mikevdg
> http://gulik.pbwiki.com/