<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><br><br><div><span class="gmail_quote">On 2/23/08, <b class="gmail_sendername">Joshua Gargus</b> <<a href="mailto:schwa@fastmail.us">schwa@fastmail.us</a>> wrote:</span><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex"> <div><div><span class="q"><div>On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:</div><br><blockquote type="cite"><br><br></blockquote></span></div><div><span class="q"><br><blockquote type="cite">this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.<br> <br>In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the goal. </blockquote><div><br></div></span>This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.</div> <div><br></div></div><br></blockquote></div><br><br>Equally so, why then would any other concurrent implementation, such as the HydraVM, not also have exactly the same problem? 
</blockquote><div><br class="webkit-block-placeholder"></div><div>Because within HydraVM, each VM has its own ObjectMemory in a single, contiguous chunk of memory.</div><div><br class="webkit-block-placeholder"></div><div>Below, you mention processor affinity. This is certainly necessary, but it is orthogonal to the issue. Let's simplify the discussion by assuming that the number of VMs is <= the number of cores, and that each VM is pinned to a different core.</div><div><br></div><div>CPU caches operate on cache lines, typically 64 or 128 bytes of memory. You can fit several small objects in a single cache line. The problem is that if processor A and processor B are operating in the same ObjectMemory, they don't even have to touch the same object to cause cache contention... they merely have to touch objects on the same cache line. Can you provide a formal characterization of worst-case and average-case performance under a variety of application profiles? I wouldn't know where to start.</div><div><br class="webkit-block-placeholder"></div><div>Happily, HydraVM doesn't have to worry about this, because each thread operates on a separate ObjectMemory.</div><div><br></div><div><br></div><blockquote type="cite">Or why would any other concurrent application not have this problem?</blockquote><div><br class="webkit-block-placeholder"></div>They can, depending on the memory access patterns of the application.</div><div><br><blockquote type="cite"><br> <br>Real operating systems implement some form of processor affinity[1] to keep cache on a single processor. The same could be done for the Squeak scheduler. I'm sure that the scheduling algorithm could be tuned to minimize cache invalidations.</blockquote><div><br class="webkit-block-placeholder"></div><div>As I described above, the problem is not simply ensuring that each thread tends to run on the same processor. 
I believe that you're overlooking a crucial aspect of real-world processor-affinity schemes: when a Real Operating System pins a process to a particular processor, the memory for that process is touched only by that processor. </div><div><br class="webkit-block-placeholder"></div><div>I haven't had a chance to take more than a glance at it, but Ulrich Drepper from Red Hat has written a paper named "What Every Programmer Should Know About Memory". It's dauntingly comprehensive. (<a href="http://people.redhat.com/drepper/cpumemory.pdf">What Every Programmer Should Know About Memory</a>)</div><div><br class="webkit-block-placeholder"></div><div>It might help to think of a multi-core chip as a set of separate computers connected by a network (I don't have the reference off-hand, but I've seen an Intel whitepaper that explicitly takes this viewpoint). It's expensive and slow to send messages over the network to ensure that my cached version of an object isn't stale. In general, it's better to structure our computation so that we know exactly when memory needs to be touched by multiple processors.</div><div><br></div><div>Cheers,</div><div>Josh</div><div><br class="webkit-block-placeholder"></div><div><br class="webkit-block-placeholder"></div><br><blockquote type="cite"><br> <br>[1] <a href="http://en.wikipedia.org/wiki/Processor_affinity">http://en.wikipedia.org/wiki/Processor_affinity</a><br><br clear="all">Gulik. <br><br><br>-- <br><a href="http://people.squeakfoundation.org/person/mikevdg">http://people.squeakfoundation.org/person/mikevdg</a><br> <a href="http://gulik.pbwiki.com/">http://gulik.pbwiki.com/</a> <br></blockquote></div><br></body></html>