<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><br><br><div><span class="gmail_quote">On 2/23/08, <b class="gmail_sendername">Joshua Gargus</b> <<a href="mailto:schwa@fastmail.us">schwa@fastmail.us</a>> wrote:</span><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex"> <div><div><span class="q"><div>On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:</div><br><blockquote type="cite"><br><br></blockquote></span></div><div><span class="q"><br><blockquote type="cite">this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.<br> <br>In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the goal. </blockquote><div><br></div></span>This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.</div> <div><br></div></div><br></blockquote></div><br><br>Equally so, why then would any other concurrent implementation, such as the HydraVM, not also have exactly the same problem? 
</blockquote><div><br class="webkit-block-placeholder"></div><div>Because within HydraVM, each VM has its own ObjectMemory in a single, contiguous chunk of memory.</div><div><br class="webkit-block-placeholder"></div><div>Below, you mention processor affinity. This is certainly necessary, but it is orthogonal to the issue. Let's simplify the discussion by assuming that the number of VMs is <= the number of cores, and that each VM is pinned to a different core.</div><div><br></div><div>CPU caches operate on cache lines, typically 64 or 128 bytes of memory. You can fit several small objects in a single cache line. The problem is that if processor A and processor B are operating in the same ObjectMemory, they don't even have to touch the same object to cause cache contention... they merely have to touch objects on the same cache line. Can you provide a formal characterization of worst-case and average-case performance under a variety of application profiles? I wouldn't know where to start.</div><div><br class="webkit-block-placeholder"></div><div>Happily, HydraVM doesn't have to worry about this, because each thread operates on a separate ObjectMemory.</div><div><br></div><div><br></div><blockquote type="cite">Or why would any other concurrent application not have this problem?</blockquote><div><br class="webkit-block-placeholder"></div>They can, depending on the memory access patterns of the application.</div><div><br><blockquote type="cite"><br> <br>Real operating systems implement some form of processor affinity[1] to keep cache on a single processor. The same could be done for the Squeak scheduler. I'm sure that the scheduling algorithm could be tuned to minimize cache invalidations.</blockquote><div><br class="webkit-block-placeholder"></div><div>As I described above, the problem is not simply ensuring that each thread tends to run on the same processor. 
I believe that you're overlooking a crucial aspect of real-world processor-affinity schemes: when a Real Operating System pins a process to a particular processor, the memory for that process is touched only by that processor. </div><div><br class="webkit-block-placeholder"></div><div>I haven't had a chance to take more than a glance at it, but Ulrich Drepper from Red Hat has written a paper named "What Every Programmer Should Know About Memory". It's dauntingly comprehensive. (<a href="http://people.redhat.com/drepper/cpumemory.pdf">What Every Programmer Should Know About Memory</a>)</div><div><br class="webkit-block-placeholder"></div><div>It might help to think of a multi-core chip as a set of separate computers connected by a network (I don't have the reference off-hand, but I've seen an Intel whitepaper that explicitly takes this viewpoint). It's expensive and slow to send messages over the network to ensure that my cached version of an object isn't stale. In general, it's better to structure our computation so that we know exactly when memory needs to be touched by multiple processors.</div><div><br></div><div>Cheers,</div><div>Josh</div><div><br class="webkit-block-placeholder"></div><div><br class="webkit-block-placeholder"></div><br><blockquote type="cite"><br> <br>[1] <a href="http://en.wikipedia.org/wiki/Processor_affinity">http://en.wikipedia.org/wiki/Processor_affinity</a><br><br clear="all">Gulik. <br><br><br>-- <br><a href="http://people.squeakfoundation.org/person/mikevdg">http://people.squeakfoundation.org/person/mikevdg</a><br> <a href="http://gulik.pbwiki.com/">http://gulik.pbwiki.com/</a> <br></blockquote></div><br></body></html>