[squeak-dev] Re: [Vm-dev] RoarVM: The Manycore SqueakVM

Stefan Marr squeak at stefan-marr.de
Sat Nov 6 19:30:41 UTC 2010

Hello Stephen:

On 06 Nov 2010, at 18:49, Stephen Pair wrote:
> The main question that comes to mind for me is concurrency.  Does this
> VM do anything special to preserve the concurrency semantics of
> smalltalk processes scheduled on a single core?  As I'm sure most
> people are aware, the existing squeak library isn't written with
> thread safety in mind...and as such, even on a single core a naive
> implementation can have issues when executing with concurrent
> processes.
The VM doesn't do anything special; you have to ensure the semantics you want yourself. The idea is to preserve the standard Smalltalk programming model
without introducing new facilities.
Thus, what you get with the RoarVM is what you had before, i.e., processes + mutexes/semaphores,
and in addition to that your processes can be executed in parallel.

I agree that this is a very low-level programming model, but it is the right foundation
for us to experiment with ideas like Ly and Sly.
Ly aims to be a language in which you program by embracing non-determinism; the goal is to
find ways to benefit from the parallelism and the natural non-determinism of the system.

> The solution is generally to try and isolate the objects
> running in different, concurrent processes and use message passing and
> replication as needed. If this VM doesn't do anything in this regard,
> I would expect that it would present an even greater risk of issues
> stemming from concurrency and it would make it all the more important
> to keep objects accessible in different processes cleanly separated.
There are several actor implementations for Smalltalk; I guess you could get them working on the RoarVM.
At the moment, there isn't any special VM support for these programming models. However, there are ideas and plans to enable language designers to build such languages efficiently by providing VM support for a flexible notion of encapsulation.

> Another question that comes to mind is how much of a performance hit
> you might see from the architecture trying to maintain cache
> consistency in the face of multiple processes simultaneously updating
> shared memory. Is that something you've found to be an issue?
Sure, that is a big issue. It is already problematic for performance on x86 systems with a few cores, and even worse on Tilera.

Just imagine a simple counter that needs to be updated atomically by 63 other cores...
There is no way to make such a counter scale on any system.
The only thing you can do is avoid such a counter; in 99% of the cases you don't need it anyway.

As Igor pointed out, if you want performance, you will avoid shared mutable state. The solution for such a counter would be to keep per-core local counters and synchronize only to compute a global sum, and only when it is really necessary. The optimal solution, however, is very application-specific.

> Is this something you would have to be careful about when crafting code?
> And, if it is a problem, is it something where you'd need to be
> concerned not just with shared objects, but also with shared pages
> (ie. would you need some measure of control over pages being updated
> from multiple concurrent processes to effectively deal with this
> issue)?
Locality is what I would call the problem, and together with the notion of encapsulation, it is something I am currently looking into.

A brief description of what I am up to can be found here: http://soft.vub.ac.be/~smarr/2010/07/doctoral-symposium-at-splash-2010/
Or even more fluffy here: http://soft.vub.ac.be/~smarr/2010/08/poster-at-splash10/

> Lastly, could you summarize how the design of this VM differs from the
> designs of other multithreaded VMs that have been implemented in the
> past?
Compared to a standard JVM or .NET CLR, the RoarVM provides a similar programming model, but the implementation is designed for experimentation on the TILE architecture. To reach that goal, the VM is kept simple. The heap is divided into number_of_cores parts, each owned by a single core and split into a read-mostly and a read-write heap.
That is an optimization for the TILE64 chip with its restricted caching scheme.
A few more details have been discussed on this thread already.

Best regards

Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
Phone: +32 2 629 2974
Fax:   +32 2 629 3525
