[Vm-dev] Ideas on cheap multi-threading for Squeak / Pharo ? (from Tim's article)

Ben Coman btc at openinworld.com
Tue Jan 31 01:15:42 UTC 2017


On Tue, Jan 31, 2017 at 4:19 AM, Clément Bera <bera.clement at gmail.com> wrote:
>
> Hi all,
>
> Tim's just shared this lovely article with a 10,000+ core ARM machine. With this kind of machine, it's a bit stupid to use only 1 core when you have 10,000+. I believe we have to find a way to introduce multi-threading in Squeak / Pharo. For co-processors like the Xeon Phi or the graphic cards, I guess it's ok not to use them because they're not general purpose processors while the VM is general purpose, but all those 10,000 cores...
>
> For parallel programming, we could consider doing something cheap like the parallel C# loops (Parallel.For and co). The Smalltalk programmer would then explicitly write "collection parallelDo: aBlock" instead of "collection do: aBlock", and if the block takes long enough to execute, the cost of parallelisation becomes negligible compared to the performance boost of parallelisation. The block has to perform independent tasks, and if multiple blocks executed in parallel read/write the same memory location, as in C#, the behavior is undefined, leading to freezes / crashes. It's the responsibility of the programmer to find out if loop iterations are independent or not (and it's not obvious).
>
> For concurrent programming, there's this design from E where we could have an actor model in Smalltalk where each actor is completely independent of the others, one native thread per actor, and all the common objects (including what's necessary for look-up such as method dictionaries) could be shared as long as they're read-only or immutable. Mutating a shared object, such as installing a method in a method dictionary, would be detected because such objects are read-only, and we can stop all the threads sharing such an object in order to mutate it. The programmer has to keep mutation of shared objects uncommon to have good performance.
>
> Both designs have different goals using multiple cores (parallel and concurrent programming), but in both cases we don't need to rewrite any library to make Squeak / Pharo multi-threaded like they did in Java.
>
> What do you think ?
>
> Is there anybody on the mailing list having ideas on how to introduce threads in Squeak / Pharo in a cheap way that does not require rewriting all core/collection libraries ?
>
> I'm not really into multi-threading myself but I believe the Cog VM will die in 10 years from now if we don't add something to support multi-threading, so I would like to hear suggestions.

My naive idea is that a lot might be simplified by having spawned
cputhreads use a different bytecode set that enforces a functional
style of programming by having no write bytecodes.  While restrictive,
my inspiration is that functional languages are supposedly more suited
to parallelism by having no shared state.  So all algorithms must work
on the stack only, which may be simpler than managing multiple updaters
to objectspace.  This may(?) avoid the need to garbage collect the 1000
cputhreads, since everything gets cleared away when the stack dies with
the thread.  On the flip side, we might not want to scan these 1000
cputhreads when garbage collecting the main Image thread.  So these
cputhreads might have a marshaling area that reference counts accesses
to objects external to the thread, and the garbage collector only
needs to scan that area.  Or alternatively, each cputhread maintains
its own objectspace that pulls in copies of objects Spoon style.
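
As a rough illustration of the "no writes to shared state" idea, here
is a Python sketch (not Squeak VM code; the names parallel_collect and
sum_of_squares are made up for this example).  Each task works only on
its own arguments and locals, and the only thing that crosses back to
the main thread is the returned value.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(numbers):
    # Pure function: reads its argument, writes only to locals.
    total = 0
    for n in numbers:
        total += n * n
    return total


def parallel_collect(fn, chunks, workers=4):
    # Hypothetical helper: run fn on each chunk in a worker thread and
    # return the results in input order.  Inputs are frozen as tuples
    # so a task cannot mutate data it shares with the main thread.
    frozen = [tuple(chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, frozen))


results = parallel_collect(sum_of_squares, [[1, 2, 3], [4, 5], [6]])
```

In standard CPython the global interpreter lock means these threads
will not really use several cores; the sketch only shows the data
flow, with inputs going in read-only and results coming back by value.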

Would each cputhread need its own method cache?  Since the application
may have a massive number of individually short-lived calculations, to
minimise method lookups perhaps a self-contained
mini-objectspace/method-cache could be seeded/warmed-up by the single
threaded main image, which is copied to each spawned cputhread with
parameters passed to the first invoked function.
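
A sketch of the seeding idea, again in Python and with invented names
(warm_cache, worker, Point): the main thread resolves the needed
(class, selector) pairs once, and each worker starts from a private
copy of that table instead of doing its own lookups.  The "out" list
stands in for a marshaling area where each worker writes only its own
slot.

```python
import threading


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm2(self):
        return self.x * self.x + self.y * self.y


def warm_cache(pairs):
    # Main thread: resolve each (class, selector) pair once, up front.
    return {(cls, sel): getattr(cls, sel) for cls, sel in pairs}


def worker(seed_cache, receiver, selector, out, index):
    cache = dict(seed_cache)  # private copy, not shared between workers
    method = cache[(type(receiver), selector)]
    out[index] = method(receiver)  # each worker writes only its own slot


seed = warm_cache([(Point, "norm2")])
points = [Point(1, 2), Point(3, 4)]
out = [None] * len(points)
threads = [
    threading.Thread(target=worker, args=(seed, p, "norm2", out, i))
    for i, p in enumerate(points)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```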

Presumably a major use case for these multiple threads would be
numeric calculations.  So perhaps you get enough bang for the buck by
restricting cputhreads to operate only on immediate types?
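
One way to picture that restriction, as a Python sketch with made-up
names (check_immediates, run_on_immediates): arguments are checked
before a task is handed off, and anything that is not a plain number
is rejected.  Python's int and float are only stand-ins here for the
VM's immediate values.

```python
IMMEDIATE_TYPES = (int, float)  # stand-ins for the VM's immediates


def check_immediates(args):
    # Reject anything that is not a plain number.  type() is compared
    # exactly, so bool and other subclasses are refused as well.
    for a in args:
        if type(a) not in IMMEDIATE_TYPES:
            raise TypeError("not an immediate value: %r" % (a,))
    return tuple(args)


def run_on_immediates(fn, *args):
    # Hypothetical entry point: validate first, then call the task.
    return fn(*check_immediates(args))


hypot2 = run_on_immediates(lambda x, y: x * x + y * y, 3, 4.0)
```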

Another idea is for cputhreads to be written in Slang which is
dynamically compiled and executes as native code, completely avoiding
the complexity of managing multiple access to objectspace.

cheers -ben