[Vm-dev] Ideas on cheap multi-threading for Squeak / Pharo ? (from Tim's article)

Tue Jan 31 10:47:51 UTC 2017

Thanks for your advice, Stefan. I was just reading part of your thesis to
understand what has to be done. I believe there is ongoing work to remove
all the globals (at least in Pharo).

To conclude this thread:

To introduce multi-threading in Squeak / Pharo, the easiest way is to start
with multiple image+VM pairs communicating with each other. It's clean, simple
and it works. The problem then lies in the time spent communicating
between the pairs. To lower this communication time, we can
run all the images on the same VM, where they still have
independent heaps, caches and interpreters and communicate through APIs
implemented in the VM. Once this is done, we can try to lift
restrictions, for example by sharing memory buffers between images.
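
As a rough illustration of that first step (this is not code from Squeak or
Pharo), here is a minimal Python sketch. It assumes each image+VM pair can be
modeled as a separate OS process with its own interpreter and heap. The names
WORKER_SOURCE, Worker and parallel_sums are hypothetical. Requests and replies
are marshaled as JSON lines over pipes, which is exactly where the
communication cost mentioned above shows up.

```python
import json
import subprocess
import sys

# Program run by every worker. Each worker is a separate OS process with its
# own interpreter and heap, standing in for one image+VM pair (an assumption
# made for this sketch only).
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": sum(request["numbers"])}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class Worker:
    """Hypothetical handle on one isolated worker process."""

    def __init__(self):
        self.process = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def send(self, request_id, numbers):
        # Marshal the request: nothing is shared, only a serialized copy moves.
        message = json.dumps({"id": request_id, "numbers": numbers})
        self.process.stdin.write(message + "\n")
        self.process.stdin.flush()

    def receive(self):
        # Block until the worker has written one reply line, then unmarshal it.
        return json.loads(self.process.stdout.readline())

    def close(self):
        # Closing stdin ends the worker's read loop, so it exits on its own.
        self.process.stdin.close()
        self.process.wait()
        self.process.stdout.close()


def parallel_sums(chunks):
    """Give each chunk to its own worker and collect the partial sums."""
    workers = [Worker() for _ in chunks]
    try:
        # Send every request before reading any reply so the workers overlap.
        for request_id, (worker, chunk) in enumerate(zip(workers, chunks)):
            worker.send(request_id, chunk)
        replies = [worker.receive() for worker in workers]
    finally:
        for worker in workers:
            worker.close()
    return [reply["result"] for reply in replies]
```

For example, parallel_sums([[1, 2, 3], [10, 20]]) returns [6, 30], with each
partial sum computed in its own process. Hosting the workers inside one VM
would keep this programming model and only replace the pipes and the JSON
marshaling with something cheaper.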

Thanks everyone for sharing ideas and remarks.

Best,

Clement

On Tue, Jan 31, 2017 at 10:39 AM, Stefan Marr <smalltalk at stefan-marr.de>
wrote:

>
> Hi:
>
> > On 31 Jan 2017, at 09:51, Clément Bera <bera.clement at gmail.com> wrote:
> >
> > 1) There's this idea of having multiple images communicating together,
> > each image on a different VM, potentially 1 native thread per image. I
> > think there is ongoing work in this direction through multiple frameworks.
> > With a minimal image and a minimal VM, the cost of the image+VM pair
> > remains quite cheap; already today <15 MB for the pair is possible. I
> > believe this idea is great but does not entirely solve the problem.
>
> That’s the cheapest solution there is. No VM changes required, just
> plugging together existing things.
>
> > 2) Levente's idea is basically to share objects between images, the
> > shared objects being read-only and lazily duplicated to worker images upon
> > mutation, to get low-memory-footprint images on the same VM. I like the
> > idea. Instead of duplication, I was thinking of stopping threads to mutate
> > shared objects and giving the programmer the responsibility to define a set
> > of shared objects that are not frequently mutated, and later going in the
> > direction of shared writable memory.
>
> There are all kinds of variations possible on that theme.
>
> Also, the question is: does it really need to be objects? Alternatives
> include things like tuple spaces (think Linda) and low-level shared memory
> buffers (Python and others, and apparently ECMAScript 2017).
>
> If you go with objects, the problem is that you need to support GC. And, I
> suppose Eliot will agree that GC for multithreaded systems isn’t exactly
> zero cost.
>
> > 3) Ben's idea is to create a process in a new thread that cannot mutate
> > objects in memory. I have issues with this design because each worker
> > thread, as you say, has to work only with the stack; hence it cannot
> > allocate objects, and hence it cannot use closures.
> >
> > 4) I need to look into the Roar VM project again and Dave Ungar's work
> > on multithreaded Smalltalk. I should contact Stefan Marr again, I guess.
>
> I am here, and reading…
> You still need a GC that’s capable of working in a multithreaded system.
> Well, and the rest of the VM should also be designed for that, but the
> image changes were minimal, and so was the ‘safety’ for concurrency.
> This is as cheap as it gets for shared multithreading, but of course, all
> the burden of getting things right is on the application developer.
>
> For a Smalltalk-like language, I’d argue, you’d always want at least a
> GC/VM that does the right thing.
> That’s not easy.
>
> On the language level, with classes, globals, and all those things, I
> fear, Smalltalk as a language isn’t any better than Java. So, if you don’t
> plan to make a real cut, things will always be messy and strange. Ruby
> struggles with the same problem. They are talking about ‘Guilds’:
> http://olivierlacan.com/posts/concurrency-in-ruby-3-with-guilds/ But you
> still have shared classes/globals. Python and others with their global
> interpreter lock are in the same boat, and work around it with things like
> ‘multiprocessing’, essentially giving a nicer interface to option 1.
>
> So, option 1 seems to be a rather clean solution. It also gives you a good,
> natural programming model and the right expectation: strong isolation.
> From that, one could think about having multiple independent interpreters
> with separate heaps within the same CogVM process, to avoid marshaling
> overhead and the like. That’s very similar to JavaScript web workers.
> From there, one could consider lifting some of the restrictions, perhaps
> like option 2, or like work we did for JavaScript:
> http://stefan-marr.de/downloads/oopsla16-bonetta-et-al-gems-shared-memory-parallel-programming-for-nodejs.pdf
>
> Those ways seem to avoid huge VM changes and rewriting a lot of code.
> Whether the programming model is nice or not is up to personal taste, I
> suppose.
> If you want a programming model that doesn’t introduce any surprises and
> avoids low-level concurrency issues from the start, you’ll have to bite the
> bullet and get rid of globals and global classes anyway. Everything else is
> just as problematic as Java, C#, etc. in that department. But I am biased,
> because I still like the tradeoffs I get from Newspeak for my work.
>
> Best regards
> Stefan
>
> --
> Stefan Marr
> Johannes Kepler Universität Linz
> http://stefan-marr.de/research/
>
>
>
>