Vats, Islands, and Collections: Finding cheap parallelism
johnson at cs.uiuc.edu
Thu Jan 24 13:18:35 UTC 2008
I hope I am not just saying things you already know.
On Jan 24, 2008 12:34 AM, Matthew Fulmer <tapplek at gmail.com> wrote:
> But I have a pressing concern: How could this be used to do what
> seems (to me and my mentors) to be the cheapest, simplest, most
> obvious parallelism extraction ever: get Collection>>do:,
> collect:, etc. to run one element per native thread (= vat =
I'm not sure exactly what you are planning, because vat/island means
different things to different people. Croquet Islands, for example,
are not perfectly isolated from each other, as they would be if they
were running in different address spaces.
I'm going to assume that what you mean by "vat" is essentially a
Squeak VM with its own garbage collector, running on a single
core/processor/thread. Each object is in a vat. An object in one vat
refers to an object in another vat by means of a proxy. Proxies are
invisible most of the time. When you send a message to an object in
another vat, references to objects in your own vat are converted
automatically to proxies.
Suppose you have a collection of objects, each in its own vat. Just saying
    objects do: [:each | each doYourStuff]
won't make them run in parallel. You'll need to say
    objects do: [:each | [each doYourStuff] fork]
You probably need a #parDo: method that does this automatically. You
will need a way of synchronizing. A #parCollect: might be better, or
a #parCollectFutures: which is equivalent to
    ^objects collect: [:each | Future with: [aBlock value: each] fork]
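A minimal sketch of such a #parDo:, with a Semaphore as the way of
synchronizing. Several things here are assumptions rather than anything
from the original discussion: the method is hypothetical and would have
to be added to Collection by hand, it forks ordinary Squeak processes
inside one image (so nothing actually runs on a second core), and it
assumes an image whose blocks are real closures, so that each forked
block sees its own `each`.

```smalltalk
parDo: aBlock
    "Hypothetical sketch, not part of any standard image.
     Evaluate aBlock for each element in its own Squeak process and
     return only after every forked process has finished."
    | done |
    done := Semaphore new.
    self do: [:each |
        "ensure: signals even if the forked block is cut short."
        [[aBlock value: each] ensure: [done signal]] fork].
    "One wait per element: this is the synchronization point."
    self size timesRepeat: [done wait]
```

With that in place, the earlier loop could be written as
    objects parDo: [:each | each doYourStuff]
and the sender would resume only once every element had finished.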
Naturally, there would be more efficient implementations of parDo: or
parCollectFutures: because you probably don't want to fork Squeak
processes, but instead you want to send asynchronous messages to the
other vats and build up a structure of futures.
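The "structure of futures" might look roughly like the sketch below.
SketchFuture, #evaluating:, #setBlock: and the category name are all
made-up names for illustration. This version still forks a local Squeak
process per element, standing in for the asynchronous message to
another vat, so it shows only the shape of the result and not the more
efficient implementation. As above, it assumes an image with real block
closures. The methods are meant to be entered in a browser; the comment
lines starting with "---" only say where each one belongs.

```smalltalk
"Class definition; evaluate in a workspace."
Object subclass: #SketchFuture
    instanceVariableNames: 'result ready'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'ParallelSketch'.

"--- SketchFuture class side ---"
evaluating: aBlock
    "Answer a future whose result is computed in a forked process."
    ^self new setBlock: aBlock

"--- SketchFuture instance side ---"
setBlock: aBlock
    "Start the computation and signal when the result is stored.
     If aBlock raises an error, the future never becomes ready."
    ready := Semaphore new.
    [result := aBlock value.
     ready signal] fork

value
    "Block the caller until the result is available.
     Signal again so that later callers do not block forever."
    ready wait.
    ready signal.
    ^result

"--- Collection instance side (hypothetical extension) ---"
parCollectFutures: aBlock
    "Answer a collection of futures, one per element."
    ^self collect: [:each |
        SketchFuture evaluating: [aBlock value: each]]
```

A caller collects the futures first and asks for the values later:
    futures := #(1 2 3) parCollectFutures: [:n | n * n].
    futures collect: [:f | f value]
Each send of #value waits only if that element's result is not yet in.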
But those are implementation details. The real point is that parallel
#do: only makes sense if each element of the collection is in a
different vat. It doesn't make sense to parallelize #do: on
collections of numbers. The cost of moving the numbers around will
outweigh the savings from parallelism.
Making #do: parallel might be an easy way for application programmers
to add a little parallelism to their programs, but for most kinds of
applications, you won't get much parallelism this way. And if making
a parallel do: means that you have to write a parallel garbage
collector, the price is probably too high. I think that the direction
you are going is better.