[squeak-dev] Coarse-grained multiprocessing with RemoteTask

David T. Lewis lewis at mail.msen.com
Sun Nov 13 15:21:30 UTC 2011


RemoteTask partitions processing tasks at the level of cooperating OS
processes. It suits problems that can be split into independent tasks
running under the supervision of a Squeak image, where each task performs
a significant amount of processing and returns a moderately sized result.
For such problems on multi-core hardware, RemoteTask can substantially
reduce run time compared with the equivalent serial processing.

A task is scheduled in a working image with

  RemoteTask do: taskBlock whenComplete: aOneArgumentBlock

where taskBlock is the task to be scheduled in a forked Squeak image,
and aOneArgumentBlock handles the result object when the remote task
makes data available. Result data is returned through a ReferenceStream
on the stdout pipe from the remote Squeak image. The forked Squeak image
is headless and is quite memory efficient due to Unix copy-on-write
for forked processes.
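
As a minimal sketch of the call above (not taken from the package itself),
a single task could be scheduled from a workspace as follows. It assumes
CommandShell and OSProcess are loaded. The summation is only a stand-in for
real work, and the variable name is arbitrary.

```smalltalk
"Sketch: sum a range in a forked image and report the result on completion.
 'task' is deliberately left undeclared on the assumption that the workspace
 keeps it as a workspace variable, so the RemoteTask stays referenced and is
 not finalized before it completes."
task := RemoteTask
    do: [(1 to: 10000000) inject: 0 into: [:sum :each | sum + each]]
    whenComplete: [:result |
        Transcript show: 'Sum from remote task: ', result printString; cr].
```

The result here is a single integer, which keeps the data passed back over
the stdout pipe small.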

RemoteTask may also be useful for evaluating a method that otherwise would
block the VM, such as an FFI call to a long-running external function.

RemoteTask is part of the latest CommandShell package on SqueakSource
(CommandShell 4.5.0), and requires OSProcess as well as a Unix or Mac VM
with OSProcessPlugin (it is helpful if the VM also has AioPlugin for process
completion notification, although polling will be used if this is not present).
I have tested only on Linux with standard VM and Cog, although I am hopeful
that this will work on Mac also (confirmation would be appreciated).

Following is an example of three processing tasks assigned to three Squeak
worker images with results returned to the supervisory image on task
completion.

threeParallelTasks
   "Find all primes in a range of large integers. Divide the problem into
   three tasks running in three child images, and return the results to
   the supervisory image. Answer a tasks array and a results array, where
   the results array will be populated on completion of the tasks."

   "RemoteTask threeParallelTasks"

   | p1 p2 p3 results task1 task2 task3 |
   results := Array new: 3.
   task1 := [(100000000000000000000000000000
               to: 100000000000000000000000019999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   task2 := [(100000000000000000000000020000
               to: 100000000000000000000000039999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   task3 := [(100000000000000000000000040000
               to: 100000000000000000000000059999)
            select: [:f | f isPrime] thenCollect: [:s | s asString]].
   "n.b. Assign task to a variable to prevent RemoteTask from being finalized"
   p1 := RemoteTask do: task1 whenComplete: [:result | results at: 1 put: result].
   p2 := RemoteTask do: task2 whenComplete: [:result | results at: 2 put: result].
   p3 := RemoteTask do: task3 whenComplete: [:result | results at: 3 put: result].
   ^ { #tasks -> { p1 . p2 . p3 } . #results -> results }
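
Because the results array is filled in asynchronously, a caller has to wait
for it before using it. One possible way to do that, sketched here under the
assumption that all three tasks eventually complete, is to poll from a
background process. The polling interval and the Transcript output are
illustrative choices only.

```smalltalk
"Sketch: run the example, then poll until every result slot is filled.
 Assumes every task completes; a failed task would leave a nil slot and
 this loop would not terminate."
[ | answer results |
  answer := RemoteTask threeParallelTasks.
  "answer is { #tasks -> tasksArray . #results -> resultsArray }"
  results := answer second value.
  [results includes: nil]
      whileTrue: [(Delay forMilliseconds: 250) wait].
  results withIndexDo: [:primes :i |
      Transcript
          show: 'task ', i printString, ': ',
                primes size printString, ' primes';
          cr]
] fork
```

Holding answer in the forked block also keeps the three RemoteTask
instances referenced while the loop is waiting.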


Dave



