[squeak-dev] Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Sun Feb 21 00:54:19 UTC 2010

On Feb 20, 2010, at 2:48 PM, Yoshiki Ohshima wrote:

> At Sat, 20 Feb 2010 14:03:37 -0800,
> Josh Gargus wrote:
>> 
>> While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me was the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.
>> 
>> We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?
>> 
>> The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?
> 
>  Ah, cool.  Incidentally, I am working on an array processing object
> model and language that is supposedly a bit more generalized than
> Kedama, and someday I want to hook that up with GPUs.

Great!  I just downloaded your dissertation today... is it still authoritative, or are there some aspects of the system that are covered better in subsequent documents?

> 
>  But for starters, the existing Kedama is probably easier to adapt
> and could at least take some advantage of vector/stream processing.
> 
>> I have some other ideas, but now I'm looking for yours.  I know that the interests of the Squeak community are broad, and I'm interested in hearing your ideas for small demos that communicate the power and flexibility of our system.
> 
>  As for flexibility, which is also one of Kedama's strengths, it would
> be nice to be able to modify the behavior of particles dynamically at
> runtime.  I haven't done my homework yet, but what would be the
> strategy for changing the code dynamically?

In current GPGPU architectures, execution is most efficient when items in the same "work group" follow the same code path.  For example, say that you have particles representing ants that have 10 possible different behaviors specified by an integer from 1-10 (and for simplicity, say that each of these behaviors takes 1000 clock cycles to run).  Further, let's say that you naively write this as a switch statement in the OpenCL code, so a different code path is dynamically chosen depending on the behavior index for each ant.  Current architectures are inefficient when ants in the same work-group take different branches through the code: the work-group has to execute every branch that any of its ants takes, one after another.  If all ants have the same behavior, it will take 1000 clock cycles.  If the ants use 2 of the possible 10 behaviors, it will take 2000 clock cycles.  In the worst case (the ants use all 10 behaviors), it will take 10000 clock cycles.
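The arithmetic in the example above can be written down as a toy cost model.  This is only a sketch under the same simplifying assumption (each behavior branch costs a hypothetical 1000 cycles, and a work-group pays once for every distinct branch its ants take); real hardware also has masking overhead, memory latency, and so on.  Python is used purely for illustration.

```python
# Toy model of branch divergence within one work-group.
# Assumption: every distinct branch taken by any ant in the group is run
# serially, with the other ants masked off, at a fixed hypothetical cost.
CYCLES_PER_BEHAVIOR = 1000  # hypothetical cost of running one behavior branch


def work_group_cycles(behaviors):
    """Cycles for one work-group, given each ant's behavior index."""
    return len(set(behaviors)) * CYCLES_PER_BEHAVIOR


if __name__ == "__main__":
    print(work_group_cycles([3] * 64))                # all ants share one behavior
    print(work_group_cycles([1, 2] * 32))             # two behaviors in the group
    print(work_group_cycles(list(range(1, 11)) * 6))  # all ten behaviors
```

The three calls reproduce the 1000, 2000 and 10000 cycle figures from the example.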

The GPU can execute multiple work-groups at the same time (approximately 16 today).  So, if you have some way of grouping ants with the same current behavior into the same work-group, then you can improve efficiency greatly compared to assigning them randomly to work-groups.  Of course, the regrouping itself has overhead.
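One simple way to do that grouping is to sort the ants by behavior index before partitioning them into work-groups, so that ants with the same behavior end up contiguous.  The sketch below compares random assignment with sorted assignment under a toy cost model; the work-group size of 64, the 1000-cycle branch cost, and all function names are hypothetical, and summing per-group cost ignores the fact that work-groups partly run in parallel.  The sort stands in for the overhead mentioned above; on a GPU one would more likely use a parallel sort or bucketing pass.

```python
import random

CYCLES_PER_BEHAVIOR = 1000  # hypothetical cost of one behavior branch
GROUP_SIZE = 64             # hypothetical work-group size


def work_group_cycles(behaviors):
    # Distinct branches within a work-group run one after another.
    return len(set(behaviors)) * CYCLES_PER_BEHAVIOR


def total_cycles(ants, group_size=GROUP_SIZE):
    """Sum the toy cost over consecutive work-groups of `group_size` ants.
    (A simplification: real work-groups partly execute in parallel.)"""
    return sum(work_group_cycles(ants[i:i + group_size])
               for i in range(0, len(ants), group_size))


def group_by_behavior(ants):
    """Reorder ants so that equal behavior indices are contiguous and
    therefore fall into the same work-groups."""
    return sorted(ants)


if __name__ == "__main__":
    rng = random.Random(0)
    ants = [rng.randint(1, 10) for _ in range(6400)]  # behavior index per ant
    print("random assignment:  ", total_cycles(ants))
    print("grouped by behavior:", total_cycles(group_by_behavior(ants)))
```

With random assignment nearly every group of 64 ants contains all ten behaviors, while after sorting each group contains one behavior except for the few groups that straddle a boundary between two behaviors.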

The above assumes that all behaviors are already known.  You're probably also interested in code generation.  To do this, you could synthesize a String containing the new source code that you want to use, have the OpenCL runtime compile it, and load the resulting kernel before running the next iteration of the simulation.  There's currently no portable way to generate binary code directly.  There's no fundamental technical reason for this, but OpenCL is immature at this point, and it will be years before the vendors can agree upon a suitable format.
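As a rough sketch of the "synthesize a String" approach, the function below assembles OpenCL C source for a kernel that dispatches on each ant's behavior index with a switch statement.  The kernel name `step`, the `behavior` and `heading` buffers, and the behavior snippets are all hypothetical; Python is used only to illustrate the string building, which in Squeak would be done with ordinary String operations.

```python
# Hypothetical behavior snippets written in OpenCL C; each one updates
# the heading of ant i.
BEHAVIORS = {
    1: "heading[i] += 0.1f;",
    2: "heading[i] -= 0.1f;",
}


def generate_kernel_source(behaviors):
    """Return OpenCL C source for a kernel that switches on behavior[i]."""
    cases = "\n".join(
        f"        case {index}: {body} break;"
        for index, body in sorted(behaviors.items())
    )
    return (
        "__kernel void step(__global const int *behavior,\n"
        "                   __global float *heading)\n"
        "{\n"
        "    size_t i = get_global_id(0);\n"
        "    switch (behavior[i]) {\n"
        f"{cases}\n"
        "        default: break;\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Adding a behavior at runtime just means regenerating the source.
    updated = {**BEHAVIORS, 3: "heading[i] = -heading[i];"}
    print(generate_kernel_source(updated))
```

In a C host program the generated string would be passed to `clCreateProgramWithSource` and compiled with `clBuildProgram` before the next simulation step; how the Squeak bindings expose those calls is not covered in this thread.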

Cheers,
Josh

> 
> -- Yoshiki