[squeak-dev] Possible approaches for rendering in Morphic 3

Mon Sep 8 13:57:13 UTC 2008

Hi Joshua,

Joshua Gargus wrote:
>
> That's fine, it makes sense.  I was thinking that you might be doing
> something more complicated than you are (which would be fun to go into
> over a beer, but I'm sure that I would fail to get it across via email).
>
>   
That's the biggest problem in this community: not being able to get
together for a beer easily!
>>>> Besides, if Morphic 3 turns out to be usable only with help from OpenCL,
>>>> CUDA or some other special hardware, that would not be too bad.
>>>>     
>>> My first thought when I read your original post was "how is he planning
>>> to take advantage of graphics hardware?".  In order to use OpenCL or
>>> CUDA effectively, you need to set up a parallelized workload for the GPU
>>> to munch through.  The implementation you describe ("for each pixel,
>>> iterate over the morphs, starting at the one at the top, and going
>>> through morphs behind it, etc...") requires a traversal of the Morphic
>>> scene-graph for each pixel.  A CUDA program running on the GPU can't ask
>>> Squeak for information about which morph is where... all of the GPU
>>> processors will pile up behind this sequential bottleneck.  Do you have
>>> some idea of how you would approach this?  It seems like you'd need to
>>> generate (and perhaps cache) a CUDA-friendly data structure.
>>>   
>> Hehehe. You're right! Today I'm experimenting with an idea. I'd like
>> to traverse the morph graph (actually a tree) and, at each morph,
>> iterate over the pixels. Doing that for all pixels in parallel would
>> require an enormous amount of memory. So what I'm trying to do is
>> iterate over blocks of, let's say, 32x32 pixels. For each block, I
>> traverse the morph graph. At each morph (actually at each "shape"), I
>> compute its effect on each pixel in the 32x32 block. To do this,
>> instead of building just one "color stack" I need to build 1024 of
>> them (32x32). This is a reasonable amount of memory. The work inside
>> each shape, over 1024 pixels, can be parallelized and made CUDA
>> friendly. The data structures involved at this step are some float
>> arrays and float stacks.
>>
>> Besides making it CUDA friendly, it will divide the cost of fetching
>> the Smalltalk objects and traversing the tree by a factor of 1024.
>> This should have a big effect on performance!
>>     
>
> Neat idea!  The Larrabee paper at SIGGRAPH (which I seem to constantly
> be recommending to people) describes a tile-based rendering pipeline
> that is similar to this.
>
> http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf
>
>   
Good! Morphic 3 could really use such an architecture.
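
The tiled scheme in the quoted message can be sketched in Python as below. This is a minimal, hypothetical sketch, not the actual Morphic 3 code: `Shape.coverage`, the front-to-back submorph order, and resolving each color stack with the "over" operator are assumptions made for illustration. What it shows is that the morph tree is walked once per 32x32 tile, and that at each shape the 1024 per-pixel updates are independent of one another, which is the part that could map onto a CUDA kernel.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

TILE = 32  # 32x32 tile -> 1024 pixels, hence 1024 color stacks
Color = Tuple[float, float, float]


@dataclass
class Shape:
    # Hypothetical: fraction of pixel (x, y) covered by the shape, in [0, 1].
    coverage: Callable[[float, float], float]
    color: Color


@dataclass
class Morph:
    shapes: List[Shape] = field(default_factory=list)
    submorphs: List["Morph"] = field(default_factory=list)  # front-most first


def front_to_back(morph: Morph) -> Iterator[Shape]:
    """Yield shapes front to back: submorphs are drawn over their owner."""
    for sub in morph.submorphs:
        yield from front_to_back(sub)
    yield from morph.shapes


def render_tile(world: Morph, x0: int, y0: int,
                background: Color = (1.0, 1.0, 1.0)) -> List[Color]:
    n = TILE * TILE
    stacks: List[List[Tuple[Color, float]]] = [[] for _ in range(n)]
    # The tree is traversed once for the whole tile, not once per pixel.
    for shape in front_to_back(world):
        # Each of the 1024 iterations is independent: a natural GPU kernel.
        for i in range(n):
            x = x0 + i % TILE + 0.5  # sample at the pixel center
            y = y0 + i // TILE + 0.5
            a = shape.coverage(x, y)
            if a > 0.0:
                stacks[i].append((shape.color, a))
    # Resolve every color stack front to back with the "over" operator.
    pixels = []
    for stack in stacks:
        out = [0.0, 0.0, 0.0]
        remaining = 1.0  # how much light still passes the layers seen so far
        for color, a in stack:
            for c in range(3):
                out[c] += remaining * a * color[c]
            remaining *= 1.0 - a
            if remaining == 0.0:
                break  # fully opaque: everything behind is hidden
        for c in range(3):
            out[c] += remaining * background[c]
        pixels.append((out[0], out[1], out[2]))
    return pixels


def disc(cx: float, cy: float, r: float) -> Callable[[float, float], float]:
    return lambda x, y: 1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0.0


world = Morph(submorphs=[Morph(shapes=[Shape(disc(16, 16, 8), (1.0, 0.0, 0.0))])])
tile = render_tile(world, 0, 0)
print(tile[16 * TILE + 16])  # (1.0, 0.0, 0.0): inside the red disc
print(tile[0])               # (1.0, 1.0, 1.0): background
```

With one traversal per tile instead of one per pixel, a 1024x768 screen needs 768 tree traversals per frame rather than 786,432, which is the factor of 1024 mentioned above.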

I'm really happy with the result so far. These tiles also let me avoid
going into them in many cases, especially when all transformations are
linear. It now runs 10 times as fast as it did before!
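
The message doesn't say exactly how the tiles let the renderer skip work. One plausible mechanism, assumed here for illustration, is a per-tile overlap test: if the bounds of a shape (or of a whole subtree) don't touch a tile, the per-pixel loop for that tile can be skipped. With linear (affine) transformations this test is cheap, because an affine map sends a rectangle to a parallelogram whose bounding box is given by its four transformed corners; a non-linear transform such as a logarithmic one offers no such shortcut. The `(a, b, c, d, tx, ty)` matrix layout and all function names below are hypothetical.

```python
from typing import Tuple

# Hypothetical layout: x' = a*x + c*y + tx,  y' = b*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)

TILE = 32


def apply(m: Affine, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)


def transformed_bounds(m: Affine, rect: Rect) -> Rect:
    """Bounding box of the affine image of `rect`.

    The image is a parallelogram, so the extremes lie at its 4 corners.
    This shortcut does NOT hold for non-linear transforms."""
    left, top, right, bottom = rect
    corners = [apply(m, x, y) for x in (left, right) for y in (top, bottom)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(r1: Rect, r2: Rect) -> bool:
    # Open-interval test: rectangles that merely touch do not overlap.
    return r1[0] < r2[2] and r2[0] < r1[2] and r1[1] < r2[3] and r2[1] < r1[3]


def should_visit(m: Affine, local_bounds: Rect, tile_x: int, tile_y: int) -> bool:
    """True if a shape/subtree with these local bounds may touch the tile."""
    tile_rect = (tile_x, tile_y, tile_x + TILE, tile_y + TILE)
    return overlaps(transformed_bounds(m, local_bounds), tile_rect)


# Example: rotate by 90 degrees, then shift 100 pixels right.
rot90 = (0, 1, -1, 0, 100, 0)
print(transformed_bounds(rot90, (0, 0, 50, 10)))  # (90, 0, 100, 50)
print(should_visit(rot90, (0, 0, 50, 10), 64, 0))  # True
print(should_visit(rot90, (0, 0, 50, 10), 0, 0))   # False
```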
>
> Are you planning to use the same transform on the whole screen, or do
> you have ideas about how to use different transforms in different parts
> of the screen (or different sub-trees of the morph hierarchy)?  If the
> latter, I can imagine explicitly referring to outer transforms to
> transform some properties (e.g. circle radius) while letting others use
> the default transform for that context (e.g. the circle's center would
> use the logarithmic transform).  If there were reified slots, like in
> Tweak, then the transform to use could be attached to the slot (with nil
> meaning the default transform within that context).
>
> Half-baked, I know :-)
>
> Cheers,
> Josh
>
>   
Each morph defines its own coordinate system, used by itself and its
subtree. So the transformation to apply at a given morph is the
composition of all the transformations from that morph up to the world.
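
A minimal sketch of that composition, assuming 2D affine transforms stored as `(a, b, c, d, tx, ty)` and a hypothetical `MorphNode` with `local_to_owner` and `owner` fields (not Morphic 3's actual classes): a point in a morph's local coordinates reaches world coordinates by applying the morph's own transform first, then each owner's in turn.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical layout: x' = a*x + c*y + tx,  y' = b*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]
IDENTITY: Affine = (1, 0, 0, 1, 0, 0)


def compose(outer: Affine, inner: Affine) -> Affine:
    """Transform that applies `inner` first and then `outer`."""
    a1, b1, c1, d1, e1, f1 = outer
    a2, b2, c2, d2, e2, f2 = inner
    return (a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
            a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
            a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1)


def apply(m: Affine, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


@dataclass
class MorphNode:
    local_to_owner: Affine = IDENTITY     # this morph's own coordinate system
    owner: Optional["MorphNode"] = None   # None for the world


def local_to_world(morph: MorphNode) -> Affine:
    """Compose this morph's transform with every owner's, up to the world."""
    m = morph.local_to_owner
    node = morph.owner
    while node is not None:
        m = compose(node.local_to_owner, m)
        node = node.owner
    return m


world = MorphNode()
panel = MorphNode((2, 0, 0, 2, 10, 10), world)   # scale by 2, then shift (10, 10)
button = MorphNode((1, 0, 0, 1, 5, 0), panel)    # shift (5, 0) inside the panel
print(local_to_world(button))                    # (2, 0, 0, 2, 20, 10)
print(apply(local_to_world(button), 1, 1))       # (22, 12)
```

Non-linear transforms, such as the logarithmic one mentioned in the thread, can't be collapsed into a single matrix this way; they would have to be composed as functions and applied one after another.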

Soon I hope to be able to upload a new version of the code, for you (and 
everybody) to see.

Cheers,
Juan Vuletich