Exupery for sub-pixel font filtering.

Mon Nov 13 22:41:11 UTC 2006

<bryce at kampjes.demon.co.uk> wrote in message
news:17752.60448.36947.789918 at gargle.gargle.HOWL...
>
> Hello again,
> This time about sub-pixel aliasing.
>
> Andrew Tween writes:
>  > Hi Bryce,
>  > I think it is a good idea to release the solid 3.8 version.
>  >
>  > Having said that, I am looking forward to the 3.9 release because I
>  > really want to try using Exupery on my sub-pixel font filtering
>  > algorithm to see if it can speed it up. Currently this is in 3.9, and I
>  > don't want to port it all back to an earlier image/vm, especially since
>  > you are moving forward to 3.9.
>
> Exupery runs fine on 3.9, the tests just needed to be fixed.
>
> The best way to find out how it performs for your example would be to
> load Exupery into your 3.9 image and try it.

The subpixel rendering needs a modified VM (for the BitBlt changes), and
Exupery needs a modified VM too. Currently these are built from different
versions of VMMaker, SVN sources, etc., so I am keen for them to be
synchronised, and I am sure it will all come together eventually.

In the meantime, I guess I could create a standalone benchmark, which would be
interesting in its own right.

>
>  > This is probably a topic for another thread, but could you tell from
>  > looking at the attached method if it is a good candidate for speed-up?
>  > It has nested loops, does lots of at: and integerAt:put: (prim 166),
>  > and SmallInteger bitShift:, bitAnd:, *, +, //, and some Float calcs.
>
> I'm not sure how well it would run. The code is definitely a promising
> candidate to compile; however, Exupery doesn't yet compile Floats, large
> integers, or primitive 166. I don't think the interpreter does any
> special optimisations for them either, so chances are those operations
> will run at the same speed. Exupery will be able to optimise the
> SmallInteger calculations and looping overhead.

Is the primitive compilation something that I, or others, could help with? What
is involved in adding a primitive to Exupery?

>
> The method could definitely be optimised much more. Adding
> integerAt:put: and ByteArray>>at: primitives would help. So would
> basic floating point optimisations. Going further, adding support for
> machine word (32 bit integer) and byte objects should allow us to
> compile to near C speeds.
>
> The optimisations for machine words, byte objects, and floating point
> are all very similar. The game is to remove all the intermediate
> objects so the calculations are done directly in registers without any
> conversion and deconversion overhead.
>
>   luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
>   balR := balR + ((luminance - balR)*correctionFactor).
>   balG := balG + ((luminance - balG)*correctionFactor).
>   balB := balB + ((luminance - balB)*correctionFactor).
>   balR := balR  truncated.
>   balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
>   balG := balG  truncated.
>   balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
>   balB := balB  truncated.
>   balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
>   a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
>   colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16)
>     + (a bitShift: 24).
>   answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
>
> The code above is a nice example to show what dynamically inlined
> primitives could do. The major overhead with floats is allocating
> memory (1). In this example, using the current optimisation engine, it
> should be possible to create only 4 floats rather than the 19 needed by
> the interpreter. One more allocation will be needed to form colorVal if
> it overflows into a LargeInteger. SSA should allow all the floating
> point intermediate values to be removed by allowing program analysis
> over more than one statement.
>
>   balR := balR  truncated.
>   balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
>
> This should probably be handled via a primitive that truncates a floating
> point value down to an unsigned 8 bit value. For this example such a
> primitive may be overkill however converting floating point values
> to. But with Exupery 3.0 and SSA it would be really nice to be able to
> optimise to vectors. With vector optimisation we will have a level
> playing field with C: they will need at least as much compiler
> machinery as we will, and they will probably write their compilers in C,
> requiring much more work than writing in Smalltalk.
>
> In summary, I think there may be some speed improvement now. Adding
> the array access primitives will help. Floating point is likely to be
> the next biggest win. Without SSA I doubt that other optimisations
> will provide enough gain to be worthwhile. With SSA and a few extra
> object types it should be possible to fully optimise it.

Thanks for your comments. I had intended to re-write the method in C and add it
to the plugin, but the advantages of being able to easily play with it in
Smalltalk outweigh the speed-up of porting to C, at least while I am still
experimenting.
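
For comparison, a minimal C sketch of the per-pixel step quoted above might
look like the following. This is only an illustration of the arithmetic, not
the actual plugin code: the function names balance_pixel and clamp_to_byte are
hypothetical, and it assumes the three channel values arrive as doubles and
that the packed result fits an unsigned 32 bit word (the case where the
Smalltalk version may overflow into a LargeInteger).

```c
#include <stdint.h>

/* Hypothetical helper: truncate toward zero, then clamp to 0..255.
   Mirrors `truncated` followed by the two range checks in the Smalltalk.
   NaN is mapped to 0 so the cast below is always well defined. */
static uint8_t clamp_to_byte(double v)
{
    if (!(v > 0.0)) return 0;      /* negatives, zero and NaN */
    if (v >= 255.0) return 255;
    return (uint8_t)v;             /* truncates toward zero */
}

/* Hypothetical name: pull each channel toward the luminance by
   correctionFactor, clamp, and pack as alpha/red/green/blue,
   the same layout the Smalltalk snippet builds in colorVal. */
static uint32_t balance_pixel(double balR, double balG, double balB,
                              double correctionFactor)
{
    double luminance = (0.299 * balR) + (0.587 * balG) + (0.114 * balB);

    uint32_t r = clamp_to_byte(balR + ((luminance - balR) * correctionFactor));
    uint32_t g = clamp_to_byte(balG + ((luminance - balG) * correctionFactor));
    uint32_t b = clamp_to_byte(balB + ((luminance - balB) * correctionFactor));

    /* Alpha is fully opaque unless all three channels are zero. */
    uint32_t a = (r + g + b > 0) ? 0xFFu : 0u;

    return b | (g << 8) | (r << 16) | (a << 24);
}
```

Here every intermediate value lives in a register or on the stack, which is
the effect the float and machine-word optimisations described above are
aiming for without leaving Smalltalk.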

Cheers,
Andy

>
> Bryce
>
> (1) After upgrading the VM I'm going to implement fast compiled
> primitives for #new and #@. This is driven by the largeExplorers
> benchmark. #@ is inlined into the main interpret loop in the
> interpreter but Exupery executes it as a normal primitive. This means
> that compiling largeExplorers can lead to an 8% speed loss.