bryce@kampjes.demon.co.uk wrote in message news:17752.60448.36947.789918@gargle.gargle.HOWL...
Hello again, This time about sub-pixel aliasing.
Andrew Tween writes:
Hi Bryce, I think it is a good idea to release the solid 3.8 version.
Having said that, I am looking forward to the 3.9 release because I really want to try using Exupery on my sub-pixel font filtering algorithm to see if it can speed it up. Currently this is in 3.9, and I don't want to port it all back to an earlier image/vm, especially since you are moving forward to 3.9.
Exupery runs fine on 3.9, the tests just needed to be fixed.
The best way to find out how it performs for your example would be to load Exupery into your 3.9 image and try it.
The subpixel rendering needs a modified vm (for BitBlt stuff), and Exupery needs a modified vm too. Currently these are built from different versions of vmmaker, svn sources, etc. So I am keen for them to be synchronised, and I am sure it will all come together eventually.
In the meantime, I guess I could create a standalone benchmark, which would be interesting in its own right.
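A standalone benchmark along those lines might look something like the workspace sketch below. It times only the per-pixel colour-balance arithmetic, so it needs neither the BitBlt changes nor a modified vm. The starting channel values, the correctionFactor of 0.5 and the iteration count are made-up placeholders rather than values from the real filter; Time millisecondsToRun: is the usual Squeak timing call.

```smalltalk
"Workspace sketch: time the per-pixel arithmetic on its own.
 Inputs and iteration count are arbitrary placeholders."
| correctionFactor luminance balR balG balB ms |
correctionFactor := 0.5.
ms := Time millisecondsToRun: [
	1 to: 1000000 do: [:i |
		balR := 200. balG := 100. balB := 50.
		luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
		balR := (balR + ((luminance - balR)*correctionFactor)) truncated.
		balG := (balG + ((luminance - balG)*correctionFactor)) truncated.
		balB := (balB + ((luminance - balB)*correctionFactor)) truncated]].
Transcript show: 'colour balance loop: ', ms printString, ' ms'; cr.
```

To compare against Exupery the loop would presumably have to be moved into a real method first; that step is not shown here.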
This is probably a topic for another thread, but could you tell from looking at the attached method if it is a good candidate for speed-up? It has nested loops, does lots of at: and integerAt:put: (prim 166), and SmallInteger bitShift:, bitAnd:, *, +, //, and some Float calcs.
I'm not sure how well it would run. The code is definitely a promising candidate to compile; however, Exupery doesn't yet compile Floats, large integers, or primitive 166. I don't think the interpreter does any special optimisations for them either, so chances are those operations will run at the same speed. Exupery will be able to optimise the SmallInteger calculations and looping overhead.
Is the primitive compilation something that I, or others, could help with? What is involved in adding a primitive to Exupery?
The method could definitely be optimised much more. Adding integerAt:put: and ByteArray>>at: primitives would help. So would basic floating point optimisations. Going further, adding support for machine word (32 bit integer) and byte objects should allow us to compile to near C speeds.
The optimisations for machine words, byte objects, and floating point are all very similar. The game is to remove all the intermediate objects so the calculations are done directly in registers without any conversion and deconversion overhead.
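To make the intermediate objects concrete, here is one channel update from the method under discussion, split so that every intermediate result has a name. This is only an illustration: t1 and t2 are hypothetical temporaries, the input values are made up, and the comments assume a VM in which every Float result is a heap object, as in the interpreter being discussed.

```smalltalk
"One channel update with its intermediates spelled out.
 t1 and t2 are illustrative names, not part of the original method."
| luminance balR correctionFactor t1 t2 |
luminance := 120.5.
balR := 200.
correctionFactor := 0.5.
t1 := luminance - balR.          "a boxed Float"
t2 := t1 * correctionFactor.     "another boxed Float"
balR := balR + t2.               "a third boxed Float"
balR := balR truncated.          "SmallInteger 160; the three Floats are now garbage"
```

A compiler that keeps t1, t2 and the sum in floating point registers only needs to produce the final integer, which is the kind of saving described above.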
luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
balR := balR + ((luminance - balR)*correctionFactor).
balG := balG + ((luminance - balG)*correctionFactor).
balB := balB + ((luminance - balB)*correctionFactor).
balR := balR truncated.
balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
balG := balG truncated.
balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
balB := balB truncated.
balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
This is a nice example to show what dynamically inlined primitives could do. The major overhead with floats is allocating memory (1). In this example, using the current optimisation engine, it should be possible to create only 4 floats rather than the 19 needed by the interpreter. One more allocation will be needed to form colorVal if it overflows into a LargeInteger. SSA should allow all the floating point intermediate values to be removed by allowing program analysis over more than one statement.
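The LargeInteger case can be checked in a workspace. The sketch below assumes a 32-bit image, where SmallInteger maxVal is 1073741823 (2^30 - 1); the channel values are made up. Since 16rFF bitShift: 24 is already 4278190080, any pixel with a non-zero alpha pushes colorVal past the SmallInteger range.

```smalltalk
"Pack one pixel the way the method does and look at the result.
 Channel values are arbitrary; the comments assume a 32-bit image."
| balR balG balB a colorVal |
balR := 200. balG := 100. balB := 50.
a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
colorVal = 16rFFC86432.                  "true"
(a bitShift: 24) > SmallInteger maxVal.  "true when SmallIntegers are 31 bits"
colorVal > SmallInteger maxVal.          "true for the same reason"
```

Under that assumption the extra allocation happens for every visible pixel, not just occasionally, which is one reason a machine word type would help here.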
balR := balR truncated. balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
Should probably be handled via a primitive that truncates a floating point value down to an unsigned 8 bit value. For this example such a primitive may be overkill however converting floating point values to. But with Exupery 3.0 and SSA it would be really nice to be able to optimise to vectors. With vector optimisation we will have a level playing field with C: they will need at least as much compiler machinery as we will, and they will probably write their compilers in C, requiring much more work than writing in Smalltalk.
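In plain Smalltalk, the behaviour that a truncate-to-unsigned-byte primitive would need to match is roughly the following. clampToByte is a hypothetical name used only for illustration; nothing here is claimed to exist in Squeak or Exupery.

```smalltalk
"Reference behaviour for a truncate-and-clamp step.
 clampToByte is an illustrative block, not an existing primitive."
| clampToByte |
clampToByte := [:aFloat | (aFloat truncated max: 0) min: 255].
clampToByte value: -12.7.    "0"
clampToByte value: 160.25.   "160"
clampToByte value: 300.9.    "255"
```

This gives the same results as the truncated / ifTrue:ifFalse: pair in the method, just expressed as a single operation that a compiler could recognise.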
In summary, I think there may be some speed improvement now. Adding the array access primitives will help. Floating point is likely to be the next biggest win. Without SSA I doubt that other optimisations will provide enough gain to be worthwhile. With SSA and a few extra object types it should be possible to fully optimise it.
Thanks for your comments. I had intended to re-write the method in C and add it to the plugin, but the advantages of being able to easily play with it in Smalltalk outweigh the speed-up of porting to C, at least while I am still experimenting.
Cheers, Andy
Bryce
(1) After upgrading the VM I'm going to implement fast compiled primitives for #new and #@. This is driven by the largeExplorers benchmark. #@ is inlined into the main interpret loop in the interpreter but Exupery executes it as a normal primitive. This means that compiling largeExplorers can lead to an 8% speed loss.