Re: Exupery for sub-pixel font filtering.

13 Nov 2006


      bryce@kampjes.demon.co.uk wrote in message
news:17752.60448.36947.789918@gargle.gargle.HOWL...
...
Hello again,
This time about sub-pixel aliasing.
Andrew Tween writes:
...
Hi Bryce,
I think it is a good idea to release the solid 3.8 version.
Having said that, I am looking forward to the 3.9 release because I really
want to try using Exupery on my sub-pixel font filtering algorithm to see if
it can speed it up. Currently this is in 3.9, and I don't want to port it all
back to an earlier image/vm, especially since you are moving forward to 3.9.
Exupery runs fine on 3.9, the tests just needed to be fixed.
The best way to find out how it performs for your example would be to
load Exupery into your 3.9 image and try it.
The subpixel rendering needs a modified vm (for BitBlt stuff).
And Exupery needs a modified vm.
Currently these are built from different versions of vmmaker, svn sources, etc.
So, I am keen for them to be synchronised, and I am sure it will all come
together eventually.
In the meantime, I guess I could create a standalone benchmark, which would be
interesting in its own right.
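A standalone benchmark of that kind might be sketched in a workspace roughly
as below. It times only the per-pixel arithmetic quoted later in this message;
the iteration count, the synthetic input values and the correctionFactor value
are placeholders invented for illustration and are not taken from the real
method.

```smalltalk
| correctionFactor balR balG balB luminance colorVal a ms |
correctionFactor := 0.5.  "placeholder value, an assumption"
ms := Time millisecondsToRun: [
	1 to: 1000000 do: [:i |
		"synthetic channel values in 0..255, standing in for real pixel data"
		balR := i \\ 256.
		balG := (i * 3) \\ 256.
		balB := (i * 7) \\ 256.
		luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
		balR := ((balR + ((luminance - balR)*correctionFactor)) truncated max: 0) min: 255.
		balG := ((balG + ((luminance - balG)*correctionFactor)) truncated max: 0) min: 255.
		balB := ((balB + ((luminance - balB)*correctionFactor)) truncated max: 0) min: 255.
		a := balR + balG + balB > 0 ifTrue: [16rFF] ifFalse: [0].
		colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24)]].
Transcript show: 'per-pixel arithmetic: ', ms printString, ' ms'; cr.
```

Running the same snippet with and without the compiler loaded would give a
rough before/after comparison, although it leaves out the array access
(at: and integerAt:put:) that the real method also does.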
...
...
This is probably a topic for another thread, but could you tell from
looking at the attached method if it is a good candidate for speed-up? It has
nested loops, does lots of at: and integerAt:put: (prim 166), and SmallInteger
bitShift:, bitAnd:, *, +, //, and some Float calcs.
I'm not sure how well it would run. The code is definitely a promising
candidate to compile; however, Exupery doesn't yet compile Floats, large
integers, or primitive 166. I don't think the interpreter does any
special optimisations for them either, so chances are those operations
will run at the same speed. Exupery will be able to optimise the
SmallInteger calculations and looping overhead.
Is the primitive compilation something that I, or others, could help with? What
is involved in adding a primitive to Exupery?
...
The method could definitely be optimised much more. Adding
integerAt:put: and ByteArray>>at: primitives would help. So would
basic floating point optimisations. Going further, adding support for
machine word (32 bit integer) and byte objects should allow us to
compile to near C speeds.
The optimisations for machine words, bytes objects, and floating point
are all very similar. The game is to remove all the intermediate
objects so the calculations are done directly in registers without any
conversion and deconversion overhead.
  luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
  balR := balR + ((luminance - balR)*correctionFactor).
  balG := balG + ((luminance - balG)*correctionFactor).
  balB := balB + ((luminance - balB)*correctionFactor).
  balR := balR truncated.
  balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
  balG := balG truncated.
  balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
  balB := balB truncated.
  balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
  a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
  colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
...
answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
This is a nice example to show what dynamically inlined primitives could
do. The major overhead with floats is allocating memory (1). In this
example, using the current optimisation engine, it should be possible
to create only 4 floats rather than the 19 needed by the interpreter. One
more allocation will be needed to form colorVal if it overflows into a
LargeInteger. SSA should allow all the floating point intermediate
values to be removed by allowing program analysis over more than one
statement.
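To illustrate the LargeInteger point, here is a quick workspace check. It
assumes the 31-bit SmallIntegers of a 32-bit image (an assumption about the
image in use; images with wider SmallIntegers will answer differently). The
variable names are made up for the example.

```smalltalk
| opaque transparent |
opaque := 16rFF bitShift: 24.     "alpha 16rFF in the top byte = 4278190080"
transparent := 16r102030.         "alpha 0, only R, G and B set"
SmallInteger maxVal.              "1073741823 (16r3FFFFFFF) on a 32-bit image"
opaque > SmallInteger maxVal.     "true there, so opaque is a LargePositiveInteger"
transparent <= SmallInteger maxVal.  "true, still a SmallInteger, no allocation"
```

Under that assumption, every pixel with a = 16rFF produces a colorVal above
SmallInteger maxVal, which is where the extra allocation comes from.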
  balR := balR truncated.
  balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
This should probably be handled via a primitive that truncates a floating
point value down to an unsigned 8 bit value. For this example such a
primitive may be overkill however converting floating point values
to. But with Exupery 3.0 and SSA it would be really nice to be able to
optimise to vectors. With vector optimisation we will have a level
playing field with C: they will need at least as much compiler
machinery as we will, and they will probably write their compilers in C,
requiring much more work than writing in Smalltalk.
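The behaviour such a truncate-to-byte primitive would need can be written in
plain Smalltalk as a sketch. clampToByte is a hypothetical name used only for
this illustration; it is not an existing Exupery or Squeak primitive.

```smalltalk
| clampToByte |
"Truncate toward zero, then clamp into the unsigned 8 bit range 0..255."
clampToByte := [:aFloat | (aFloat truncated max: 0) min: 255].
clampToByte value: -3.7.    "0"
clampToByte value: 127.9.   "127"
clampToByte value: 300.2.   "255"
```

This gives the same result as the truncated / ifTrue:ifFalse: pair in the
quoted code; a compiled primitive would do the same job without the
intermediate sends.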
In summary, I think there may be some speed improvement now. Adding
the array access primitives will help. Floating point is likely to be
the next biggest win. Without SSA I doubt that other optimisations
will provide enough gain to be worthwhile. With SSA and a few extra
object types it should be possible to fully optimise it.
Thanks for your comments. I had intended to re-write the method in C and add it
to the plugin, but the advantages of being able to easily play with it in
Smalltalk outweigh the speed-up of porting to C, at least while I am still
experimenting.
Cheers,
Andy
...
Bryce
(1) After upgrading the VM I'm going to implement fast compiled
primitives for #new and #@. This is driven by the largeExplorers
benchmark. #@ is inlined into the main interpret loop in the
interpreter but Exupery executes it as a normal primitive. This means
that compiling largeExplorers can lead to an 8% speed loss.
