[Vm-dev] BitBLt performance work

David Ungar ungar at me.com
Fri Mar 15 19:09:45 UTC 2013


Cool! Can't wait to read about the results. 

- David
Sent from my iPhone, tap tap

On Mar 15, 2013, at 12:01 PM, tim Rowledge <tim at rowledge.org> wrote:

> 
> I'm working on making bitBlt faster, specifically for the Raspberry Pi but with likely benefits for all machines. It only takes a quick look at the generated code to see there is a lot of room for improvement. Some comparison testing for broadly similar cases shows that our bitBlt reaches only about 10% of the performance of the X-related pixman code. We really ought to be able to do better than that.
> 
> Part of the problem is simply the complexity buried within inner loops, and that can often be dealt with easily enough. Part of it is Pi-specific - the ARM11 buried in the SoC has a worst-case word load time of 150 cycles, so cache preloading and hinting are pretty important for any streaming-type work. It also has very small caches compared with the Watt-sucking Intel CPUs in our desktops.
> 
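A minimal sketch of the read-ahead hinting described above, assuming GCC or Clang (whose `__builtin_prefetch` typically becomes a PLD instruction on ARM). The function name and both tuning constants are hypothetical, not taken from the BitBlt plugin, and the right values would have to be measured on the Pi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, to be measured on the target:
   how often to issue a hint, and how far ahead of the current word. */
#define PREFETCH_STRIDE_WORDS 8
#define PREFETCH_AHEAD_WORDS  16

/* Copy nWords 32-bit words from src to dst, hinting the cache about
   source words that will be read shortly. The hint never changes the
   result; it only tries to hide load latency on a streaming read. */
void copy_words_prefetch(uint32_t *dst, const uint32_t *src, size_t nWords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nWords; i++) {
        /* Issue one hint per stride, and only while the target is in range. */
        if ((i % PREFETCH_STRIDE_WORDS) == 0 && i + PREFETCH_AHEAD_WORDS < nWords) {
            /* Second argument 0 = read access, third 0 = no temporal locality. */
            __builtin_prefetch(&src[i + PREFETCH_AHEAD_WORDS], 0, 0);
        }
        dst[i] = src[i];
    }
}
```

Whether this helps at all depends on the blt width; for very short rows the hint overhead may outweigh the gain.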
> Where things get a bit trickier is working out what the blazes is going on with some of the weirder code; there's a *lot* of possible combinations of all the input variables and it looks like only a few are 'realistic'. For example, the cmMaskTable/ShiftTable stuff has potential for a rather large number of combinations but only a few appear to be used. If anyone knows what the specs really are I'd be happy to hear about it. We have 40 combination rules, halftone forms, clipping rectangles, swapped-endian bitmaps, external bitmaps, colormaps, and probably pink unicorns.
> 
> As mentioned in my mail about the pixelpeeker plugin, some cases are probably best pulled out of bitblt entirely. I suspect that getting the load-up related code cleaner will help quite a bit since there is a lot of work to do before even a tiny blt. Rapid detection of special cases can obviously help - but having some clear evidence of what *Actual* cases are common and worth attacking would be nice.
> 
> Info about any facets of the weirder blt code, data about actual common cases, experiences in trying to improve performance, anything you think might help: all welcome.
> 
> 
> 
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> Machine-independent:  Does not run on any existing machine.
> 
> 

