[Vm-dev] BitBLt performance work
tim at rowledge.org
Fri Mar 15 19:01:29 UTC 2013
I'm working on making bitBlt faster, specifically for the Raspberry Pi but with likely benefits for all machines. It only takes a quick look at the generated code to see there is a lot of room for improvement. Some comparison testing for broadly similar cases shows that our bitBlt manages only about 10% of the performance of the X-related pixman code. We really ought to be able to do better than that.
Part of the problem is simply the complexity buried within inner loops, and that can often be dealt with easily enough. Part of it is Pi-specific: the ARM11 buried in the SoC has a worst-case word load time of 150 cycles, so cache preloading and hinting are pretty important for any streaming-type work. It also has very small caches compared to the Watt-sucking Intel CPUs in our desktops.
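To make the preloading point concrete, here is a minimal sketch of a word-copy loop that issues a cache hint a fixed distance ahead of the read pointer. It is not taken from the VM sources: the function name, the prefetch distance and the 32-byte line size are all assumptions that would need measuring on the actual hardware, and `__builtin_prefetch` is a GCC/Clang builtin rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the existing bitBlt code. */
#define WORDS_PER_LINE       8   /* assumes a 32-byte cache line */
#define PREFETCH_AHEAD_WORDS 16  /* assumed distance; tune by measurement */

/* Copy nWords 32-bit words from src to dst, hinting the cache ahead of
   the read position once per (assumed) cache line. */
static void copy_words_prefetch(uint32_t *dst, const uint32_t *src, size_t nWords)
{
    for (size_t i = 0; i < nWords; i++) {
        /* Only hint at addresses that lie inside the source buffer. */
        if ((i % WORDS_PER_LINE) == 0 && i + PREFETCH_AHEAD_WORDS < nWords) {
            /* rw = 0 (read), locality = 0 (streaming data, no reuse expected) */
            __builtin_prefetch(&src[i + PREFETCH_AHEAD_WORDS], 0, 0);
        }
        dst[i] = src[i];
    }
}
```

Whether a hint like this helps at all depends on the memory system; it is the sort of thing that has to be timed on the Pi itself rather than assumed.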
Where things get a bit trickier is working out what the blazes is going on with some of the weirder code; there are a *lot* of possible combinations of all the input variables and it looks like only a few are 'realistic'. For example, the cmMaskTable/ShiftTable stuff has potential for a rather large number of combinations but only a few appear to be used. If anyone knows what the specs really are I'd be happy to hear about it. We have 40 combination rules, halftone forms, clipping rectangles, swapped-endian bitmaps, external bitmaps, colormaps and probably pink unicorns.
As mentioned in my mail about the pixelpeeker plugin, some cases are probably best pulled out of bitBlt entirely. I suspect that cleaning up the load-up code will help quite a bit, since there is a lot of work to do before even a tiny blt. Rapid detection of special cases can obviously help, but having some clear evidence of which *actual* cases are common and worth attacking would be nice.
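As a rough illustration of both ideas, the sketch below counts blts by rule and depth pair (to gather evidence of the common cases) and tests for one candidate fast path. Every name here is hypothetical and none of it is taken from the VM sources; the rule count of 40 is simply the figure quoted above, and treating rule 3 as the plain "store source" rule is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_RULES  40  /* figure quoted in this mail; an assumption here */
#define NUM_DEPTHS 6   /* 1, 2, 4, 8, 16 and 32 bits per pixel */
#define RULE_STORE 3   /* assumed index of the plain "store source" rule */

/* Hypothetical summary of the inputs to one blt. */
typedef struct {
    int rule;         /* combination rule index */
    int srcDepth;     /* source bits per pixel */
    int dstDepth;     /* destination bits per pixel */
    int hasColorMap;  /* non-zero if a colormap is supplied */
    int hasHalftone;  /* non-zero if a halftone form is supplied */
} BltParams;

/* Histogram of observed (rule, source depth, destination depth) triples. */
static uint32_t bltCounts[NUM_RULES][NUM_DEPTHS][NUM_DEPTHS];

/* Map a pixel depth to a table index 0..5, or -1 if unsupported. */
static int depth_index(int depth)
{
    switch (depth) {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    case 8:  return 3;
    case 16: return 4;
    case 32: return 5;
    default: return -1;
    }
}

/* Record one blt in the histogram; returns 1 if counted, 0 if out of range. */
static int record_blt(const BltParams *p)
{
    int s = depth_index(p->srcDepth);
    int d = depth_index(p->dstDepth);
    if (p->rule < 0 || p->rule >= NUM_RULES || s < 0 || d < 0) {
        return 0;
    }
    bltCounts[p->rule][s][d]++;
    return 1;
}

/* Cheap up-front test for one candidate fast path: a straight 32-bit to
   32-bit store with no colormap and no halftone. */
static int is_simple_copy(const BltParams *p)
{
    return p->rule == RULE_STORE
        && p->srcDepth == 32 && p->dstDepth == 32
        && !p->hasColorMap && !p->hasHalftone;
}
```

Calling something like `record_blt` at the entry point during normal image use and dumping the table afterwards would show which combinations deserve a dedicated path; a check like `is_simple_copy` is what such a path's entry test might look like.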
Info about any facets of the weirder blt code, data about actual common cases, experiences in trying to improve performance, anything you think might help, all welcomed.
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Machine-independent: Does not run on any existing machine.