[Vm-dev] About BitBlt on the Raspberry Pi VM

tim Rowledge tim at rowledge.org
Thu Feb 5 19:08:35 UTC 2015


Hi Clément
It sounds like somebody explained things to you very badly.


On 05-02-2015, at 6:02 AM, Clément Bera <bera.clement at gmail.com> wrote:

> Hello everyone,
> 
> When I run a Squeak image on the Raspberry Pi, the UI is much faster with the default VM present on the Raspberry Pi than with a VM compiled on the Cog or Pharo branch.

Depending on what exact version of Raspbian you have on your Pi, the default VM may well be a Cog/Stack VM. The most recent releases since (I think) mid-December have nuScratch and stackvm as the defaults. The older plain interpreter is also there in case we find problems.

If VMs you build are any slower than the default one, you have a problem in your build setup. I don’t do anything clever that isn’t already in the repository, and don’t know enough about makefile stuff to be *able* to do anything very clever.

> 
> I heard that this is because Tim Rowledge changed the BitBlt implementation in the Pi VM / Pi image, reimplementing it image-side rather than VM-side, resulting in a faster BitBlt.
Goodness me, that needs explaining.
Firstly, I don’t get anywhere near all the credit. I did the specification and integration into the BitBltPlugin, but the really clever stuff was done by Ben Avison over in Cambridge - the real Cambridge, in the UK. It’s mostly *very* cleverly written ARM assembler, with the interface done by perfectly normal Slang code in the plugin. I did do a JitBlt self-compiling ARM blitter 25 or so years ago, but that was for monochrome screens (because that was all we had then) and ARM3-level CPUs, where there was no complicated futzing with nasty Unix memory gibberish.
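
For the shape of the arrangement, here is a purely illustrative C sketch (names and types made up, not the actual BitBltPlugin interface): the portable loop is the kind of thing Slang translation gives you, and a platform may offer a hand-written fast path that is tried first.

#include <stdint.h>

/* Hypothetical sketch only: a generic word-copy loop with an optional
   platform fast path tried first. Not the real BitBltPlugin code. */
typedef struct {
    uint32_t *srcBits, *dstBits;    /* word pointers into the two Forms */
    int       width, height;        /* extent of the copy, in words     */
    int       srcPitch, dstPitch;   /* words per scan line              */
} BlitOp;

/* Portable loop, roughly the spirit of the Slang-generated C. */
static void copyWordsGeneric(const BlitOp *op) {
    for (int y = 0; y < op->height; y++)
        for (int x = 0; x < op->width; x++)
            op->dstBits[y * op->dstPitch + x] = op->srcBits[y * op->srcPitch + x];
}

/* Hand-optimised routine (e.g. ARM assembler) behind a plain C symbol;
   it returns 0 if it declines to handle the requested case. */
extern int copyWordsFast(const BlitOp *op);

void copyWords(const BlitOp *op) {
#ifdef USE_FAST_BLT
    if (copyWordsFast(op)) return;   /* assembler took care of it */
#endif
    copyWordsGeneric(op);            /* otherwise fall back to C  */
}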

> 
> I have questions:
> - Is it true ?
> - Is the BitBlt code of Tim Rowledge open source ? If so, where is it and what is exactly its license ?

It’s not merely open source, it’s *in the VM code repository*, and has been for 18 months. If you build a Stack VM on an ARM Linux machine it gets included by default. Or at least it should, though to be honest the autoconf/make stuff is sufficiently confusing that I’d never guarantee it will produce anything. There are also a couple of BitBlt extensions to speed up pixel value testing and pixel-touches-pixel testing for sprite collisions.
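
For the curious, that last test boils down to scanning the overlap of two Forms for a pair of non-transparent pixels. A rough C sketch of the idea (not the actual primitive, which handles arbitrary depths and does its own clipping) looks like this, assuming 32bpp pixels and a caller that has already worked out the overlap rectangle:

#include <stdint.h>

/* Illustrative only: answers whether any non-transparent pixel of sprite A
   lies on top of a non-transparent pixel of sprite B within the overlap. */
int anyPixelTouchesPixel(const uint32_t *a, int aPitch,
                         const uint32_t *b, int bPitch,
                         int width, int height) {
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (a[y * aPitch + x] != 0 && b[y * bPitch + x] != 0)
                return 1;   /* both opaque here: collision */
    return 0;               /* no overlapping opaque pixels */
}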


> - Would it make sense to port that or do something similar on Intel VM ? Would we see a performance gain / loss ?
> 

I suspect it wouldn’t be worth the effort on a full desktop machine with fast memory buses and vast caches. You could certainly consider improving the algorithms in some parts of BitBlt, but I’m not sure it would really result in much faster blits. There may be opportunities to use the media-related instructions that can do sort-of parallel processing for 32/16/8 bpp data (that’s effectively what Ben did for ARMv6, and might re-do for v7 with NEON later if I’m very lucky).
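
To give a flavour of the trick involved (a toy illustration, not Ben’s code): even in plain C you can process all four 8-bit channels of a 32bpp pixel in one 32-bit operation by masking off the bits that would otherwise carry between lanes; the ARMv6 media instructions (UHADD8 and friends) do the same sort of thing in a single instruction without the masking dance.

#include <stdint.h>

/* Toy example of "parallel within a word" pixel arithmetic: the per-channel
   average of two 32bpp ARGB pixels, computed for all four 8-bit lanes at once.
   The 0xFEFEFEFE mask stops a shifted bit of one lane leaking into the next. */
static inline uint32_t averagePixel32(uint32_t a, uint32_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}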

> I am asking because the last time I discussed this with Bert, he said that it would be fun to have a Smalltalk-implemented BitBlt combined with a JIT compiler doing automatic vectorization, in order to have vector graphics implemented as bit-based graphics.

I must be misunderstanding. When discussing vectorization and BitBlt, one would normally be referring to using the parallel instructions I mentioned above.

Doing vector graphics operations is quite a different thing and, I’d suggest, a much more interesting project for most people. Having a Canvas class that can use vector graphics libraries such as Cairo could be a massive speed-up in rendering the UI. Obviously the Smalltalk code would need to be written to be able to make use of it, but I think quite a lot is already in place. You only need to see some of the videos from VPRI showing the Nile graphics work to see how interesting it could be.
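
As a rough sketch of what the VM-side half could look like (nothing of the sort exists as a Squeak plugin here, and the surface size and drawing below are arbitrary): Cairo renders into an ordinary ARGB32 memory surface, whose pixels could then be handed back to the image as a 32bpp Form and composited with BitBlt as usual.

#include <cairo.h>

void renderExample(void) {
    /* An ordinary in-memory surface; its bits could be shared with a 32bpp Form. */
    cairo_surface_t *surface =
        cairo_image_surface_create(CAIRO_FORMAT_ARGB32, 640, 480);
    cairo_t *cr = cairo_create(surface);

    cairo_set_source_rgb(cr, 1, 1, 1);          /* white background         */
    cairo_paint(cr);

    cairo_set_source_rgb(cr, 0.2, 0.4, 0.8);    /* an anti-aliased curve    */
    cairo_set_line_width(cr, 4.0);
    cairo_move_to(cr, 50, 400);
    cairo_curve_to(cr, 200, 50, 440, 50, 590, 400);
    cairo_stroke(cr);

    cairo_surface_flush(surface);
    unsigned char *pixels = cairo_image_surface_get_data(surface);
    int stride = cairo_image_surface_get_stride(surface);
    /* ...hand pixels/stride back to the image as a Form, blit as usual... */
    (void)pixels; (void)stride;

    cairo_destroy(cr);
    cairo_surface_destroy(surface);
}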

Implementing all the clever vector graphics stuff in terms of BitBlts would be doable (of course), and some parts already exist, but I think it would be much better to hook up to the ferocious GPUs we have available these days. The hardest and probably slowest part is that a lot of them seem to want to output only directly to a screen, which rather gets in the way, requiring strange configurations and copying bitmaps back to ‘our’ space to do more work. Clearly, we need a custom Squeak GPU. Who will offer me US$100m to fund the development? Anyone?


tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful random insult:- Not all his dogs are barking.



