I forgot an important paragraph in my last note on this subject - and one that might just help the BeOS folks work out what is slowing them down.
Keeping the Display in little-endian form and doing a pixel reverse for every blit where the Display is source and/or target is a poor choice because the Display is involved in a _lot_ of blits. I forget the precise details, but at one time I had to trace something that led me to notice that there were something like 20-50 times more blits to the screen than there were display update cycles.
On Acorn and Windows (possibly X11 too, though I haven't read the code enough to be sure) the Display is only copied to the glass when the appropriate OS event is received. At that time, and that time only, is the Display bitmap pixel reversing done. That is why Andreas is quite right that the overhead is fairly low: as long as the OS merges small rectangles etc. effectively, and as long as we have to handle these events relatively rarely (~100 times/sec max), all is ok. If I understand the code correctly, the Mac updates the glass every time (?).
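To make the event-driven scheme concrete, here is a rough sketch in C. It is not the actual VM code for any platform. The names (display, glass, blit_fill, flush_to_glass), the tiny 8x4 bitmap and the 32-bit depth (where a "pixel reverse" is just a byte swap of each word) are all made up for illustration. Blits only write pixels and grow a dirty rectangle; the reverse and the copy to the glass happen once, when the (hypothetical) OS repaint event arrives.

```c
#include <assert.h>
#include <stdint.h>

enum { W = 8, H = 4 };                 /* hypothetical tiny 32-bit display */

typedef struct { int left, top, right, bottom; } Rect;

static uint32_t display[H][W];         /* the Display bitmap, in image pixel order */
static uint32_t glass[H][W];           /* stand-in for the real screen memory */
static Rect dirty = {0, 0, 0, 0};      /* merged damage since the last flush */
static unsigned long reversed_words = 0;   /* counts how much reversing we did */

static int rect_is_empty(Rect r) {
    return r.right <= r.left || r.bottom <= r.top;
}

/* At 32 bits per pixel, reversing pixel order within a word is a byte swap. */
static uint32_t swap32(uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* A blit into the Display: write the pixels and merely note the damage.
   No pixel reversing and no copy to the glass happens here. */
static void blit_fill(int x, int y, int w, int h, uint32_t pixel) {
    Rect r = { x, y, x + w, y + h };
    assert(x >= 0 && y >= 0 && r.right <= W && r.bottom <= H);
    if (rect_is_empty(r)) return;
    for (int row = r.top; row < r.bottom; row++)
        for (int col = r.left; col < r.right; col++)
            display[row][col] = pixel;
    if (rect_is_empty(dirty)) {
        dirty = r;
    } else {                           /* merge into one bounding rectangle */
        if (r.left   < dirty.left)   dirty.left   = r.left;
        if (r.top    < dirty.top)    dirty.top    = r.top;
        if (r.right  > dirty.right)  dirty.right  = r.right;
        if (r.bottom > dirty.bottom) dirty.bottom = r.bottom;
    }
}

/* Called only when the OS repaint event arrives: reverse and copy
   just the damaged area, then clear the damage record. */
static void flush_to_glass(void) {
    if (rect_is_empty(dirty)) return;
    for (int row = dirty.top; row < dirty.bottom; row++)
        for (int col = dirty.left; col < dirty.right; col++) {
            glass[row][col] = swap32(display[row][col]);
            reversed_words++;
        }
    dirty = (Rect){0, 0, 0, 0};
}
```

However many times blit_fill is called between events, the reversing cost is paid once per flush and only over the merged rectangle. Moving the swap32 loop into blit_fill would give the per-blit variant, which pays it on every write.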
If you are doing a pixel reverse _every_ time you write to the Display, not to mention actually forcing a copy to the glass, then performance would probably suffer. Perhaps, just perhaps, this is what the BeOS code is doing?
tim