[Vm-dev] [OpenSmalltalk/opensmalltalk-vm] Update BitBlt support (primarily for 64-bit ARM) (#565)
notifications at github.com
Tue May 4 11:23:24 UTC 2021
The accelerated BitBlt framework initially targeted the ARM11 running the AArch32 instruction set, which was the only instruction set it fully supported.
More recent ARM cores run much faster, which has enabled more comprehensive testing via the BitBlt fuzz test framework (https://github.com/bavison/SqueakBitBltTest). This has detected a handful of bugs in both the AArch32-specific and the architecture-neutral parts of the fast BitBlt framework. I address these first.
Next, I add a number of BitBlt fast paths written in platform-independent C. The 8-to-32bpp conversion routine is as fast as anything I could manage with hand-crafted AArch64 assembly. The others are useful as reference implementations for other architectures, or to fill gaps in what those architectures can accelerate. For example, I've introduced a class of fast paths for colour maps that feature only two distinct colours, but I haven't retrospectively written AArch32 fast paths for them, so AArch32 will use the C fast paths in those cases.
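As a rough illustration of what an 8-to-32bpp conversion involves, here is a minimal sketch in portable C. It is not the routine from this pull request: the function name `convert_8_to_32` and its signature are hypothetical, and it assumes big-endian pixel packing (leftmost pixel in the most significant byte of each source word) and a 256-entry colour map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the VM's actual routine: expand a row of 8bpp
 * pixels to 32bpp through a 256-entry colour map. Assumes big-endian pixel
 * packing, i.e. the leftmost pixel occupies the most significant byte of
 * each 32-bit source word. */
static void convert_8_to_32(const uint32_t *src_words, uint32_t *dst,
                            size_t pixel_count, const uint32_t map[256])
{
    assert(src_words != NULL && dst != NULL && map != NULL);
    for (size_t i = 0; i < pixel_count; i++) {
        uint32_t word = src_words[i / 4];               /* 4 pixels per word */
        unsigned shift = 24u - 8u * (unsigned)(i % 4);  /* MSB-first order */
        dst[i] = map[(word >> shift) & 0xFFu];          /* one lookup per pixel */
    }
}
```

A real fast path would typically unroll this loop so that each source word is loaded once and produces four destination words.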
The fast path that handles operations with scalar halftoning and 32bpp destination images is a bit of a special case, in that it acts to extend the capabilities of other fast paths. It thus accelerates both AArch32 and AArch64.
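The idea can be sketched as follows, assuming the halftone is combined with the colour-mapped source pixel by bitwise AND and that "scalar" means every halftone word is identical. With a 32bpp destination, each colour map entry fills exactly one word, so the halftone can be folded into the map once and the remaining work handed to a fast path that knows nothing about halftones. The name `premask_colour_map` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: fold a scalar halftone word into a colour map.
 * With a 32bpp destination each map entry fills a whole word, so
 * (map[i] & halftone) can be computed once per entry instead of once
 * per pixel. */
static void premask_colour_map(const uint32_t *map, uint32_t *masked,
                               size_t entries, uint32_t halftone)
{
    assert(map != NULL && masked != NULL);
    for (size_t i = 0; i < entries; i++)
        masked[i] = map[i] & halftone;
}
```

After this step, looking up a pixel in `masked` gives the same word as looking it up in `map` and then applying the halftone, which is why an existing halftone-free fast path can be reused unchanged.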
The most significant commit, however, is the last one. This features a collection of fast paths implemented using inline AArch64 assembly, tuned for Cortex-A72 (as found in the Raspberry Pi 4). Based on the results of profiling, this has an emphasis on operations with a 32bpp destination image.
Operations with any source depth, combined with any of 22 of the possible combinationRules (including the common sourceWord, pixPaint and alphaBlend rules), should all be accelerated, provided you don't use little-endian pixel packing, vector halftoning, or non-standard colour map rules when converting between colour depths.
There are additional fast paths for alphaBlend for either a constant source colour, or a source image whose colour map only consists of two different colours (i.e. where the source image is effectively used as a 1bpp mask, despite being of a greater depth).
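For readers unfamiliar with the constant-colour case, the following is a minimal sketch of alpha blending a single 32bpp ARGB colour over a row of destination pixels. It is an illustration under stated assumptions, not the code in this pull request: the function names are hypothetical, the pixel layout is assumed to be 0xAARRGGBB with non-premultiplied colour, and the rounding used here (exact rounded division by 255) may differ from what the VM's alphaBlend rule produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: round((s*a + d*(255-a)) / 255).
 * The shift-and-add form is exact for the value range used here. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    uint32_t t = s * a + d * (255u - a) + 128u;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: blend one ARGB source pixel over one destination
 * pixel, using the source pixel's own alpha. */
static uint32_t alpha_blend_pixel(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t out = 0;
    for (unsigned shift = 0; shift < 24; shift += 8)
        out |= blend_channel((src >> shift) & 0xFFu,
                             (dst >> shift) & 0xFFu, a) << shift;
    /* Resulting alpha: a + dst_alpha * (255 - a) / 255. */
    out |= blend_channel(255u, dst >> 24, a) << 24;
    return out;
}

/* Hypothetical helper: blend a constant colour over n destination pixels.
 * Fully transparent and fully opaque colours take shortcuts. */
static void alpha_blend_fill(uint32_t *dst, size_t n, uint32_t colour)
{
    assert(dst != NULL);
    uint32_t a = colour >> 24;
    if (a == 0)
        return;                       /* nothing to draw */
    for (size_t i = 0; i < n; i++)
        dst[i] = (a == 255u) ? colour : alpha_blend_pixel(colour, dst[i]);
}
```

Because the source colour is constant, a tuned implementation can hoist the source-side multiplications out of the loop, which is one reason this case merits its own fast path.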
-- Commit Summary --
* Don't pass -m32 to GCC for ARM builds
* Correct various "#if ENABLE_FAST_BLT" to "#ifdef"
* Don't assume sourcePPW is valid on entry to copyBitsFallback
* Fallback routines need extra help to detect intra-image operations
* Remove invalid shortcut in rgbComponentAlphawith
* Fix bug in 32-bit ARM fast paths
* Fix buffer overflow bugs
* Fix corruption bugs with wide 1bpp source images
* Fix type of halftone array for 64-bit targets
* Detect and add a new fast path flag for effective-1bpp colour maps
* C fast path for 32bpp alphaBlend
* C fast path for planar alphaBlend
* C fast path for 8->32bpp conversion
* C fast path for alphaBlend with 1bpp colour map and scalar halftone
* Apply scalar halftoning to colour map entries instead for 32bpp destination
* Enable fast blit code for AArch64
* AArch64 assembly optimisations
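The "#if" to "#ifdef" commit touches a general preprocessor point that is easy to trip over. The sketch below is only an illustration of that general behaviour, not a statement of why the commit was needed: it assumes the macro is defined with no value, and `fast_blt_enabled` is a hypothetical function.

```c
/* Hypothetical illustration: a build system may define the feature macro
 * with no value, e.g. via "#define ENABLE_FAST_BLT" in a generated header. */
#define ENABLE_FAST_BLT

/* "#if ENABLE_FAST_BLT" would fail to compile here, because the macro
 * expands to nothing and "#if" needs an expression. "#ifdef" only asks
 * whether the macro is defined, so it accepts empty and numeric values. */
#ifdef ENABLE_FAST_BLT
static int fast_blt_enabled(void) { return 1; }
#else
static int fast_blt_enabled(void) { return 0; }
#endif
```

The two forms also differ when the macro is defined as 0: "#if" treats the feature as disabled, whereas "#ifdef" treats it as enabled.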
-- File Changes --
M build.linux32ARMv6/squeak.cog.spur/build.assert/mvm (2)
M build.linux32ARMv6/squeak.cog.spur/build.debug/mvm (2)
M build.linux32ARMv6/squeak.cog.spur/build/mvm (2)
M build.linux64ARMv8/squeak.cog.spur/build.assert/mvm (1)
M build.linux64ARMv8/squeak.cog.spur/build.debug/mvm (1)
M build.linux64ARMv8/squeak.cog.spur/build/mvm (1)
A platforms/Cross/plugins/BitBltPlugin/BitBltArm64.c (2482)
A platforms/Cross/plugins/BitBltPlugin/BitBltArm64.h (30)
M platforms/Cross/plugins/BitBltPlugin/BitBltArmSimdAsm.hdr (2)
M platforms/Cross/plugins/BitBltPlugin/BitBltArmSimdSourceWord.s (17)
M platforms/Cross/plugins/BitBltPlugin/BitBltDispatch.c (126)
M platforms/Cross/plugins/BitBltPlugin/BitBltDispatch.h (2)
M platforms/Cross/plugins/BitBltPlugin/BitBltGeneric.c (247)
M platforms/Cross/plugins/BitBltPlugin/BitBltInternal.h (32)
M platforms/unix/config/configure (12)
M platforms/unix/plugins/BitBltPlugin/acinclude.m4 (11)
M src/plugins/BitBltPlugin/BitBltPlugin.c (23)