Ramblings on how to optimize Squeak for modern CPU bit
manipulation (Was Re: I was wondering ...
Lawson English
english at primenet.com
Sat Jul 3 20:23:44 UTC 1999
I hadn't thought the overhead issue through properly, obviously.
There seem to be two possible approaches to get an *efficient* full-blown
multi-byte "vector" processor in Squeak:
1) create a built-in Obj-C compiler that will sidestep the overhead that
you point out by converting all control structures and method calls into
compiled Obj-C, while keeping the "vector" emulation portion in
well-optimized, compiled C or vector-processor equivalent on a given
platform (e.g. AltiVec, MMX, etc). This is obviously the most elegant and
universal solution, but is beyond my capabilities.
2) create a virtual "vector processor" with our own "machine code" that
includes simple loop control/logic/arithmetic instructions, as well as
efficient vector instructions that map to efficient, pre-compiled C. I am
pretty sure that I can implement something like this, but the *design* of
such a beast is all-important. What control elements should be included?
Should we emulate fixed-length vector registers, arbitrary-length arrays,
or both? What vector-processing instructions are most important? Etc?
The idea is that you code your vector algorithm in the virtual machine code
and it is pre-compiled into a stream of simple, RISC-like 32-bit codes that
can easily be parsed into low-level calls with far less overhead than the
full-blown Squeak VM. When your Squeak code invokes a specific method, what
happens is that the stream of 32-bit codes is passed to the low-level
interpreter, bypassing the high overhead of Squeak. Basically, there is
only ONE primitive: the Vector emulator, which accepts a pointer to the
stream of codes and everything is done by the emulator.
This is doable, and completely portable, I think, but is it worth doing?
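
A minimal sketch of option 2 in portable C, to make the single-primitive idea concrete. Everything specific here is an assumption made for illustration and not a settled design: the instruction layout (8-bit opcode plus three 8-bit fields), the opcode names, the fixed 4-lane float registers, and the function name vec_run. The lane loops are where a platform-specific AltiVec or MMX version could be substituted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction word: [opcode:8][a:8][b:8][c:8]. */
enum { OP_HALT, OP_SSET, OP_SADDI, OP_BNZ, OP_VLOAD, OP_VSTORE, OP_VADD, OP_VMUL };

#define VLEN 4  /* lanes per emulated vector register (assumed) */
#define NREG 8  /* count of vector registers and of scalar registers (assumed) */

#define ENC(op, a, b, c)                                              \
    (((uint32_t)(op) << 24) | ((uint32_t)(uint8_t)(a) << 16) |        \
     ((uint32_t)(uint8_t)(b) << 8) | (uint32_t)(uint8_t)(c))

/* The one "primitive": interpret a code stream against a float buffer.
   Returns the number of instructions executed, or 0 on an unknown opcode. */
size_t vec_run(const uint32_t *code, float *mem, size_t mem_len)
{
    float v[NREG][VLEN] = {{0}};  /* vector registers */
    int32_t s[NREG] = {0};        /* scalar registers: indices and counters */
    size_t pc = 0, steps = 0;

    for (;;) {
        uint32_t w = code[pc++];
        unsigned op = w >> 24, a = (w >> 16) & 0xFF, b = (w >> 8) & 0xFF, c = w & 0xFF;
        steps++;
        assert(a < NREG);  /* field a always names a register */
        switch (op) {
        case OP_HALT:
            return steps;
        case OP_SSET:   /* s[a] = unsigned immediate c */
            s[a] = (int32_t)c;
            break;
        case OP_SADDI:  /* s[a] += signed 8-bit immediate c */
            s[a] += (int32_t)c - ((c & 0x80) ? 256 : 0);
            break;
        case OP_BNZ:    /* jump to instruction index c if s[a] != 0 */
            if (s[a] != 0) pc = c;
            break;
        case OP_VLOAD:  /* v[a] = mem[s[b] .. s[b]+VLEN-1] */
        case OP_VSTORE: /* mem[s[b] .. s[b]+VLEN-1] = v[a] */
            assert(b < NREG && s[b] >= 0 && (size_t)s[b] + VLEN <= mem_len);
            for (int i = 0; i < VLEN; i++) {
                if (op == OP_VLOAD) v[a][i] = mem[s[b] + i];
                else                mem[s[b] + i] = v[a][i];
            }
            break;
        case OP_VADD:   /* v[a] = v[b] + v[c], lane by lane */
        case OP_VMUL:   /* v[a] = v[b] * v[c], lane by lane */
            assert(b < NREG && c < NREG);
            for (int i = 0; i < VLEN; i++)
                v[a][i] = (op == OP_VADD) ? v[b][i] + v[c][i] : v[b][i] * v[c][i];
            break;
        default:
            return 0;
        }
    }
}

/* Example stream: double every element of an 8-float buffer in place. */
static const uint32_t demo_double[] = {
    ENC(OP_SSET,   0, 0, 0),   /* 0: s0 = 0   (element index)       */
    ENC(OP_SSET,   1, 0, 2),   /* 1: s1 = 2   (chunks remaining)    */
    ENC(OP_VLOAD,  0, 0, 0),   /* 2: v0 = mem[s0 .. s0+3]           */
    ENC(OP_VADD,   0, 0, 0),   /* 3: v0 = v0 + v0                   */
    ENC(OP_VSTORE, 0, 0, 0),   /* 4: mem[s0 .. s0+3] = v0           */
    ENC(OP_SADDI,  0, 0, 4),   /* 5: s0 += 4                        */
    ENC(OP_SADDI,  1, 0, -1),  /* 6: s1 -= 1                        */
    ENC(OP_BNZ,    1, 0, 2),   /* 7: loop to 2 while s1 != 0        */
    ENC(OP_HALT,   0, 0, 0),   /* 8: done                           */
};
```

Calling vec_run(demo_double, buf, 8) on a buffer holding 1.0 through 8.0 leaves 2.0 through 16.0 in it after 15 interpreted instructions. Note that 6 of the 8 instructions per loop pass are loads, stores and loop control rather than arithmetic, which is exactly the overhead question raised in the quoted reply.
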
Chris Reuter <cgreuter at calum.csclub.uwaterloo.ca> said:
[snipt]
>
>However, as my day job involves writing and maintaining development
>tools for a SIMD DSP, I have some experience with the technology and
>the related philosophy.
>
>The basic idea behind vector processing is to do more in one clock
>cycle. As such, my particular DSP will do 4 or 8 arithmetic
>operations on a swath of data in one cycle and calculate the next
>address at the same time. The ideal construct for using this kind of
>thing is a big linear sequence of instructions--an unrolled tight
>loop, as it were. This gives you a 4- or 8-fold increase in
>performance over a scalar processor and allows realtime video
>manipulation at 25MHz.
>
>Note that you only get a significant performance boost because you're
>doing an enormous number of successive basic arithmetic operations.
>That is, almost every clock cycle is used to do arithmetic rather than
>flow control or other housekeeping.
>
>In contrast, executing a Squeak expression like:
>
> ByteArray doBy128Bits: [ :chunk | chunk someDSPOperation ]
>
>involves several hundred (at least) clock cycles in between each piece
>of arithmetic done. The performance gained by doing 4 or 8 or 16
>bytes' worth in one clock cycle rather than {4,8,16} isn't a
>significant gain because the arithmetic itself doesn't use that many
>clock cycles compared to the Squeak VM.
>
>Adding extended bit-manipulation primitives to Squeak may well be a
>good idea (I haven't needed them so I don't have an informed opinion
>here) but you may as well go and code them in efficient C--vector
>processing won't help enough.
>
>I _can_ think of several places where vector instructions might
>improve performance: BitBLT could probably benefit from AltiVec or
>MMX-based replacements for a few of the workhorse primitives.
>ByteArray, FloatArray and IntArray might also benefit from
>vector-based "Do-this-to-all-elements"-type primitives. Maybe
>you can think of a few others.
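
The overhead argument quoted above can be put in numbers with a rough model. This is only a sketch: the function name chunk_speedup and the cost model (one cycle per element in scalar code, one cycle per whole chunk in vector code, plus a fixed per-chunk overhead) are assumptions made for illustration, and the figure of 300 cycles used below simply stands in for "several hundred".

```c
#include <assert.h>

/* Estimated speedup of vector over scalar code when each chunk of
   `lanes` elements also pays `overhead_cycles` of dispatch cost.
   Assumed model: scalar = 1 cycle per element, vector = 1 cycle per chunk. */
double chunk_speedup(double overhead_cycles, double lanes)
{
    assert(overhead_cycles >= 0.0 && lanes >= 1.0);
    double scalar_cycles = overhead_cycles + lanes;
    double vector_cycles = overhead_cycles + 1.0;
    return scalar_cycles / vector_cycles;
}
```

With no overhead, chunk_speedup(0, 16) gives the full 16-fold gain. With 300 cycles of overhead per block evaluation, chunk_speedup(300, 16) is 316/301, or about 1.05, so the vector unit buys roughly 5 percent. Under this model the gain only reappears when the per-chunk overhead is amortized over a whole array inside one primitive.
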
-------------------------------------------------------------------------
Lawson English. Squeak, snore, etc.
Check out <http://www.squeak.org>
-------------------------------------------------------------------------