Ramblings on how to optimize Squeak for modern CPU bit manipulation (Was Re: I was wondering ...

Lawson English english at primenet.com
Sat Jul 3 20:23:44 UTC 1999


Obviously, I hadn't thought the overhead issue through properly.

There seem to be two possible approaches to get an *efficient* full-blown
multi-byte "vector" processor in Squeak:

1) create a built-in Obj-C compiler that will sidestep the overhead that
you point out by converting all control structures and method calls into
compiled Obj-C, while keeping the "vector" emulation portion in
well-optimized, compiled C or vector-processor equivalent on a given
platform (e.g. AltiVec, MMX, etc). This is obviously the most elegant and
universal solution, but is beyond my capabilities.

2) create a virtual "vector processor" with our own "machine code" that
includes simple loop control/logic/arithmetic instructions, as well as
efficient vector instructions that map to efficient, pre-compiled C. I am
pretty sure that I can implement something like this, but the *design* of
such a beast is all-important. What control elements should be included?
Should we emulate fixed-length vector registers, arbitrary-length arrays,
or both? What vector-processing instructions are most important? Etc? 

The idea is that you code your vector algorithm in the virtual machine code
and it is pre-compiled into a stream of simple, RISC-like 32-bit codes that
can easily be decoded into low-level calls with far less overhead than the
full-blown Squeak VM. When your Squeak code invokes a specific method, the
stream of 32-bit codes is passed to the low-level interpreter, bypassing
the high overhead of Squeak. Basically, there is only ONE primitive: the
vector emulator, which accepts a pointer to the stream of codes and does
everything itself.
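
As a rough sketch of how such an emulator might look in C: everything
below is an assumption made for illustration, not a worked-out design.
The opcode names, the 8/8/8/8-bit field layout of each 32-bit code, the
choice of fixed-length registers of 4 floats, and the function names
vecvm_run and vecvm_demo_double8 are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names, sizes and encodings here are hypothetical. */
enum { VLEN = 4, NVREGS = 8, NSREGS = 8 };

enum {
    OP_HALT = 0, /* stop and return to the caller          */
    OP_SETI,     /* s[a] = (b << 8) | c                    */
    OP_ADDI,     /* s[a] += c                              */
    OP_VLOAD,    /* v[a] = mem[s[b] .. s[b]+VLEN-1]        */
    OP_VSTORE,   /* mem[s[b] .. s[b]+VLEN-1] = v[a]        */
    OP_VADD,     /* v[a] = v[b] + v[c], element-wise       */
    OP_VMUL,     /* v[a] = v[b] * v[c], element-wise       */
    OP_LOOP      /* if (--s[a] != 0) jump back b codes     */
};

/* Pack one 32-bit code: opcode in the top byte, then three operands. */
#define INSN(op, a, b, c) \
    (((uint32_t)(op) << 24) | ((uint32_t)(a) << 16) | \
     ((uint32_t)(b) << 8) | (uint32_t)(c))

typedef struct {
    float    v[NVREGS][VLEN]; /* vector registers */
    uint32_t s[NSREGS];       /* scalar registers */
} VecVM;

/* The single "primitive": run a stream of codes over mem.
   Returns 0 on HALT, -1 on a malformed or out-of-range code. */
int vecvm_run(VecVM *vm, const uint32_t *code, size_t ncode,
              float *mem, size_t nmem)
{
    size_t pc = 0;
    assert(vm != NULL && code != NULL && mem != NULL);
    while (pc < ncode) {
        uint32_t w = code[pc++];
        unsigned op = w >> 24, a = (w >> 16) & 0xFF;
        unsigned b = (w >> 8) & 0xFF, c = w & 0xFF;
        int i;
        switch (op) {
        case OP_HALT:
            return 0;
        case OP_SETI:
            if (a >= NSREGS) return -1;
            vm->s[a] = (b << 8) | c;
            break;
        case OP_ADDI:
            if (a >= NSREGS) return -1;
            vm->s[a] += c;
            break;
        case OP_VLOAD:
        case OP_VSTORE:
            if (a >= NVREGS || b >= NSREGS) return -1;
            if (vm->s[b] > nmem || nmem - vm->s[b] < VLEN) return -1;
            for (i = 0; i < VLEN; i++) {
                if (op == OP_VLOAD) vm->v[a][i] = mem[vm->s[b] + i];
                else                mem[vm->s[b] + i] = vm->v[a][i];
            }
            break;
        case OP_VADD:
        case OP_VMUL:
            if (a >= NVREGS || b >= NVREGS || c >= NVREGS) return -1;
            for (i = 0; i < VLEN; i++)
                vm->v[a][i] = (op == OP_VADD)
                    ? vm->v[b][i] + vm->v[c][i]
                    : vm->v[b][i] * vm->v[c][i];
            break;
        case OP_LOOP:
            if (a >= NSREGS || b > pc) return -1;
            if (vm->s[a] > 0 && --vm->s[a] != 0) pc -= b;
            break;
        default:
            return -1;
        }
    }
    return -1; /* ran off the end of the stream without HALT */
}

/* Demo program: double 8 floats in place, VLEN elements per pass. */
int vecvm_demo_double8(float mem[8])
{
    static const uint32_t prog[] = {
        INSN(OP_SETI,   0, 0, 0),    /* s0 = 0: element offset        */
        INSN(OP_SETI,   1, 0, 2),    /* s1 = 2: number of chunks      */
        INSN(OP_VLOAD,  0, 0, 0),    /* v0 = mem[s0 ..]               */
        INSN(OP_VADD,   0, 0, 0),    /* v0 = v0 + v0                  */
        INSN(OP_VSTORE, 0, 0, 0),    /* mem[s0 ..] = v0               */
        INSN(OP_ADDI,   0, 0, VLEN), /* s0 += VLEN                    */
        INSN(OP_LOOP,   1, 5, 0),    /* back to VLOAD while --s1 != 0 */
        INSN(OP_HALT,   0, 0, 0)
    };
    VecVM vm = {0};
    return vecvm_run(&vm, prog, sizeof prog / sizeof prog[0], mem, 8);
}
```

In this sketch the whole loop runs inside vecvm_run, so the caller pays
its dispatch cost once per stream of codes rather than once per chunk.
A platform-specific build could replace the VADD/VMUL inner loops with
AltiVec or MMX code without changing the stream format.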

This is doable, and completely portable, I think, but is it worth doing?



Chris Reuter <cgreuter at calum.csclub.uwaterloo.ca> said:

[snipt]
>
>However, as my day job involves writing and maintaining development
>tools for a SIMD DSP, I have some experience with the technology and
>the related philosophy.
>
>The basic idea behind vector processing is to do more in one clock
>cycle.  As such, my particular DSP will do 4 or 8 arithmetic
>operations on a swath of data in one cycle and calculate the next
>address at the same time.  The ideal construct for using this kind of
>thing is a big linear sequence of instructions--an unrolled tight
>loop, as it were.  This gives you a 4- or 8-fold increase in
>performance over a scalar processor and allows realtime video
>manipulation at 25MHz.
>
>Note that you only get a significant performance boost because you're
>doing an enormous number of successive basic arithmetic operations.
>That is, almost every clock cycle is used to do arithmetic rather than
>flow control or other housekeeping.
>
>In contrast, executing a Squeak expression like:
>
>        ByteArray doBy128Bits: [ :chunk | chunk someDSPOperation ]
>
>involves several hundred (at least) clock cycles in between each piece
>of arithmetic done.  Doing 4 or 8 or 16 bytes' worth in one clock cycle
>rather than in {4,8,16} cycles isn't a significant gain because the
>arithmetic itself doesn't use that many clock cycles compared to the
>Squeak VM.
>
>Adding extended bit-manipulation primitives to Squeak may well be a
>good idea (I haven't needed them so I don't have an informed opinion
>here) but you may as well go and code them in efficient C--vector
>processing won't help enough.
>
>I _can_ think of several places where vector instructions might
>improve performance:  BitBLT could probably benefit from AltiVec or
>MMX-based replacements for a few of the workhorse primitives.
>ByteArray, FloatArray and IntArray might also benefit from
>vector-based "Do-this-to-all-elements"-type primitives.  Maybe
>you can think of a few others.
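
The two points in the quoted text can be sketched in C. The first
function puts rough numbers on the per-chunk overhead argument; the
300-cycle overhead used in the note below is an assumed stand-in for
"several hundred", not a measurement. The second is one possible shape
for a "do-this-to-all-elements" primitive on a float array. Both
function names are hypothetical and neither is an existing Squeak
primitive.

```c
#include <stddef.h>

/* Estimated speedup of one chunk when the arithmetic shrinks from
   scalar_cycles to vector_cycles but the per-chunk dispatch overhead
   (overhead_cycles) stays the same. */
double chunk_speedup(double overhead_cycles, double scalar_cycles,
                     double vector_cycles)
{
    return (overhead_cycles + scalar_cycles) /
           (overhead_cycles + vector_cycles);
}

/* Whole-array primitive: dst[i] += scale * src[i] for i in [0, n).
   Called once per array, so the caller's overhead is paid once,
   not once per element or chunk. The plain loop is the portable
   version; a vector-unit version could replace it per platform. */
void float_array_scale_add(float *dst, const float *src,
                           float scale, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] += scale * src[i];
}
```

With an assumed 300 cycles of overhead per chunk, cutting 16 cycles of
arithmetic to 1 gives chunk_speedup(300, 16, 1) = 316/301, or about 5%.
With no overhead the same change would be a 16-fold gain, which is why
moving the whole loop into C matters more than the vector unit itself.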



-------------------------------------------------------------------------
Lawson English. Squeak, snore, etc.
Check out <http://www.squeak.org>
-------------------------------------------------------------------------




