VI4 (was: RE: [ANN]Draft rough plan for 3.6!)

Wed Apr 16 01:39:38 UTC 2003

Tim Rowledge <tim at sumeru.stanford.edu> wrote:
    (Bryce Kampjes wrote):
	> There are a few things that would be nice purely for performance in an
	> image change. Having a tag bit of 0 rather than 1 for integers would
	> shave 3 instructions off simple arithmetic taking it down to 5
	> instructions on an x86.

	.. and it would involve having the tag bit added to every OOP,
	meaning that one would have to mask out that bit for any
	indirection through that OOP.

Mask out?  This is a joke, surely?

As an exercise, I've just been writing a dumb little Lisp compiler.
(I ran into a nasty problem:  symbols were represented by 6 - address,
 and the assembler wouldn't let me do that.  It turns out that the
 ELF format has dozens of tags for relocatable addresses saying what
 silly little field the result has to fit in, but in terms of what
 kind of expression you can use it is far less capable than the
 linkers of 30 years ago.  It can't handle <constant> - <relocatable>.
 Dealing with this is why I can't give you any performance figures.
)
Because my main machine is a SPARC, I'm using the obvious representation
of small integers, the one that the Taddcc[TV] and Tsubcc[TV] instructions
handle directly:  [30-bit signed value | 00].  Everything else is
8-byte aligned, giving me 6 more tags.  A pointer to a pair has tag 010 (2)
and points to a {tail, head} pair.

This means that (CAR x) turns into
	<get x into the target register, say r>
	ld [r+2], r
and (CDR x) turns into
	<get x into the target register, say r>
	ld [r-2], r
That's right, single-cycle CAR and CDR, thanks largely to the fact
that alignment is checked by the hardware, so the tag check comes for
free.  What happens to the tag bits?  They are absorbed by the
address calculation which is going to happen anyway.
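
For readers who would rather see this in C than in SPARC assembly, here
is a minimal sketch of the representation described above.  It is not
the compiler's actual runtime: the names (cellword, make_fixnum, cons,
car, cdr and so on) are invented for illustration, cell fields are
assumed to be 32 bits wide as on a 32-bit SPARC, and the tagged
reference is held in a uintptr_t only so that the sketch also runs on a
64-bit host.  Overflow checking, which the tagged-add instructions would
supply, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t cellword;   /* hypothetical name: one 32-bit tagged word */

enum { FIXNUM_MASK = 3, TAG_MASK = 7, PAIR_TAG = 2 };

/* Small integer: [30-bit signed value | 00]. */
static cellword make_fixnum(int32_t n)   { return (cellword)n << 2; }
static int32_t  fixnum_value(cellword w) { return (int32_t)w / 4; }
static int      is_fixnum(cellword w)    { return (w & FIXNUM_MASK) == 0; }

/* With a 00 tag, adding two tagged integers is an ordinary add:
   no untagging or retagging.  Overflow detection is omitted here. */
static cellword fixnum_add(cellword a, cellword b) { return a + b; }

/* A pair is stored {tail, head}, 8-byte aligned. */
struct pair { cellword tail; cellword head; };

/* Tagged reference to a pair: address of the cell plus the tag 010. */
static uintptr_t cons(cellword head, cellword tail)
{
    struct pair *p = malloc(sizeof *p);
    assert(p != NULL);
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* need 8-byte alignment */
    p->tail = tail;
    p->head = head;
    return (uintptr_t)p + PAIR_TAG;
}

/* The tag is a compile-time constant, so it folds into the offset:
   head lives at x - 2 + 4 = x + 2, tail at x - 2 + 0 = x - 2. */
static cellword car(uintptr_t x)
{
    return *(cellword *)(x - PAIR_TAG + offsetof(struct pair, head));
}

static cellword cdr(uintptr_t x)
{
    return *(cellword *)(x - PAIR_TAG + offsetof(struct pair, tail));
}
```

Note that the free tag check depends on the hardware trapping on
misaligned loads.  On a machine that tolerates them, this sketch would
quietly read garbage from a wrongly tagged word, and an explicit check
would be needed.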

The only major difficulty comes with machines like the IBM/370 (these
days, I suppose I should say z/Architecture) where the displacement in
base+displacement addressing cannot be negative.  Then you use a slightly
different scheme.  Instead of (pointer to object + tag) you use
(pointer to object - 8 + tag) and you get
    L r, 10(r)          ; car
    L r,  6(r)          ; cdr
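
The arithmetic behind those two displacements can be checked with a
small C sketch.  As before this is illustration only: BIAS, CAR_DISP,
CDR_DISP, cons_biased, car_biased and cdr_biased are made-up names, and
the cell fields are assumed to be 32 bits wide so that the numbers come
out the same as in the 370 code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { TAG_MASK = 7, PAIR_TAG = 2, BIAS = 8 };

/* Same {tail, head} layout, with 32-bit fields (an assumption). */
struct pair { uint32_t tail; uint32_t head; };

/* Displacements from the biased reference; both are non-negative. */
enum {
    CDR_DISP = BIAS - PAIR_TAG + (int)offsetof(struct pair, tail),  /*  6 */
    CAR_DISP = BIAS - PAIR_TAG + (int)offsetof(struct pair, head)   /* 10 */
};

/* Biased tagged reference: (pointer to object - 8 + tag). */
static uintptr_t cons_biased(uint32_t head, uint32_t tail)
{
    struct pair *p = malloc(sizeof *p);
    assert(p != NULL);
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* need 8-byte alignment */
    p->tail = tail;
    p->head = head;
    return (uintptr_t)p - BIAS + PAIR_TAG;
}

static uint32_t car_biased(uintptr_t x)
{
    return *(uint32_t *)(x + CAR_DISP);
}

static uint32_t cdr_biased(uintptr_t x)
{
    return *(uint32_t *)(x + CDR_DISP);
}
```

Because the bias is a multiple of 8, the low three bits of the reference
still hold the tag, so type tests are unchanged.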

Indeed, this kind of adjustment is not limited to assembly code.
My dumb little Lisp compiler has about 2000 lines of C runtime code
(basically the garbage collector, arithmetic [except for allocation,
which is done before the C code is called, and small integer
arithmetic, which is mostly done in line], and basic I/O).  It's full
of stuff like
    Un_Flonum(R) = Un_Flonum(X) * Un_Flonum(Y)
which turns into
    ((struct Cell *)(R-5))->as_flonum =
        ((struct Cell *)(X-5))->as_flonum *
        ((struct Cell *)(Y-5))->as_flonum;
and this turns into
    ldd [X-5], %f2
    ldd [Y-5], %f4
    fmuld %f2, %f4, %f2
    std %f2, [R-5]
with, and I must emphasise this, *ZERO* cost for tag stripping,
and the compiler gets code this good without any hinting (OK OK so
there's one hint: "-dalign" tells the compiler to assume that doubles
are properly aligned, but this has nothing to do with tags).
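
For anyone who wants to try this, here is a self-contained sketch of how
such a macro might be defined.  It is a guess at the shape, not the
actual runtime: the post shows only the as_flonum field and the
tag value 5, so the struct Cell layout, make_flonum and flonum_mul are
hypothetical, and the sketch allocates the result cell itself, whereas
the post says allocation is done before the C code is called.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TAG_MASK = 7, FLONUM_TAG = 5 };

/* Hypothetical layout: only the as_flonum field appears in the post. */
struct Cell { double as_flonum; };

/* The tag is a constant, so "stripping" it is just part of the
   address expression the compiler has to compute anyway. */
#define Un_Flonum(x) (((struct Cell *)((x) - FLONUM_TAG))->as_flonum)

/* Box a double and return a reference tagged 101. */
static uintptr_t make_flonum(double d)
{
    struct Cell *c = malloc(sizeof *c);
    assert(c != NULL);
    assert(((uintptr_t)c & TAG_MASK) == 0);   /* need 8-byte alignment */
    c->as_flonum = d;
    return (uintptr_t)c + FLONUM_TAG;
}

/* Multiply two boxed flonums into a freshly allocated result cell. */
static uintptr_t flonum_mul(uintptr_t x, uintptr_t y)
{
    uintptr_t r = make_flonum(0.0);
    Un_Flonum(r) = Un_Flonum(x) * Un_Flonum(y);
    return r;
}
```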