Squeak build problem...

Ian Piumarta ian.piumarta at inria.fr
Fri Aug 16 07:18:58 UTC 2002


On Fri, 16 Aug 2002 goran.hultgren at bluefish.se wrote:

> I really love reading the postings from Ian. I don't have a CLUE what he
> is talking about but it sure is fun to read! :-)

;)  But that's okay: when you guys start talking about databases my head
starts spinning.  (And the size of the datasets you mention oh-so-casually
makes my mind boggle!)

FWIW...

> Ian Piumarta <ian.piumarta at inria.fr> wrote:
> [SNIP]
> > I remember (from quite a while ago) that the big change in gcc3 was going
> > to be live range splitting.  Maybe localXX are being spilled and reloaded
> > (disaster!) by the new optimiser?  If these are spilled in the wrong
> > places then it could explain a factor of 2 in speed.  Worth taking a look
> > at.

gcc2 (and lcc and lots of other compilers) choose "up front" which
variables are going to be in registers and then leave them in those
registers for the entire function.  A "live range" is the portion of the
function between a variable's definition and the last use of its value.
If I define "register int a= foo();" at the start and then say "return
a;" at the end, the live range of the variable "a" is the entire
function.  This is a Really Good Thing for Squeak where we want to keep
the SP, IP (and maybe one or two other "critical" values, like the current
bytecode) in registers at all times.  (Part of "gnuification" is
explicitly assigning hard registers to these variables.)
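A minimal C sketch of both ideas, under stated assumptions. foo() and busy_work() are hypothetical stand-ins. The __asm__("esi") binding is GCC's local register variable extension, which is one way to assign a hard register explicitly, and it is only compiled for GCC on 32-bit x86. Current GCC documentation guarantees such bindings only for operands of extended asm, so the portable fallback simply uses "register" as a hint.

```c
/* Hypothetical stand-in for foo() in the text. */
static int foo(void) { return 42; }

/* Hypothetical work that never reads 'a' but still needs registers. */
static int busy_work(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i * i;
    return sum;
}

int live_range_demo(int n, int *scratch)
{
#if defined(__GNUC__) && defined(__i386__)
    /* GCC extension: ask for 'a' to be kept in %esi (a hint, see above). */
    register int a __asm__("esi") = foo();
#else
    register int a = foo();    /* portable fallback: 'register' is only a hint */
#endif
    /* a's live range starts at its definition above ...            */
    *scratch = busy_work(n);   /* ... covers this code, which never reads 'a' ... */
    return a;                  /* ... and ends at its last use here. */
}
```

Calling live_range_demo(4, &s) returns 42 and leaves 14 (0 + 1 + 4 + 9) in s.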

However... suppose I have lots of code between "register int a= foo();"
and "return a;" that doesn't use "a" at all.  The live range kind of has a
hole in the middle where we might make better use of the available
registers by temporarily storing ("spilling") "a" on the stack and then
reading it back into a register ("reloading" it) just before we use it in
the "return a;".  Another scenario is that it appears (to the compiler)
that "a" is not being used very aggressively in a certain part of the
function, and so it decides to store it in memory (on the stack) instead
of in a register for certain parts of the function.  This is live range
splitting.  And of course it applies to any value in the function, whether
the compiler assigned it to a register temporarily or the programmer
assigned it permanently.  The compiler can "prove" that a variable we've
assigned explicitly to a register isn't used "heavily" in a certain
portion of the function, and then spill and reload it with impunity
(assuming it reloads the value into the same register, there's no way
we'll notice -- except maybe by benchmarking our program ;).
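Live range splitting happens inside the compiler, on its intermediate code, but its effect on a function like the "register int a= foo(); ... return a;" example can be sketched at the source level. In this illustration foo() and register_hungry() are hypothetical, and "volatile" is there only so that the spill slot, and the store and load to it, survive optimisation in the sketch. Each spill and each reload is one extra memory reference: negligible once, expensive when it sits in code executed millions of times per second.

```c
/* Hypothetical stand-in for foo() in the text. */
static int foo(void) { return 42; }

/* Hypothetical code that never reads 'a' but keeps many temporaries live,
   so a register allocator would like to borrow a's register here. */
static int register_hungry(int n)
{
    int t0 = n, t1 = 2 * n, t2 = 3 * n, t3 = 4 * n, t4 = 5 * n, t5 = 6 * n;
    return t0 * t1 + t2 * t3 + t4 * t5;
}

/* Source-level picture of the function after a's live range is split. */
int split_live_range(int n, int *scratch)
{
    int a = foo();                  /* piece 1 of a's live range: in a register */
    volatile int spill_slot = a;    /* spill: store 'a' into a stack slot */
    *scratch = register_hungry(n);  /* the hole: a's register is free for reuse */
    a = spill_slot;                 /* reload: read 'a' back just before its use */
    return a;                       /* piece 2 of a's live range */
}
```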

On the Pentium we have six (at best -- three if there's a function call
nearby) available registers.  Two (or three) of these are normally
permanently assigned to localIP and localSP (and sometimes the current
bytecode too), which improves performance dramatically.  But the code for
interpret() has plenty of very common bytecodes in which
the compiler is struggling hard to find registers, and if it decides to
spill localIP/localSP/currentBytecode during that bytecode to free up
their registers for intermediate results on a path that is hardly ever
used (e.g., fetchClass or checked stores into old objects) then we end up
with zillions of additional (needless) memory references per second.  
This is a Really Bad Thing for Squeak.
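To show where this bites, here is a toy bytecode loop in the style of interpret(). It is a sketch only, not Squeak's interp.c: the opcodes, the HEAP_SLOTS layout and the "remembered" counter are hypothetical. localIP and localSP are used by every bytecode, while OP_STORE contains a rarely taken branch with extra temporaries, standing in for fetchClass or a checked store into an old object. If the allocator spills localIP or localSP to make room for those temporaries, and places the spill and reload on the common path of the bytecode rather than inside the rare branch, every execution of OP_STORE pays for them.

```c
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH = 0, OP_ADD = 1, OP_STORE = 2, OP_HALT = 3 };

#define HEAP_SLOTS 8  /* heap[HEAP_SLOTS] holds a hypothetical "remembered" count */

/* Runs bytecodes until OP_HALT and returns the top of stack.
   stack[0] is an unused sentinel; pushes pre-increment localSP. */
intptr_t run(const uint8_t *code, intptr_t *stack, intptr_t *heap)
{
    register const uint8_t *localIP = code;  /* hot: used by every bytecode */
    register intptr_t *localSP = stack;      /* hot: used by almost every bytecode */

    for (;;) {
        int currentBytecode = *localIP++;
        switch (currentBytecode) {
        case OP_PUSH:                         /* push the next byte as a literal */
            *++localSP = *localIP++;
            break;
        case OP_ADD:                          /* pop two values, push their sum */
            localSP[-1] += localSP[0];
            localSP--;
            break;
        case OP_STORE: {                      /* pop a value, store it at heap[operand] */
            intptr_t value = *localSP--;
            intptr_t index = *localIP++;
            if (index < HEAP_SLOTS) {
                heap[index] = value;          /* common path: plain store */
            } else {
                /* Rare path with extra temporaries, standing in for a checked
                   store; register pressure peaks here. */
                intptr_t slot = index % HEAP_SLOTS;
                intptr_t old = heap[slot];
                heap[slot] = value;
                heap[HEAP_SLOTS] += (old != value);
            }
            break;
        }
        default:                              /* OP_HALT (or anything unknown) */
            return *localSP;
        }
    }
}
```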

Even on the PowerPC (where we usually have 27 registers to play with, or
19 if there's a nearby function call) certain bytecodes in interpret()  
are really straining the compiler's register allocation.  (There are FIFTY
FOUR local variables in interpret()!)  So even on the PPC it's entirely
possible that the compiler is spilling localXX and referring to it from
the stack (or reloading it shortly thereafter, having executed a path in
which the register wasn't even reused), millions of times per second, for
no good reason.

If somebody with gcc3 could compile interp.c with the "-S" option, then
extract just the code for "interpret()" from the output, and then grep in
it for "esi" and "edi" (Pentium) or "25", "26", "27" and "28" (PowerPC)
we'd see right away if the compiler is doing something really stupid with
our register variables.
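The extract-and-grep step can be done with the usual text tools; as a sketch in C, here is a hypothetical helper, count_reg_refs(), that takes the text of the .s file produced by "gcc -S" (reading the file into memory is left to the caller) and counts the lines of one function that mention a register name such as "%esi". It assumes ELF-style gcc output, where the function starts at a column-0 label "interpret:" and ends with a ".size interpret, ..." directive; other targets format labels, directives and register names differently.

```c
#include <string.h>

/* True if buf is a ".size <func>, ..." directive (end of func's body). */
static int is_size_directive_for(const char *buf, const char *func, size_t flen)
{
    const char *p = strstr(buf, ".size");
    if (!p)
        return 0;
    p += 5;
    while (*p == ' ' || *p == '\t')
        p++;
    return strncmp(p, func, flen) == 0 &&
           (p[flen] == ',' || p[flen] == ' ' || p[flen] == '\t');
}

/* Count lines between "<func>:" and its .size directive containing reg. */
int count_reg_refs(const char *asm_text, const char *func, const char *reg)
{
    size_t flen = strlen(func);
    int inside = 0, count = 0;
    const char *line = asm_text;

    while (*line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[512];
        size_t n = len < sizeof buf - 1 ? len : sizeof buf - 1;  /* truncate long lines */
        memcpy(buf, line, n);
        buf[n] = '\0';

        if (!inside) {
            if (strncmp(buf, func, flen) == 0 && buf[flen] == ':')
                inside = 1;                      /* found the "func:" label */
        } else if (is_size_directive_for(buf, func, flen)) {
            break;                               /* end of func's body */
        } else if (strstr(buf, reg)) {
            count++;                             /* this line mentions reg */
        }
        if (!nl)
            break;
        line = nl + 1;
    }
    return count;
}
```

A large count of lines moving localIP's or localSP's register to and from stack slots inside interpret() would be the symptom to look for.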

Another thing somebody with gcc3 could do is check the manual page for a
"-f" option to turn off live range splitting.  If such an option exists
then it might improve Squeak performance a lot.

Ian




