Some questions
Guillermo Adrián Molina
guille at losmolina.com.ar
Mon Apr 30 16:52:46 UTC 2007
> Guillermo Adrián Molina writes:
> > > Sets the steps for processing. However the spill worklist has some
> > > registers on it that shouldn't be spilled, so it tries to select a
> > > register to spill. It discards all registers then fails.
> > >
> > > I'd see if there are any moves that might be spilled afterwards,
> > > if so, then all you'd need to do is allow spillRegister to fail
> > > gracefully.
> > >
> >
> > Ok, I will try to see what is happening. Is there any hard limit
> > (besides the number of available registers in x86 arch)?
>
> There should be no limit on the number of registers you can use. The
> worst that should happen is you end up with a lot of spill code.
>
> > > > Another thing, Do you want the code I made for cmovxx?
> > >
> > > I'm interested.
> > >
> > > Does it have unit test coverage? Exupery development relies on
> > > testing so that's required.
> > >
> > Not right now. I will work on that later; when I have it I will send
> > it to you.
>
> OK
>
> > > When was cmov introduced? I know it was a long time ago but can't
> > > remember precisely when. What I'm concerned with is making Exupery
> > > incompatible with some chips that might still be in use.
> > >
> >
> > Intel's optimization manual says that cmov was introduced in Pentium,
> > and AMD's optimization manual says that cmov is available from Athlon.
> > I didn't actually investigate that thoroughly. The fact is that any
> > modern computer should have it. I know that in earlier implementations
> > of cmov (Pentium Pro) using the instruction wasn't really an advantage.
> > But now it really is faster. My tinyBenchmarks showed a speedup of 10%
> > when I implemented cmov for SmallInteger additions.
> > But if you are really concerned about compatibility, I think you would
> > be better off not using it.
>
> I'm surprised that your SmallInteger addition code was helped.
>
> In Exupery the SmallInteger addition sequence is
> bitTest arg1
> jumpIfSet failureBlock
> bitTest arg2
> jumpIfSet failureBlock
> clearTagBit arg1
> add arg1 arg2
> jumpOverflow failureBlock
>
> The failure case is a full message send.
>
The problem with the above code is that you have 3 branches.
That is why I need jump tables; there are cases where cmov really doesn't help.
Before I started using Exupery, I called special methods in C that
implemented faster code. Every special method (and primitive) returned 1
in case of an error and, on success, returned the result object.
One of these special methods was +. This is part of the code:
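if(areIntegers(rcvr,arg)) {
    int result;
    /* Add the operands; on signed overflow, cmovo replaces the
       result with the error value 1. */
    asm( "movl $1,%%edx\n\t"
         "movl %[rcvr],%[result]\n\t"
         "addl %[arg],%[result]\n\t"
         "cmovol %%edx,%[result]"
         /* "&": result is written before arg is read, so it must not
            share a register with an input */
         : [result] "=&r" (result)
         : [rcvr] "r" (rcvr), [arg] "r" (arg)
         : "edx" );
    return result;
}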
With this code, I got up to 10% faster results in +-intensive tests.
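For readers without an x86 assembler at hand, the same idea can be sketched in portable-ish C. This is only an illustration under stated assumptions, not the code from my VM: it assumes integers are tagged as value << 1 (low bit 0), so that two tagged values can be added directly, and it relies on the GCC/Clang builtin __builtin_add_overflow. The names tag_int, are_integers, small_add and FAILURE are made up for this sketch.

```c
#include <stdint.h>

/* Assumption for this sketch: a SmallInteger is stored as (value << 1),
   so its low bit is 0 and the odd value 1 can serve as the error result. */
#define FAILURE ((int32_t)1)

/* Hypothetical helper: tag a plain integer. */
static int32_t tag_int(int32_t v)
{
    return (int32_t)((uint32_t)v << 1);
}

/* Hypothetical helper: both operands have a clear low bit. */
static int are_integers(int32_t a, int32_t b)
{
    return ((a | b) & 1) == 0;
}

/* Add two tagged integers. The only explicit branch is the type check;
   the overflow case is a select that a compiler may turn into cmovo. */
int32_t small_add(int32_t rcvr, int32_t arg)
{
    int32_t sum;
    if (!are_integers(rcvr, arg))
        return FAILURE;
    /* GCC/Clang builtin: returns nonzero if the signed add overflowed. */
    int overflowed = __builtin_add_overflow(rcvr, arg, &sum);
    return overflowed ? FAILURE : sum;
}
```

Whether the compiler really emits a conditional move here depends on the compiler and its flags, so the generated assembly should be checked before drawing any performance conclusion.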
> There are code fragments where cmov would be helpful. Converting
> to a boolean comes to mind. The part of "a > b" where you're loading
> either true or false into the result register.
>
Yes, I implemented that with Exupery (code for less than, "<"):
    self addExpression: (MedMov
        from: (self literal: false)
        to: answer).
    trueReg := machine createTemporaryRegister.
    self addExpression: (MedMov
        from: (self literal: true)
        to: trueReg).
    self addExpression: (MedComparision
        operator: #cmp
        arg1: arg1
        arg2: arg2).
    self addExpression: (MedCMov
        type: #cmovl
        from: trueReg
        to: answer).
This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons this way because, as you know, we don't
need to detag before comparing.
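The shape of that sequence, and the reason detagging is unnecessary, can be sketched in C. This is a rough illustration only: it assumes integers are tagged as value * 2 (any order-preserving tag behaves the same way), and TRUE_OBJ, FALSE_OBJ, tag_small and small_less are hypothetical names standing in for the real true/false objects and the generated code.

```c
#include <stdint.h>

/* Hypothetical object references standing in for true and false. */
#define TRUE_OBJ  ((intptr_t)0x1000)
#define FALSE_OBJ ((intptr_t)0x2000)

/* Assumed tagging for this sketch: value * 2. The mapping is strictly
   increasing, so a < b holds exactly when tag(a) < tag(b). */
static intptr_t tag_small(intptr_t v)
{
    return v * 2;
}

/* Mirrors the sequence above: load false into the answer, load true
   into a temporary, compare the still-tagged operands, then
   conditionally move the temporary into the answer. */
intptr_t small_less(intptr_t a, intptr_t b)
{
    intptr_t answer = FALSE_OBJ;
    intptr_t true_reg = TRUE_OBJ;
    if (a < b)              /* a compiler may emit cmp + cmovl here */
        answer = true_reg;
    return answer;
}
```

For example, small_less(tag_small(3), tag_small(4)) yields TRUE_OBJ without either operand being untagged first.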
> > > Given adequate test coverage I'll add it.
> >
> > I also implemented enter and leave instructions. Not because they are
> > better (they aren't), but because I use them to signal the inclusion of
> > additional prologue and epilogue code in a final phase added just after
> > the allocator. I do it that way because I don't know until then which
> > registers are used and the number of additional temps needed. I know
> > that Exupery always pushes and pops all the registers (other than eax,
> > edx and ecx), and that it makes room for a big context as temp space on
> > the stack. I don't do that. I only push the used regs and, if that is
> > not enough, I enter additional stack space. That breaks compatibility
> > with the original Exupery, but I wanted to implement it that way. For
> > small methods, that is really better.
> > So, given that, I am not offering any of this to you. I think you'll
> > understand.
>
> Exupery's prologue and epilogue sequences could be improved. I've been
> thinking about overhauling that area for a few years now. I'd like
> to have variables spill into their actual locations. So if a stack
> variable was stored, it would always be fetched from the context.
> Then spilled registers wouldn't need to be loaded and stored on
> context switches.
>
> One thing that I might do in 0.13 is colour the isolated parts of a
> method separately. That should improve register allocation as the
> interference graph will not be polluted by other isolated sections of
> code. A compiled method is often made up of completely isolated
> sections of code. Colouring the sections separately should also speed
> up register allocation.
>
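One way to find those isolated sections is to split the interference graph into connected components before colouring. The following is a minimal sketch of that step using union-find, not a description of how Exupery does or will do it. The graph representation (nodes numbered 0..n-1, edges given as two parallel arrays), the MAX_NODES limit and the function names are all assumptions made for the example.

```c
/* Arbitrary limit chosen for this sketch only. */
#define MAX_NODES 64

/* Find the representative of x's set, halving the path as we go. */
static int find_root(int parent[], int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Nodes 0..n-1 are registers; edge e joins from[e] and to[e].
   On return, component[i] is the component number of node i.
   Returns the number of components, or -1 if n is out of range. */
int split_components(int n, const int from[], const int to[],
                     int n_edges, int component[])
{
    int parent[MAX_NODES];
    int label[MAX_NODES];
    int count = 0;
    int i, e;

    if (n < 0 || n > MAX_NODES)
        return -1;
    for (i = 0; i < n; i++) {
        parent[i] = i;      /* every node starts in its own set */
        label[i] = -1;      /* no component number assigned yet */
    }
    for (e = 0; e < n_edges; e++) {
        int a = find_root(parent, from[e]);
        int b = find_root(parent, to[e]);
        if (a != b)
            parent[a] = b;  /* merge the two sets */
    }
    for (i = 0; i < n; i++) {
        int r = find_root(parent, i);
        if (label[r] < 0)
            label[r] = count++;
        component[i] = label[r];
    }
    return count;
}
```

Each component could then be handed to the colouring pass on its own, so that registers in one isolated section never constrain those in another.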
Every improvement you make will help me.
Cheers, Guille
> Bryce
> _______________________________________________
> Exupery mailing list
> Exupery at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>