Some questions

Mon Apr 30 16:52:46 UTC 2007

> Guillermo Adrián Molina writes:
>  > > Sets the steps for processing. However the spill worklist has some
>  > > registers on it that shouldn't be spilled, so it tries to select a
>  > > register to spill. It discards all registers then fails.
>  > >
>  > > I'd see if there are any moves that might be spilled afterwards;
>  > > if so, all you'd need to do is allow spillRegister to fail
>  > > gracefully.
>  > >
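A minimal sketch of what "fail gracefully" could mean: the spill selector reports that nothing is spillable instead of aborting. Exupery itself is written in Smalltalk, so this C version only illustrates the idea. `SpillCandidate`, `select_spill` and the `-1` convention are hypothetical names, not Exupery's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry on the spill worklist. */
typedef struct {
    int reg;          /* virtual register number */
    bool unspillable; /* e.g. a short-lived temp created by spill code */
    double cost;      /* estimated spill cost; lower is cheaper to spill */
} SpillCandidate;

/* Returns the cheapest spillable register, or -1 when every candidate is
   unspillable, so the caller can move on instead of failing. */
int select_spill(const SpillCandidate *work, size_t n)
{
    const SpillCandidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (work[i].unspillable)
            continue;                 /* never pick a register that must stay put */
        if (best == NULL || work[i].cost < best->cost)
            best = &work[i];
    }
    return best ? best->reg : -1;     /* -1: nothing to spill, fail gracefully */
}
```

The caller would check for `-1` and continue with the next allocation phase rather than treating it as an error.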
>  >
>  > OK, I will try to see what is happening. Is there any hard limit on the
>  > number of registers (besides the number of registers available on x86)?
>
> There should be no limit on the number of registers you can use. The
> worst that should happen is you end up with a lot of spill code.
>
>  > >  > Another thing: do you want the code I wrote for cmovxx?
>  > >
>  > > I'm interested.
>  > >
>  > > Does it have unit test coverage? Exupery development relies on
>  > > testing, so that's required.
>  > >
>  > Not right now. I will work on that later; when I have it, I will send
>  > it to you.
>
> OK
>
>  > > When was cmov introduced? I know it was a long time ago but can't
>  > > remember precisely when. What I'm concerned about is making Exupery
>  > > incompatible with chips that might still be in use.
>  > >
>  >
>  > Intel's optimization manual says that cmov was introduced with the
>  > P6 family (Pentium Pro), and AMD's optimization manual says that cmov
>  > is available from the Athlon onwards. I didn't investigate that
>  > thoroughly, but any modern computer should have it. I know that in
>  > the earliest implementations of cmov (Pentium Pro) using the
>  > instruction wasn't really an advantage, but on current processors it
>  > is faster. My tinyBenchmarks showed a 10% speedup when I implemented
>  > cmov for SmallInteger addition.
>  > But if you are really concerned about compatibility, I think you
>  > would be better off not using it.
>
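If compatibility is the worry, one option is to check for CMOV at startup and fall back to the branching sequence on chips without it. CPUID leaf 1 reports CMOV support in bit 15 of EDX, and every x86-64 processor has it. This sketch relies on the GCC/Clang `<cpuid.h>` helper `__get_cpuid`, so it is compiler-specific; `cpu_has_cmov` is a hypothetical name.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 1 reports CMOVcc support in bit 15 of EDX. */
#define CMOV_FEATURE_BIT (1u << 15)

/* Returns true if the running processor supports CMOVcc. */
bool cpu_has_cmov(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 1 unavailable: assume no CMOV */
    return (edx & CMOV_FEATURE_BIT) != 0;
#else
    return false;                     /* not an x86 processor */
#endif
}
```

A JIT could call this once and choose between cmov-based and branch-based code generation.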
> I'm surprised that your SmallInteger addition code was helped.
>
> In Exupery the SmallInteger addition sequence is
>    bitTest arg1
>    jumpIfSet failureBlock
>    bitTest arg2
>    jumpIfSet failureBlock
>    clearTagBit arg1
>    add arg1 arg2
>    jumpOverflow failureBlock
>
> The failure case is a full message send.
>
The problem with the above code is that it has three branches.
That is why I need jump tables; there are cases where cmov really doesn't help.
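To make the three branches concrete, here is a C sketch of the same fast path. It assumes Squeak's SmallInteger encoding (value v stored as 2v + 1 in a 32-bit word). `tag_int`, `untag_int` and `tagged_add` are hypothetical names, and the GCC/Clang builtin `__builtin_add_overflow` stands in for the jumpOverflow check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: value v is stored as 2v + 1 (tag bit set).
   v must fit in 31 bits. */
int32_t tag_int(int32_t v) { return v * 2 + 1; }
int32_t untag_int(int32_t oop) { return (oop - 1) / 2; }

/* Fast path for SmallInteger +. Returns false when the caller must fall
   back to a full message send; each "return false" is one branch. */
bool tagged_add(int32_t arg1, int32_t arg2, int32_t *result)
{
    if ((arg1 & 1) == 0)            /* branch 1: arg1 is not a SmallInteger */
        return false;
    if ((arg2 & 1) == 0)            /* branch 2: arg2 is not a SmallInteger */
        return false;
    int32_t cleared = arg1 - 1;     /* clear arg1's tag bit: 2a */
    /* (2a) + (2b + 1) = 2(a + b) + 1, which is already tagged. */
    if (__builtin_add_overflow(cleared, arg2, result))
        return false;               /* branch 3: sum does not fit in 31 bits */
    return true;
}
```

Clearing the tag of only one operand is what keeps the sum correctly tagged, and it means signed overflow of the 32-bit add coincides exactly with the result leaving the 31-bit SmallInteger range.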

Before I started using Exupery, I called special methods written in C that
implemented faster code. Every special method (and primitive) returned 1
on error and the result object on success.
One of these special methods was +. This is part of the code:

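if (areIntegers(rcvr, arg)) {
	int result;
	/* Both operands are tagged SmallIntegers (low bit set), so clear
	   the receiver's tag bit before adding; the sum is then tagged. */
	asm(	"movl $1,%%edx\n\t"                /* error value */
		"leal -1(%[rcvr]),%[result]\n\t"   /* result = rcvr - 1 (tag cleared) */
		"addl %[arg],%[result]\n\t"        /* result += arg; sets OF on overflow */
		"cmovol %%edx,%[result]"           /* on overflow, result = 1 (error) */
		: [result] "=&r" (result)          /* early clobber: written before arg is read */
		: [rcvr] "r" (rcvr), [arg] "r" (arg)
		: "edx", "cc" );
	return result;
}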

With this code, + intensive tests ran up to 10% faster.
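For comparison, the same special method can be written without inline assembly. This sketch assumes the 2v + 1 encoding and the "return 1 on failure" convention described above; `special_add` and `SPECIAL_FAILED` are hypothetical names. The overflow flag feeds a select rather than an early return, which an optimizing compiler may (but is not required to) lower to cmovo, so the generated assembly is worth checking.

```c
#include <stdint.h>

/* Failure value used by the special methods described above. */
#define SPECIAL_FAILED ((int32_t)1)

/* Adds two tagged SmallIntegers. The caller has already checked both
   tags, as areIntegers() does in the original code. */
int32_t special_add(int32_t rcvr, int32_t arg)
{
    int32_t sum;
    /* rcvr - 1 clears the receiver's tag bit; the sum stays tagged. */
    int overflowed = __builtin_add_overflow(rcvr - 1, arg, &sum);
    return overflowed ? SPECIAL_FAILED : sum;   /* candidate for cmov */
}
```

Under this encoding 1 is also the representation of SmallInteger 0 (for example 3 + -3), so a real VM would want a failure value that can never be a valid result.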

> There are code fragments where cmov would be helpful. Converting
> to a boolean comes to mind: the part of "a > b" where you're loading
> either true or false into the result register.
>

Yes, I implemented that in Exupery (this is the code for less-than, "<"):

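"answer := false"
self addExpression:  (MedMov
	from: (self literal: false)
	to: answer	).
"cmov cannot take an immediate source, so load true into a temporary"
trueReg := machine createTemporaryRegister.
self addExpression:  (MedMov
	from: (self literal: true)
	to: trueReg	).
"Compare the tagged values directly; no detagging is needed"
self addExpression:  (MedComparision
	operator: #cmp
	arg1: arg1
	arg2: arg2).
"If the comparison says less-than (signed), answer := true"
self addExpression:  (MedCMov
	type: #cmovl
	from: trueReg
	to: answer).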

This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons this way because, as you know, we don't
need to detag before comparing.
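The reason no detagging is needed is that the encoding preserves order: with v stored as 2v + 1, 2a + 1 < 2b + 1 holds exactly when a < b. The C sketch below mirrors the mov/mov/cmp/cmovl sequence, assuming that encoding; `TRUE_OOP`, `FALSE_OOP` and `small_int_less` are hypothetical stand-ins for the VM's real true and false objects.

```c
#include <stdint.h>

/* Hypothetical oops standing in for the true and false objects. */
#define TRUE_OOP  ((uint32_t)0x1000)
#define FALSE_OOP ((uint32_t)0x2000)

/* SmallInteger < on tagged values (assumed encoding 2v + 1). */
uint32_t small_int_less(int32_t arg1, int32_t arg2)
{
    uint32_t answer = FALSE_OOP;   /* answer := false */
    uint32_t true_reg = TRUE_OOP;  /* true kept in a register for cmov */
    if (arg1 < arg2)               /* signed compare on tagged words, like cmp + cmovl */
        answer = true_reg;
    return answer;
}
```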

>  > > Given adequate test coverage I'll add it.
>  >
>  > I also implemented the enter and leave instructions. Not because they
>  > are better (they aren't), but because I use them to signal the
>  > inclusion of additional prologue and epilogue code in a final phase
>  > added just after the allocator. I do it that way because I don't know
>  > until then which registers are used and how many additional temps are
>  > needed. I know that Exupery always pushes and pops all the registers
>  > (other than eax, edx and ecx), and that it makes room for a big
>  > context as temp space on the stack. I don't do that. I only push the
>  > registers that are used, and if that is not enough, I use enter to
>  > reserve additional stack space. That breaks compatibility with the
>  > original Exupery, but I wanted to implement it that way. For small
>  > methods, that is really better.
>  > So, given that, I'm not offering any of this to you. I think you'll
>  > understand.
>
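The approach of saving only the registers the allocator actually used, and reserving stack only when spills need it, can be sketched as a small code generator. This C sketch emits AT&T text. `emit_frame` and the mask layout are hypothetical, it uses subl/addl on %esp instead of the enter instruction, and it assumes the usual IA-32 convention in which eax, ecx and edx are caller-saved and ebp is handled separately as the frame pointer.

```c
#include <stdio.h>
#include <string.h>

/* Callee-saved registers, indexed by bit position in a hypothetical
   "registers used" mask produced by the allocator. */
static const char *const callee_saved[] = { "%ebx", "%esi", "%edi" };
enum { N_CALLEE_SAVED = 3 };

/* Writes a prologue into pro and the matching epilogue into epi. Only
   registers whose bit is set in used are saved; stack is reserved only
   when spill_slots > 0 (assumed small and non-negative).
   Each buffer must hold at least 64 bytes. */
void emit_frame(unsigned used, int spill_slots, char *pro, char *epi)
{
    char line[32];
    pro[0] = '\0';
    epi[0] = '\0';
    for (int i = 0; i < N_CALLEE_SAVED; i++)
        if (used & (1u << i)) {
            sprintf(line, "pushl %s\n", callee_saved[i]);
            strcat(pro, line);
        }
    if (spill_slots > 0) {
        sprintf(line, "subl $%d,%%esp\n", 4 * spill_slots);
        strcat(pro, line);
        sprintf(line, "addl $%d,%%esp\n", 4 * spill_slots);
        strcat(epi, line);
    }
    for (int i = N_CALLEE_SAVED - 1; i >= 0; i--)  /* restore in reverse order */
        if (used & (1u << i)) {
            sprintf(line, "popl %s\n", callee_saved[i]);
            strcat(epi, line);
        }
    strcat(epi, "ret\n");
}
```

For a small method that uses only caller-saved registers and needs no spill slots, the prologue is empty and the epilogue is a bare ret, which is where the savings over a fixed save-everything frame come from.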
> Exupery's prologue and epilogue sequences could be improved. I've been
> thinking about overhauling that area for a few years now. I'd like
> to have variables spill into their actual locations. So if a stack
> variable was stored, it would always be fetched from the context.
> Then spilled registers wouldn't need to be loaded and stored on
> context switches.
>
> One thing that I might do in 0.13 is colour the isolated parts of a
> method separately. That should improve register allocation as the
> interference graph will not be polluted by other isolated sections of
> code. A compiled method is often made up of completely isolated
> sections of code. Colouring the sections separately should also speed
> up register allocation.
>
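Finding the isolated sections is a connected-components problem. Assuming that "isolated" means no register value flows between two groups of basic blocks, the sections are the components of a graph whose nodes are blocks and whose edges are value-carrying flow edges. This union-find sketch is one standard way to compute them; `Edge`, `find_sections` and the edge list are hypothetical inputs, not Exupery's data structures.

```c
#include <stddef.h>

/* A value-carrying flow edge between two basic blocks (hypothetical). */
typedef struct { int from, to; } Edge;

/* Union-find root lookup with path halving. */
static int find(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Fills section[b] with a representative block for b's section and
   returns the number of isolated sections. parent must hold n ints. */
int find_sections(int n, const Edge *edges, size_t n_edges,
                  int *parent, int *section)
{
    for (int b = 0; b < n; b++)
        parent[b] = b;                      /* every block starts alone */
    for (size_t i = 0; i < n_edges; i++) {
        int ra = find(parent, edges[i].from);
        int rb = find(parent, edges[i].to);
        if (ra != rb)
            parent[ra] = rb;                /* merge the two sections */
    }
    int count = 0;
    for (int b = 0; b < n; b++) {
        section[b] = find(parent, b);
        if (section[b] == b)
            count++;                        /* one root per section */
    }
    return count;
}
```

Each section's registers could then be given their own interference graph and coloured independently, which keeps each graph small.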

Every improvement you make will help me.
Cheers, Guille

> Bryce
> _______________________________________________
> Exupery mailing list
> Exupery at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>