Re: Some questions

30 Apr 2007


      ...
Guillermo Adrián Molina writes:
...
...
Sets the steps for processing. However, the spill worklist has some
registers on it that shouldn't be spilled, so it tries to select a
register to spill. It discards all the registers and then fails.
I'd see if there are any moves that might be spilled afterwards.
If so, all you'd need to do is allow spillRegister to fail
gracefully.
Ok, I will try to see what is happening. Is there any hard limit
(besides the number of available registers in x86 arch)?
There should be no limit on the number of registers you can use. The
worst that should happen is you end up with a lot of spill code.
...
...
...
Another thing: do you want the code I wrote for cmovxx?
I'm interested.
Does it have unit test coverage? Exupery development relies on
testing so that's required.
Not right now. I will work on that later; when I have it, I will send
it to you.
OK
...
...
When was cmov introduced? I know it was a long time ago but can't
remember precisely when. What I'm concerned with is making Exupery
incompatible with some chips that might still be in use.
Intel's optimization manual says that cmov was introduced with the
Pentium Pro, and AMD's optimization manual says that cmov is available
from the Athlon onwards. I didn't actually investigate that thoroughly.
The fact is that any modern computer should have it. I know that in
early implementations of cmov (Pentium Pro) using the instruction
wasn't really an advantage, but now it is really faster. My
tinyBenchmarks showed a speed-up of 10% when I implemented cmov for
SmallInteger additions.
But if you are really concerned about compatibility, I think you would
be better off not using it.
I'm surprised that your SmallInteger addition code was helped.
In Exupery the SmallInteger addition sequence is
   bitTest arg1
   jumpIfSet failureBlock
   bitTest arg2
   jumpIfSet failureBlock
   clearTagBit arg1
   add arg1 arg2
   jumpOverflow failureBlock
The failure case is a full message send.
The problem with the above code is that you have 3 branches.
That is why I need jump tables; there are cases where cmov really doesn't help.
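As a rough illustration of the addition sequence quoted above, here is a minimal C sketch. It assumes Squeak-style tagging in which a low bit of 1 marks a SmallInteger; the type `oop` and the helpers `is_small_int`, `tag_int`, `untag_int` and `small_int_add` are hypothetical names, not Exupery code. The flag polarity of `bitTest`/`jumpIfSet` in the sequence above is not modelled; the sketch only shows the two tag checks, the cleared tag bit, and the overflow check, which is where the three branches come from. It relies on the GCC/Clang builtin `__builtin_add_overflow`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged word: low bit 1 marks a SmallInteger (assumed tagging). */
typedef int32_t oop;

static bool is_small_int(oop x) { return (x & 1) != 0; }

/* Encode v as 2*v + 1; only valid while v fits in 31 bits. */
static oop tag_int(int32_t v) { return (oop)(((uint32_t)v << 1) | 1u); }

/* Decode a tagged SmallInteger; x is odd, so x - 1 is even and divides exactly. */
static int32_t untag_int(oop x) { return (x - 1) / 2; }

/* Returns true and stores the tagged sum on the fast path.
   Returning false stands for "jump to the failure block" (a full send). */
static bool small_int_add(oop arg1, oop arg2, oop *result) {
    assert(result != NULL);
    if (!is_small_int(arg1)) return false;   /* branch 1: tag check on arg1 */
    if (!is_small_int(arg2)) return false;   /* branch 2: tag check on arg2 */
    oop cleared = arg1 & ~(oop)1;            /* clearTagBit arg1 */
    /* (2a) + (2b + 1) = 2(a + b) + 1, so the sum is already tagged. */
    return !__builtin_add_overflow(cleared, arg2, result); /* branch 3: overflow */
}
```

Clearing the tag on only one operand is what lets the plain add produce a correctly tagged result without a separate retagging step.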
Before I started using Exupery, I called special methods in C that
implemented faster code. Every special method (and primitive) returned 1
in case of an error and, on success, returned the result object.
One of these special methods was +. This is part of the code:
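if(areIntegers(rcvr,arg)) {
    int result;
    /* result = rcvr + arg; on signed overflow cmov replaces it with 1 (the error value) */
    asm(    "movl $1,%%edx\n\t"
            "movl %[rcvr],%[result]\n\t"
            "addl %[arg],%[result]\n\t"
            "cmovol %%edx,%[result]"
            : [result] "=&r" (result)   /* early clobber: result is written before arg is read */
            : [rcvr] "r" (rcvr), [arg] "r" (arg)
            : "edx", "cc" );
    return result;
}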
With this code, I got up to 10% faster results in +-intensive tests.
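For readers without an x86 assembler at hand, the same "sum, or 1 on overflow" convention can be sketched in portable C. This is only an illustration under assumptions: `add_or_error` is a hypothetical name, and `__builtin_add_overflow` is a GCC/Clang builtin, not part of the code discussed in this thread. Whether the compiler emits a cmov for the final select is up to the optimizer.

```c
#include <stdint.h>

/* Hypothetical portable counterpart of the inline asm above:
   returns rcvr + arg, or 1 (the error value) on signed overflow. */
static int32_t add_or_error(int32_t rcvr, int32_t arg) {
    int32_t sum;
    int overflowed = __builtin_add_overflow(rcvr, arg, &sum);
    /* A simple select like this is a typical candidate for cmov. */
    return overflowed ? 1 : sum;
}
```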
...
There are code fragments where cmov would be helpful. Converting
to a boolean comes to mind. The part of "a > b" where you're loading
either true or false into the result register.
Yes, I implemented that with Exupery (code for less-than, "<"):
self addExpression:  (MedMov
    from: (self literal: false)
    to: answer	).
trueReg := machine createTemporaryRegister.
self addExpression:  (MedMov
    from: (self literal: true)
    to: trueReg	).
self addExpression:  (MedComparision
    operator: #cmp
    arg1: arg1
    arg2: arg2).
self addExpression:  (MedCMov
    type: #cmovl
    from: trueReg
    to: answer).
This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons in this way, because, as you know, we
don't need to detag before comparing.
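The shape of the comparison code above can be sketched in C as follows. This is a hedged illustration, not Exupery output: `oop`, `less_than_as_object` and the object values passed in are hypothetical. The tagged operands are compared directly, which is valid under the assumption that both carry the same tag encoding 2*v + 1, since that mapping preserves order. Optimizing compilers often turn the select into a cmov, but that is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical object reference type, for illustration only. */
typedef intptr_t oop;

/* Mirrors the intermediate code above: load false, load true into a
   temporary, compare, then conditionally move true over the answer. */
static oop less_than_as_object(intptr_t arg1, intptr_t arg2,
                               oop true_obj, oop false_obj) {
    oop answer = false_obj;                      /* MedMov false -> answer  */
    oop true_reg = true_obj;                     /* MedMov true  -> trueReg */
    answer = (arg1 < arg2) ? true_reg : answer;  /* cmp + cmovl             */
    return answer;
}
```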
...
...
...
Given adequate test coverage I'll add it.
I also implemented the enter and leave instructions. Not because they
are better (they aren't), but because I use them to signal the inclusion
of additional prologue and epilogue code in a final phase added just
after the allocator. I do it that way because I don't know until then
which registers are used and how many additional temps are needed. I
know that Exupery always pushes and pops all the registers (other than
eax, edx and ecx), and that it makes room for a big context as temp
space on the stack. I don't do that. I only push the used registers,
and if that is not enough, I reserve additional stack space. That breaks
compatibility with the original Exupery, but I wanted to implement it
that way. For small methods, that is really better.
So, given that, I am not offering any of this to you. I think you'll
understand.
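The "push only the used registers" idea described above can be sketched as a small post-allocation computation. Everything here is an assumption made for illustration: the register numbering, the names `registers_to_save`, `count_registers` and `frame_bytes`, and the 4-byte slot size for 32-bit x86. It is not the code discussed in this thread.

```c
#include <stdint.h>

/* Hypothetical register numbering, for illustration only. */
enum { EAX, ECX, EDX, EBX, ESI, EDI, REG_COUNT };

/* eax, ecx and edx are the registers the post above says are not saved. */
#define SCRATCH_MASK ((1u << EAX) | (1u << ECX) | (1u << EDX))

/* Registers the prologue must push: used by the method and not scratch. */
static uint32_t registers_to_save(uint32_t used_mask) {
    uint32_t all = (1u << REG_COUNT) - 1u;
    return used_mask & all & ~SCRATCH_MASK;
}

/* Number of set bits, clearing the lowest set bit on each iteration. */
static int count_registers(uint32_t mask) {
    int n = 0;
    for (; mask != 0; mask &= mask - 1u) n++;
    return n;
}

/* Stack bytes needed: one 4-byte slot per pushed register
   plus one per additional temporary. */
static uint32_t frame_bytes(uint32_t used_mask, uint32_t extra_temps) {
    uint32_t saved = (uint32_t)count_registers(registers_to_save(used_mask));
    return 4u * (saved + extra_temps);
}
```

A method that touches only scratch registers and needs no temporaries ends up with an empty frame, which is where the gain for small methods would come from.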
Exupery's prologue and epilogue sequences could be improved. I've been
thinking about overhauling that area for a few years now. I'd like
to have variables spill into their actual locations. So if a stack
variable was stored, it would always be fetched from the context.
Then spilled registers wouldn't need to be loaded and stored on
context switches.
One thing that I might do in 0.13 is colour the isolated parts of a
method separately. That should improve register allocation as the
interference graph will not be polluted by other isolated sections of
code. A compiled method is often made up of completely isolated
sections of code. Colouring the sections separately should also speed
up register allocation.
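One possible reading of "isolated parts" is the connected components of the interference graph: registers that never interfere, directly or transitively, can be coloured as separate problems. The sketch below groups registers that way with a union-find structure. It is an assumption about how such a partition could be computed, not a description of what Exupery does or will do; `MAX_NODES`, `components_init`, `add_interference` and `same_section` are hypothetical names.

```c
#include <assert.h>

#define MAX_NODES 64

/* parent[i] links register i towards the representative of its component. */
static int parent[MAX_NODES];

/* Start with every register in its own component. */
static void components_init(int n) {
    assert(n >= 0 && n <= MAX_NODES);
    for (int i = 0; i < n; i++) parent[i] = i;
}

/* Find the representative, halving the path as we walk up. */
static int find_root(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Record an interference edge: both registers now share a component. */
static void add_interference(int a, int b) {
    int ra = find_root(a);
    int rb = find_root(b);
    if (ra != rb) parent[ra] = rb;
}

/* Registers in different components could be coloured independently. */
static int same_section(int a, int b) {
    return find_root(a) == find_root(b);
}
```

Each component is a smaller graph, so the colouring work per component would shrink accordingly.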
Every improvement you make will help me.
Cheers, Guille
...
Bryce
_______________________________________________
Exupery mailing list
Exupery@lists.squeakfoundation.org
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery