Some questions

Guillermo Adrián Molina

19 Apr 2007

11:32 a.m.

Hi list, I've been playing around with Exupery, and now I have a few questions:

1) I can't get tinyBenchmarks working, either on Linux or on Windows.

I downloaded all the stuff from: http://wiki.squeak.org/squeak/Installing+Exupery

I used http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux and http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows,

with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz

The examples run OK, but when I try to run tinyBenchmarks I get segmentation faults.

2) I tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) on my machine and got: '652,229,299 bytecodes/sec; 89,016,165 sends/sec'

Does anyone know why I get almost 90 million sends/sec? I think it's quite a big difference from previous versions of VW.

3) I saw that the primitives for #at: and #at:put: are getting inlined, but I think they are only implemented for variable objects (not for bytes, Characters, or anything else). Is that true?

4) In my experiments with Exupery, I get an error if I inline too many methods. I think I am running out of machine registers, for example when I try to compile Integer-#digitDiv:reg:. I get this error in the ColouringRegisterAllocator phase, but it is not a "You don't have any more registers, dude" kind of error. Is the "no more registers" situation taken into consideration?

5) Is there a way to implement indirect jump tables in Exupery?

Thanks a lot.

Cheers,
Guille


bryce＠kampjes.demon.co.uk

19 Apr

11:16 p.m.

Guillermo Adrián Molina writes:

...

Hi list, I've been playing around with Exupery, and now I have a few questions:

I can't get tinyBenchmarks working, either on Linux or on Windows.

I downloaded all the stuff from: http://wiki.squeak.org/squeak/Installing+Exupery

I used http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux and http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows,

with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz

The examples run OK, but when I try to run tinyBenchmarks I get segmentation faults.

Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must match the Exupery version. The interface between Exupery and the VM is still evolving.

...

I tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) on my machine and got: '652,229,299 bytecodes/sec; 89,016,165 sends/sec'

Does anyone know why I get almost 90 million sends/sec? I think it's quite a big difference from previous versions of VW.

I saw that the primitives for #at: and #at:put: are getting inlined, but I think they are only implemented for variable objects (not for bytes, Characters, or anything else). Is that true?

It's true. #at: and #at:put: are only implemented for variable objects. I should write primitives for other types. Good benchmarks that demonstrate the need for such primitives would be nice.

...

In my experiments with Exupery, I get an error if I inline too many methods. I think I am running out of machine registers, for example when I try to compile Integer-#digitDiv:reg:. I get this error in the ColouringRegisterAllocator phase, but it is not a "You don't have any more registers, dude" kind of error. Is the "no more registers" situation taken into consideration?

I'd guess that it was because a variable was live at an entry point. There's a stack tracing bug which I'm just fixing that could have caused that.

I use the liveness analyser in the register allocator to catch compiler bugs. It's much nicer to catch them there than with crashes.

...

Is there a way to implement indirect jump tables in exupery?

It would be possible. I do use indirect jumps for returns to compiled methods. If you look at any method you should see at least one indirect jump in the return code. Just jump to a register.

Bryce

Guillermo Adrián Molina

26 Apr

2 p.m.

Hi there! Thanks for the answers; I found them very useful. I have a few more questions.

...

Guillermo Adrián Molina writes:

...
Hi list, I've been playing around with Exupery, and now I have a few questions:

I can't get tinyBenchmarks working, either on Linux or on Windows.

I downloaded all the stuff from: http://wiki.squeak.org/squeak/Installing+Exupery

I used http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux and http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows,

with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz

The examples run OK, but when I try to run tinyBenchmarks I get segmentation faults.

Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must match the Exupery version. The interface between Exupery and the VM is still evolving.

OK, I tried that and it worked:

668407310 bytecodes/sec; 13559830 sends/sec
760772659 bytecodes/sec; 13803237 sends/sec
777524677 bytecodes/sec; 12762744 sends/sec
760772659 bytecodes/sec; 13834279 sends/sec
775757575 bytecodes/sec; 13569800 sends/sec

I read something about Intel being faster than AMD for Exupery. Do you know why that is?

...

...

I tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) on my machine and got: '652,229,299 bytecodes/sec; 89,016,165 sends/sec'

Does anyone know why I get almost 90 million sends/sec? I think it's quite a big difference from previous versions of VW.

I saw that the primitives for #at: and #at:put: are getting inlined, but I think they are only implemented for variable objects (not for bytes, Characters, or anything else). Is that true?

It's true. #at: and #at:put: are only implemented for variable objects. I should write primitives for other types. Good benchmarks that demonstrate the need for such primitives would be nice.

I'll try to check that, thanks.

...

...

In my experiments with Exupery, I get an error if I inline too many methods. I think I am running out of machine registers, for example when I try to compile Integer-#digitDiv:reg:. I get this error in the ColouringRegisterAllocator phase, but it is not a "You don't have any more registers, dude" kind of error. Is the "no more registers" situation taken into consideration?

I'd guess that it was because a variable was live at an entry point. There's a stack tracing bug which I'm just fixing that could have caused that.

I use the liveness analyser in the register allocator to catch compiler bugs. It's much nicer to catch them there than with crashes.

Yes, I've seen that kind of error (variable live at entry point) and corrected it by initializing temps with nil. I think this is something different. In this method of the ColouringRegisterAllocator:

findNodeToSpill
    | spillable |
    "This is just a basic heuristic, spill the register that interferes with the most other registers. It is possible to do a lot better. The heuristic should consider how much each register is used while it is alive"
    spillable := spillWorklist select: [:each | ((self hasSpill: each register) not) and: [each register isMachineRegister not]].
    spillable := spillable asSortedCollection: [:a :b | a spillWeight > b spillWeight].
    ^ spillable first

After compiling lots of methods using Exupery, it fails with very big methods because spillable is nil, and spillable first throws an error. If I do less inlining (for example, not inlining divisions and multiplications), it compiles OK! Any ideas?

...

...

Is there a way to implement indirect jump tables in exupery?

It would be possible. I do use indirect jumps for returns to compiled methods. If you look at any method you should see at least one indirect jump in the return code. Just jump to a register.

Yes, I checked that, but I still need to load that register with the address of the appropriate block, and I need to do that without using Jcc (conditional jumps) to choose the right one. Any suggestions?

...

Bryce _______________________________________________ Exupery mailing list Exupery@lists.squeakfoundation.org http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery

Thanks a lot cheers, Guille

bryce＠kampjes.demon.co.uk

25 Apr

10:42 p.m.

Guillermo Adrián Molina writes:

...

OK, I tried that and it worked:

668407310 bytecodes/sec; 13559830 sends/sec
760772659 bytecodes/sec; 13803237 sends/sec
777524677 bytecodes/sec; 12762744 sends/sec
760772659 bytecodes/sec; 13834279 sends/sec
775757575 bytecodes/sec; 13569800 sends/sec

I read something about Intel being faster than AMD for Exupery. Do you know why that is?

Exupery was much faster than the interpreter on Pentium 4s. That's because the Pentium 4 is an inefficient chip to run the interpreter on.

Those comparisons are rather old now. Hardware has moved on and so has Exupery. Benchmarking now with bigger suites may show different numbers.

...

...
...

In my experiments with Exupery, I get an error if I inline too many methods. I think I am running out of machine registers, for example when I try to compile Integer-#digitDiv:reg:. I get this error in the ColouringRegisterAllocator phase, but it is not a "You don't have any more registers, dude" kind of error. Is the "no more registers" situation taken into consideration?

I'd guess that it was because a variable was live at an entry point. There's a stack tracing bug which I'm just fixing that could have caused that.

I use the liveness analyser in the register allocator to catch compiler bugs. It's much nicer to catch them there than with crashes.

Yes, I've seen that kind of error (variable live at entry point) and corrected it by initializing temps with nil. I think this is something different. In this method of the ColouringRegisterAllocator:

findNodeToSpill
    | spillable |
    "This is just a basic heuristic, spill the register that interferes with the most other registers. It is possible to do a lot better. The heuristic should consider how much each register is used while it is alive"
    spillable := spillWorklist select: [:each | ((self hasSpill: each register) not) and: [each register isMachineRegister not]].
    spillable := spillable asSortedCollection: [:a :b | a spillWeight > b spillWeight].
    ^ spillable first

After compiling lots of methods using Exupery, it fails with very big methods because spillable is nil, and spillable first throws an error. If I do less inlining (for example, not inlining divisions and multiplications), it compiles OK! Any ideas?

I'd guess it's a limit with the register allocator. It is possible that it can fail to find a register to spill when it needs to spill something. Given this bug will not cause crashes or incorrect execution it's not high priority.

...

...
...

Is there a way to implement indirect jump tables in exupery?

It would be possible. I do use indirect jumps for returns to compiled methods. If you look at any method you should see at least one indirect jump in the return code. Just jump to a register.

Yes, I checked that, but I still need to load that register with the address of the appropriate block, and I need to do that without using Jcc (conditional jumps) to choose the right one. Any suggestions?

Exupery also can get the address of a block. That's also done in the send code to save the compiled program counter. The compiled program counter is the address of the machine code block to return to encoded as a SmallInteger. Return blocks are aligned to 2 byte boundaries to allow for tagging. That's enough to build an indirect jump table if you wanted to do that.
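To make the tagging idea concrete, here is a minimal Python sketch. It assumes the Squeak convention that a SmallInteger has its low bit set, and that a 2-byte-aligned address is encoded simply by setting that bit. The exact encoding Exupery uses is not shown in this thread, and the function names are hypothetical.

```python
TAG_BIT = 1  # assumption: a set low bit marks a SmallInteger, as in Squeak


def encode_return_address(address):
    """Make a 2-byte-aligned machine-code address look like a SmallInteger."""
    if address & TAG_BIT:
        raise ValueError("return blocks must be 2-byte aligned")
    return address | TAG_BIT


def decode_return_address(tagged):
    """Recover the raw address by clearing the tag bit."""
    return tagged & ~TAG_BIT


# Illustrative address only; any even value works.
pc = encode_return_address(0x0804A2C0)
print(hex(pc), hex(decode_return_address(pc)))
```

Because an aligned address always has a zero low bit, the tag can be added and removed without losing information, which is why the alignment requirement matters.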

Why do you need to build an indirect jump table? What are you trying to do?

Bryce

Guillermo Adrián Molina

26 Apr

5:37 p.m.

...

Guillermo Adrián Molina writes:

Is there a way to implement indirect jump tables in exupery?

It would be possible. I do use indirect jumps for returns to compiled methods. If you look at any method you should see at least one indirect jump in the return code. Just jump to a register.

Yes, I checked that, but I still need to load that register with the address of the appropriate block, and I need to do that without using Jcc (conditional jumps) to choose the right one. Any suggestions?

Exupery also can get the address of a block. That's also done in the send code to save the compiled program counter. The compiled program counter is the address of the machine code block to return to encoded as a SmallInteger. Return blocks are aligned to 2 byte boundaries to allow for tagging. That's enough to build an indirect jump table if you wanted to do that.

Yes, I noticed that too: using MedAddress, right? Forgive me, but I still don't get the point. For example:

MedMov from: (MedAddress addressOf: blockN) to: aMedReg
MedJump type: #jmp target: aMedReg
block1:
    do something1
    jmp end
block2:
    do something2
    jmp end
block3:
    do something3
end:

This could be a jump table, but I still need to select which block to jump to. The only way of selecting the block that I can imagine is nesting compares, with jumps like:

MedJump type: #jc target: aLabel instruction: (MedComparision operator: #bitTest arg1: aMed arg2: (MedLiteral literal: 0))

But I want to implement a jump table precisely to avoid conditional branching.

...

Why do you need to build an indirect jump table? What are you trying to do?

I am implementing a Smalltalk. It compiles directly to machine code with Exupery. The last time I asked the list something I was just starting to use Exupery. Now I am almost done with that (without many optimizations), and I am doing unit testing right now. My first mail to the list asked what would be the best way to implement a new Smalltalk. In my implementation I use:

0-tagged ints.
A simple (and a little fat) object memory.
A very straightforward send mechanism (with the C calling convention for calling methods).
No contexts, but BlockClosures (frames are the same as in C; the C compiler does not differentiate C code from ST code).

I compile the ST code from .st files to .s (assembler) using SmaCC, RefactoryBrowser, and then Exupery, so I still need Squeak in order to run all that. I only use the bottom layer of Exupery (I do not use the IntermediateXXXXXX classes). I implemented the cmovxx instruction in Exupery because it is very useful. But I need jump tables to implement, for example, faster versions of ifTrue:ifFalse:, and a lot of other things. This could lead to faster results. Right now I am getting (on the same machine) these tinyBenchmarks results:

Squeak: 172043010 bytecodes/sec; 5468700 sends/sec
Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec
myST/Exupery: 1072251308 bytecodes/sec; 36056442 sends/sec

...

Bryce

Cheers Guille

Sebastian Sastre

5:17 p.m.

...

...
Why do you need to build an indirect jump table? What are you trying to do?

I am implementing a Smalltalk. It compiles directly to machine code with Exupery. The last time I asked the list something I was just starting to use Exupery. Now I am almost done with that (without many optimizations), and I am doing unit testing right now. My first mail to the list asked what would be the best way to implement a new Smalltalk. In my implementation I use:

0-tagged ints.
A simple (and a little fat) object memory.
A very straightforward send mechanism (with the C calling convention for calling methods).
No contexts, but BlockClosures (frames are the same as in C; the C compiler does not differentiate C code from ST code).

Hi Guille, I don't get something here. If you are using Exupery to generate asm code why are you talking about a C compiler?

...

I compile the ST code from .st files to .s (assembler) using SmaCC, RefactoryBrowser, and then Exupery, so I still need Squeak in order to run all that. I only use the bottom layer of Exupery (I do not use the IntermediateXXXXXX classes). I implemented the cmovxx instruction in Exupery because it is very useful. But I need jump tables to implement, for example, faster versions of ifTrue:ifFalse:, and a lot of other things. This could lead to faster results. Right now I am getting (on the same machine) these tinyBenchmarks results:

Squeak: 172043010 bytecodes/sec; 5468700 sends/sec
Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec
myST/Exupery: 1072251308 bytecodes/sec; 36056442 sends/sec

Those are numbers!

Cheers,

Sebastian

...

...
Bryce

Cheers Guille

bryce＠kampjes.demon.co.uk

9:16 p.m.

Guillermo Adrián Molina writes:

...

...
Exupery also can get the address of a block. That's also done in the send code to save the compiled program counter. The compiled program counter is the address of the machine code block to return to encoded as a SmallInteger. Return blocks are aligned to 2 byte boundaries to allow for tagging. That's enough to build an indirect jump table if you wanted to do that.

Yes, I noticed that too: using MedAddress, right? Forgive me, but I still don't get the point. For example:

MedAddress is a literal that represents the address of a block. In Exupery it gets relocated to be the block's actual address.

You could write now:

(jmp (mem (add (MedAddress blockWithTable) (sar anIndex 2))))

The only thing missing is a way to produce a block that contains just literals; in your case, a block that contains MedAddresses.

The MedAddress should be translated into a label referring to the block.

Exupery currently does not have blocks that contain literals, but it shouldn't be too hard to add.
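As a rough illustration of the difference between the two dispatch styles discussed here, the following Python sketch contrasts a chain of compares with a table lookup. Python has no indirect jumps, so a list of callables stands in for the block of MedAddress literals, and indexing plus a call stands in for the `jmp (mem ...)` expression. All names are hypothetical and nothing here is Exupery API.

```python
def do_something1():
    return "block1"


def do_something2():
    return "block2"


def do_something3():
    return "block3"


# Stand-in for a block that contains only addresses of other blocks.
JUMP_TABLE = [do_something1, do_something2, do_something3]


def dispatch_with_compares(index):
    """Nested compares: one conditional branch per candidate block."""
    if index == 0:
        return do_something1()
    if index == 1:
        return do_something2()
    return do_something3()


def dispatch_with_table(index):
    """One table lookup followed by one indirect transfer of control."""
    return JUMP_TABLE[index]()


for i in range(3):
    assert dispatch_with_compares(i) == dispatch_with_table(i)
```

The table version does no comparisons on the index, so a real implementation would need to guarantee the index is in range before the jump, or pay for one bounds check.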

...

I am implementing a smalltalk. It compiles directly to machine code, with exupery. The last time I asked something to the list I was starting to use exupery. Now I am almost done with that (without many optimizations). I am doing unit testing right now.

Interesting, what is the goal of your new Smalltalk? What are you trying to do better than the other dialects or is this purely for enjoyment?

Bryce

bryce＠kampjes.demon.co.uk

28 Apr

10:46 a.m.

Guillermo Adrián Molina writes:

...

...
...
After compiling lots of methods using Exupery, it fails with very big methods because spillable is nil, and spillable first throws an error. If I do less inlining (for example, not inlining divisions and multiplications), it compiles OK! Any ideas?

I'd guess it's a limit with the register allocator. It is possible that it can fail to find a register to spill when it needs to spill something. Given this bug will not cause crashes or incorrect execution it's not high priority.

If you want to fix that limit in the register allocator I could give you some pointers. The problem is due to how the problem is broken down into stages. I'd need to dig through code to remember the details though.

I'm planning on working on the register allocator in the next release. The goal will be making it faster; it has a few serious performance problems.

Bryce

Guillermo Adrián Molina

29 Apr

10:52 a.m.

...

Guillermo Adrián Molina writes:

...
...
...
After compiling lots of methods using Exupery, it fails with very big methods because spillable is nil, and spillable first throws an error. If I do less inlining (for example, not inlining divisions and multiplications), it compiles OK! Any ideas?

I'd guess it's a limit with the register allocator. It is possible that it can fail to find a register to spill when it needs to spill something. Given this bug will not cause crashes or incorrect execution it's not high priority.

If you want to fix that limit in the register allocator I could give you some pointers. The problem is due to how the problem is broken down into stages. I'd need to dig through code to remember the details though.

Yes, I do. Please let me know where to start.

...

I'm planning on working on the register allocator in the next release. The goal will be making it faster, it has a few serious performance problems.

Exupery's compile time is not a problem for me. But maybe I have to wait for you to finish with the register allocator before trying to fix the limit. Please let me know what you want me to do. Right now, I have already finished with unit testing. The next thing I will do is to include all the compiler classes in my project (remember that right now that is done in Squeak); maybe it would be convenient for me to wait for 0.12 before I do that.

Another thing: do you want the code I wrote for cmovxx?

Cheers Guille.

...

Bryce

bryce＠kampjes.demon.co.uk

9:56 a.m.

Guillermo Adrián Molina writes:

...

...
If you want to fix that limit in the register allocator I could give you some pointers. The problem is due to how the problem is broken down into stages. I'd need to dig through code to remember the details though.

Yes, I do. Please let me know where to start.

If it's not an urgent problem then it may be better to wait until after 0.13. Or to look at the register allocator during 0.13 development.

Have a look at the stages of simplification. They're done in the method below, which sets the steps for processing:

ColouringRegisterAllocator>>processWorkLists
    simplifyWorklist isEmpty ifFalse: [^ self simplify].
    self coalesce ifTrue: [^ self].
    self freeze ifTrue: [^ self].
    spillWorklist isEmpty ifFalse: [^ self spillRegister].
    self spillMove

However, the spill worklist has some registers on it that shouldn't be spilled, so when it tries to select a register to spill, it discards all the registers and then fails.

I'd see if there are any moves that might be spilled afterwards; if so, all you'd need to do is allow spillRegister to fail gracefully.
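One way to picture "fail gracefully" is the Python sketch below. It is a toy model under stated assumptions, not Exupery code: `Node`, `find_node_to_spill` and `next_step` are hypothetical names, the coalesce and freeze steps are left out, and whether falling through to the move-spilling step is actually safe is the very thing that would need checking in the real allocator.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    spill_weight: int
    already_spilled: bool = False
    is_machine_register: bool = False


def find_node_to_spill(spill_worklist):
    """Return the heaviest spillable node, or None if every node is filtered out."""
    spillable = [n for n in spill_worklist
                 if not n.already_spilled and not n.is_machine_register]
    if not spillable:
        return None  # the quoted method would fail here, on `spillable first`
    return max(spillable, key=lambda n: n.spill_weight)


def next_step(simplify_worklist, spill_worklist):
    """Choose the next step in the same order as processWorkLists (coalesce and freeze omitted)."""
    if simplify_worklist:
        return "simplify"
    node = find_node_to_spill(spill_worklist)
    if node is not None:
        return "spill " + node.name
    return "spillMove"  # fall through instead of raising an error


worklist = [Node("eax", 10, is_machine_register=True),
            Node("t1", 3, already_spilled=True)]
print(next_step([], worklist))  # prints "spillMove"
```

The worklist in the usage example is not empty, yet nothing on it may be spilled, which mirrors the situation described above where the filter discards every register.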

...

...
I'm planning on working on the register allocator in the next release. The goal will be making it faster, it has a few serious performance problems.

Exupery's compile time is not a problem for me. But maybe I have to wait for you to finish with the register allocator before trying to fix the limit. Please let me know what you want me to do. Right now, I have already finished with unit testing. The next thing I will do is to include all the compiler classes in my project (remember that right now that is done in Squeak); maybe it would be convenient for me to wait for 0.12 before I do that.

Another thing: do you want the code I wrote for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on testing so that's required.

When was cmov introduced? I know it was a long time ago but can't remember precisely when. What I'm concerned with is making Exupery incompatible with some chips that might still be in use.

Given adequate test coverage I'll add it.

Bryce

Guillermo Adrián Molina

30 Apr

7:19 a.m.

...

Guillermo Adrián Molina writes:

...
...
If you want to fix that limit in the register allocator I could give you some pointers. The problem is due to how the problem is broken down into stages. I'd need to dig through code to remember the details though.

Yes, I do. Please let me know where to start.

If it's not an urgent problem then it may be better to wait until after 0.13. Or to look at the register allocator during 0.13 development.

Have a look at the stages of simplification. They're done in the method below, which sets the steps for processing:

ColouringRegisterAllocator>>processWorkLists
    simplifyWorklist isEmpty ifFalse: [^ self simplify].
    self coalesce ifTrue: [^ self].
    self freeze ifTrue: [^ self].
    spillWorklist isEmpty ifFalse: [^ self spillRegister].
    self spillMove

However, the spill worklist has some registers on it that shouldn't be spilled, so when it tries to select a register to spill, it discards all the registers and then fails.

I'd see if there are any moves that might be spilled afterwards; if so, all you'd need to do is allow spillRegister to fail gracefully.

Ok, I will try to see what is happening. Is there any hard limit (besides the number of registers available on the x86 architecture)?

...

...
...
I'm planning on working on the register allocator in the next release. The goal will be making it faster; it has a few serious performance problems.

Exupery's compile time is not a problem for me. But maybe I have to wait for you to finish with the register allocator before trying to fix the limit. Please let me know what you want me to do. Right now, I have already finished with unit testing. The next thing I will do is to include all the compiler classes in my project (remember that right now that is done in Squeak); maybe it would be convenient for me to wait for 0.12 before I do that.

Another thing: do you want the code I wrote for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on testing so that's required.

Not right now. I will work on that later; when I have it I will send it to you.

...

When was cmov introduced? I know it was a long time ago but can't remember precisely when. What I'm concerned with is making Exupery incompatible with some chips that might still be in use.

Intel's optimization manual says that cmov was introduced with the Pentium Pro, and AMD's optimization manual says that cmov is available from the Athlon onward. I actually didn't investigate that thoroughly. The fact is that any modern computer should have it. I know that in early implementations of cmov (Pentium Pro) using the instruction wasn't really an advantage, but now it really is faster. My tinyBenchmarks showed a speedup of 10% when I implemented cmov for SmallInteger additions. But if you are really concerned about compatibility, I think you would be better off not using it.

...

Given adequate test coverage I'll add it.

I also implemented the enter and leave instructions. Not because they are better (they aren't), but because I use them to signal the inclusion of additional prologue and epilogue code in a final phase added just after the allocator. I do it that way because I don't know until then which registers are used and how many additional temps are needed. I know that Exupery always pushes and pops all the registers (other than eax, edx and ecx), and that it makes room for a big context as temp space on the stack. I don't do that. I only push the registers that are used, and if that is not enough, I allocate additional stack space with enter. That breaks compatibility with the original Exupery, but I wanted to implement it that way. For small methods, that is really better. So, given that, I am not offering any of this to you. I think you'll understand.

Cheers, Guille

...

Bryce

bryce＠kampjes.demon.co.uk

29 Apr

4:18 p.m.

Guillermo Adrián Molina writes:

...

...
However, the spill worklist has some registers on it that shouldn't be spilled, so when it tries to select a register to spill, it discards all the registers and then fails.

I'd see if there are any moves that might be spilled afterwards; if so, all you'd need to do is allow spillRegister to fail gracefully.

Ok, I will try to see what is happening. Is there any hard limit (besides the number of registers available on the x86 architecture)?

There should be no limit on the number of registers you can use. The worst that should happen is you end up with a lot of spill code.

...

...
...
Another thing: do you want the code I wrote for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on testing so that's required.

Not right now. I will work on that later; when I have it I will send it to you.

...

...
When was cmov introduced? I know it was a long time ago but can't remember precisely when. What I'm concerned with is making Exupery incompatible with some chips that might still be in use.

Intel's optimization manual says that cmov was introduced with the Pentium Pro, and AMD's optimization manual says that cmov is available from the Athlon onward. I actually didn't investigate that thoroughly. The fact is that any modern computer should have it. I know that in early implementations of cmov (Pentium Pro) using the instruction wasn't really an advantage, but now it really is faster. My tinyBenchmarks showed a speedup of 10% when I implemented cmov for SmallInteger additions. But if you are really concerned about compatibility, I think you would be better off not using it.

I'm surprised that your SmallInteger addition code was helped.

In Exupery the SmallInteger addition sequence is:

bitTest arg1
jumpIfSet failureBlock
bitTest arg2
jumpIfSet failureBlock
clearTagBit arg1
add arg1 arg2
jumpOverflow failureBlock

The failure case is a full message send.

There are code fragments where cmov would be helpful. Converting to a boolean comes to mind: the part of "a > b" where you're loading either true or false into the result register.
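The shape of that addition sequence can be modelled in a few lines of Python. This is only a sketch under stated assumptions: it uses the Squeak convention of a 32-bit word with the low bit set for SmallIntegers, it models the overflow flag with a range check, and `full_send` is a hypothetical stand-in for the failure block. The polarity of the tag test in Exupery's real sequence depends on how bitTest sets the flags, which this thread does not spell out.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
TAG_BIT = 1  # assumption: a set low bit marks a SmallInteger (Squeak convention)


def tag(value):
    """Encode a Python int as a tagged SmallInteger word."""
    return (value << 1) | TAG_BIT


def untag(word):
    """Decode a tagged word back to the integer it represents."""
    return word >> 1


def full_send(a, b):
    """Stand-in for the failure block: the slow path, a real message send."""
    return "send #+"


def add_small_integers(a, b):
    """Fast path: two tag tests, one add, one overflow check."""
    if not (a & TAG_BIT):            # arg1 is not a SmallInteger
        return full_send(a, b)
    if not (b & TAG_BIT):            # arg2 is not a SmallInteger
        return full_send(a, b)
    result = (a & ~TAG_BIT) + b      # clear one tag so the sum carries exactly one
    if not (INT32_MIN <= result <= INT32_MAX):  # models the overflow flag
        return full_send(a, b)
    return result


print(untag(add_small_integers(tag(3), tag(4))))  # prints 7
```

Every branch in this sketch targets the same failure path, so there is no pair of values to select between, which is the usual job of a conditional move. The boolean conversion mentioned above is different: there the code picks one of two values.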

...

...
Given adequate test coverage I'll add it.

I also implemented the enter and leave instructions. Not because they are better (they aren't), but because I use them to signal the inclusion of additional prologue and epilogue code in a final phase added just after the allocator. I do it that way because I don't know until then which registers are used and how many additional temps are needed. I know that Exupery always pushes and pops all the registers (other than eax, edx and ecx), and that it makes room for a big context as temp space on the stack. I don't do that. I only push the registers that are used, and if that is not enough, I allocate additional stack space with enter. That breaks compatibility with the original Exupery, but I wanted to implement it that way. For small methods, that is really better. So, given that, I am not offering any of this to you. I think you'll understand.

Exupery's prologue and epilogue sequences could be improved. I've been thinking about overhauling that area for a few years now. I'd like to have variables spill into their actual locations. So if a stack variable was stored, it would always be fetched from the context. Then spilled registers wouldn't need to be loaded and stored on context switches.

One thing that I might do in 0.13 is colour the isolated parts of a method separately. That should improve register allocation as the interference graph will not be polluted by other isolated sections of code. A compiled method is often made up of completely isolated sections of code. Colouring the sections separately should also speed up register allocation.
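One way to read "isolated parts" is as the connected components of the interference graph: registers that never interfere, directly or transitively, can be coloured in separate passes. That reading is an assumption, as are all names below. The sketch uses a small union-find structure to group register nodes by component.

```c
#include <stddef.h>

#define MAX_NODES 64

/* parent[i] is the union-find parent of register node i. */
static int parent[MAX_NODES];

static void components_init(size_t node_count) {
    for (size_t i = 0; i < node_count; i++) parent[i] = (int)i;
}

static int find_root(int node) {
    while (parent[node] != node) {
        parent[node] = parent[parent[node]];  /* path halving */
        node = parent[node];
    }
    return node;
}

/* Record that registers a and b interfere, merging their components. */
static void add_interference(int a, int b) {
    int root_a = find_root(a);
    int root_b = find_root(b);
    if (root_a != root_b) parent[root_a] = root_b;
}

static int same_component(int a, int b) {
    return find_root(a) == find_root(b);
}

/* Number of independent sub-graphs that could be coloured separately. */
static size_t count_components(size_t node_count) {
    size_t count = 0;
    for (size_t i = 0; i < node_count; i++) {
        if (find_root((int)i) == (int)i) count++;
    }
    return count;
}
```

In a real allocator, move-related pairs would probably also need to be merged into one component so that coalescing still sees both ends of a move.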

Bryce

Guillermo Adrián Molina

30 Apr 30 Apr

6:52 p.m.

...

Guillermo Adrián Molina writes:

...
...
Sets the steps for processing. However the spill worklist has some registers on it that shouldn't be spilled, so it tries to select a register to spill. It discards all registers then fails.

I'd see if there are any moves that might be spilled afterwards, if so, then all you'd need to do is allow spillRegister to fail gracefully.
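"Fail gracefully" here could mean that spill selection reports that nothing is spillable instead of raising an error, so the caller can go on to try other worklists (such as pending moves). The C sketch below shows that shape; the VirtualRegister struct, the fewest-uses heuristic and the sentinel value are illustrative assumptions, not Exupery's spillRegister.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  id;         /* hypothetical virtual register number */
    int  use_count;  /* how often the register is referenced */
    bool spillable;  /* false for registers that must stay in a register */
} VirtualRegister;

#define NO_SPILL_CANDIDATE (-1)

/* Picks the spillable register with the fewest uses, or returns
 * NO_SPILL_CANDIDATE when every entry on the worklist is unspillable. */
static int select_spill(const VirtualRegister *worklist, size_t count) {
    int best = NO_SPILL_CANDIDATE;
    int best_uses = 0;
    for (size_t i = 0; i < count; i++) {
        if (!worklist[i].spillable) continue;
        if (best == NO_SPILL_CANDIDATE || worklist[i].use_count < best_uses) {
            best = worklist[i].id;
            best_uses = worklist[i].use_count;
        }
    }
    return best;
}
```

The caller treats NO_SPILL_CANDIDATE as "nothing to spill this round" and continues with the remaining allocator steps.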

Ok, I will try to see what is happening. Is there any hard limit (besides the number of available registers in x86 arch)?

There should be no limit on the number of registers you can use. The worst that should happen is you end up with a lot of spill code.

...
...
...
Another thing, Do you want the code I made for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on testing so that's required.

Not right now. I will work on that later; when I have it I will send it to you.

OK

...
...
When was cmov introduced? I know it was a long time ago but can't remember precisely when. What I'm concerned with is making Exupery incompatible with some chips that might still be in use.

Intel's optimization manual says that cmov was introduced in Pentium, and AMD's optimization manual says that cmov is available from Athlon. I actually didn't investigate that thoroughly. The fact is that any modern computer should have it. I know that in earlier implementations of cmov (Pentium Pro) using the instruction wasn't really an advantage. But now it is really faster. My tinyBenchmarks showed a speedup of 10% when I implemented cmov for SmallInteger additions. But if you are really concerned about compatibility, I think you would be better off not using it.

I'm surprised that your SmallInteger addition code was helped.

In Exupery the SmallInteger addition sequence is:

    bitTest arg1
    jumpIfSet failureBlock
    bitTest arg2
    jumpIfSet failureBlock
    clearTagBit arg1
    add arg1 arg2
    jumpOverflow failureBlock

The failure case is a full message send.

The problem with the above code is that you have 3 branches. That is why I need jump tables; there are cases where cmov really doesn't help.

Before I started using Exupery, I called special methods in C that implemented faster code. Every special method (and primitive) returned 1 in case of an error and, on success, returned the result object. One of these special methods was +. This is part of the code:

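    if (areIntegers(rcvr, arg)) {
        int result;
        asm(
            "movl $1,%%edx\n\t"              /* error value, selected on overflow */
            "movl %[rcvr],%[result]\n\t"
            "addl %[arg],%[result]\n\t"
            "cmovol %%edx,%[result]"         /* result := 1 if the add overflowed */
            : [result] "=&r" (result)        /* early clobber: written before arg is read */
            : [rcvr] "r" (rcvr), [arg] "r" (arg)
            : "edx", "cc"                    /* addl changes the flags */
        );
        return result;
    }

With this code, I got up to 10% faster code in + intensive tests.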
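For comparison, the same "add, and return 1 if it overflowed" behaviour can be written without inline assembly using the `__builtin_add_overflow` builtin that GCC and Clang provide. This is only a sketch: the name add_or_fail and the ADD_FAILED constant are made up here, and whether the compiler emits a cmov or a conditional jump for it is not guaranteed.

```c
/* Sentinel meaning "fall back to the slow path". As in the message above,
 * this only works if 1 can never be a valid result object. */
#define ADD_FAILED 1

static int add_or_fail(int rcvr, int arg) {
    int result;
    /* The builtin returns nonzero when the signed addition overflows. */
    if (__builtin_add_overflow(rcvr, arg, &result)) {
        return ADD_FAILED;
    }
    return result;
}
```

The caller still needs one comparison against ADD_FAILED, so this removes the overflow branch from the helper but not the check on its result.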

...

There are code fragments where cmov would be helpful. Converting to a boolean comes to mind. The part of "a > b" where you're loading either true or false into the result register.

Yes, I implemented that with exupery (code for less "<"):

    self addExpression: (MedMov from: (self literal: false) to: answer).
    trueReg := machine createTemporaryRegister.
    self addExpression: (MedMov from: (self literal: true) to: trueReg).
    self addExpression: (MedComparision operator: #cmp arg1: arg1 arg2: arg2).
    self addExpression: (MedCMov type: #cmovl from: trueReg to: answer).

This gave me an impressive improvement (up to 40-50%) when I implemented all the SmallInteger comparisons in this way, because, as you know, we don't need to detag before comparing.
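The same pattern (load false, then conditionally overwrite with true) can be sketched in C. Comparing the tagged values directly works because tagging as (value << 1) | 1 preserves ordering; that encoding, the TRUE_OOP and FALSE_OOP placeholder values, and the function names are assumptions for the sketch. A mask is used in place of cmov so that the selection is branch-free whatever the compiler does.

```c
#include <stdint.h>

/* Hypothetical oops standing in for the true and false objects. */
#define TRUE_OOP  ((int32_t)0x1000)
#define FALSE_OOP ((int32_t)0x2000)

/* Assumption: SmallIntegers are stored as (value << 1) | 1. */
static int32_t tag_small_int(int32_t value) {
    return (int32_t)(((uint32_t)value << 1) | 1u);
}

/* Answers TRUE_OOP when arg1 < arg2, else FALSE_OOP. Both arguments
 * are tagged SmallIntegers and are compared without detagging. */
static int32_t small_int_less(int32_t arg1, int32_t arg2) {
    int32_t answer = FALSE_OOP;
    /* mask is all ones when arg1 < arg2, otherwise zero. */
    int32_t mask = -(int32_t)(arg1 < arg2);
    /* Flip answer to TRUE_OOP only when the mask is set. */
    answer ^= (TRUE_OOP ^ FALSE_OOP) & mask;
    return answer;
}
```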

...

...
...
Given adequate test coverage I'll add it.

I also implemented enter and leave instructions. Not because they were better (they aren't), but because I use them to signal the inclusion of additional prologue and epilogue code in a final phase added just after the allocator. I do it that way because I don't know until then which registers are used and the number of additional temps needed. I know that Exupery always pushes and pops all the registers (other than eax, edx and ecx), and that it makes room for a big context as temp space on the stack. I don't do that. I only push the used regs, and if that is not enough, I enter additional stack space. That breaks compatibility with the original Exupery, but I wanted to implement it that way. For small methods, that is really better. So, given that, I am not offering any of this to you. I think you'll understand.

Exupery's prologue and epilogue sequences could be improved. I've been thinking about overhauling that area for a few years now. I'd like to have variables spill into their actual locations. So if a stack variable was stored, it would always be fetched from the context. Then spilled registers wouldn't need to be loaded and stored on context switches.

One thing that I might do in 0.13 is colour the isolated parts of a method separately. That should improve register allocation as the interference graph will not be polluted by other isolated sections of code. A compiled method is often made up of completely isolated sections of code. Colouring the sections separately should also speed up register allocation.

Every improvement you make will help me. Cheers, Guille

...

Bryce _______________________________________________ Exupery mailing list Exupery@lists.squeakfoundation.org http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery

bryce＠kampjes.demon.co.uk

5 May 5 May

4:35 p.m.

Guillermo Adrián Molina writes:

...

...
In Exupery the SmallInteger addition sequence is:

    bitTest arg1
    jumpIfSet failureBlock
    bitTest arg2
    jumpIfSet failureBlock
    clearTagBit arg1
    add arg1 arg2
    jumpOverflow failureBlock

The failure case is a full message send.

The problem with the above code is that you have 3 branches. That is why I need jump tables; there are cases where cmov really doesn't help.

There are only 3 branches, and I'm hoping that they will never be taken, so they should be easy to predict. That said, the branches do use branch predictor resources, which could cause other branches not to be predicted as well.

...

Before I started using Exupery, I called special methods in C that implemented faster code. Every special method (and primitive) returned 1 in case of an error and, on success, returned the result object. One of these special methods was +. This is part of the code:

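    if (areIntegers(rcvr, arg)) {
        int result;
        asm(
            "movl $1,%%edx\n\t"              /* error value, selected on overflow */
            "movl %[rcvr],%[result]\n\t"
            "addl %[arg],%[result]\n\t"
            "cmovol %%edx,%[result]"         /* result := 1 if the add overflowed */
            : [result] "=&r" (result)        /* early clobber: written before arg is read */
            : [rcvr] "r" (rcvr), [arg] "r" (arg)
            : "edx", "cc"                    /* addl changes the flags */
        );
        return result;
    }

With this code, I got up to 10% faster code in + intensive tests.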

Do you have conditionals inside areIntegers, and another to check whether the result is 1, indicating an error?

...

...
There are code fragments where cmov would be helpful. Converting to a boolean comes to mind. The part of "a > b" where you're loading either true or false into the result register.

Yes, I implemented that with exupery (code for less "<"):

    self addExpression: (MedMov from: (self literal: false) to: answer).
    trueReg := machine createTemporaryRegister.
    self addExpression: (MedMov from: (self literal: true) to: trueReg).
    self addExpression: (MedComparision operator: #cmp arg1: arg1 arg2: arg2).
    self addExpression: (MedCMov type: #cmovl from: trueReg to: answer).

This gave me an impressive improvement (up to 40-50%) when I implemented all the SmallInteger comparisons in this way, because, as you know, we don't need to detag before comparing.

Exupery removes many of the boolean conversion sequences.

"a < b ifTrue: [x]"

First gets translated into:

(booleanToControlFlow (controlFlowToBoolean (a < b)))

Then Exupery removes the booleanToControlFlow controlFlowToBoolean sequence. The booleanToControlFlow sequence is moved to the failure case, where either a or b is not a SmallInteger.

So I'm not sure if speeding up the general case will help Exupery as I'm not sure how often it's called.
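The removal of a booleanToControlFlow wrapped directly around a controlFlowToBoolean can be pictured as a small rewrite over an expression tree. The C sketch below shows only that cancellation step. The Node type and simplify function are hypothetical, and moving the conversion into the failure case is not modelled.

```c
#include <stddef.h>

typedef enum {
    LESS_THAN,                /* stands in for the comparison a < b */
    CONTROL_FLOW_TO_BOOLEAN,
    BOOLEAN_TO_CONTROL_FLOW
} NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *operand;     /* NULL for leaves */
} Node;

/* Rewrites booleanToControlFlow(controlFlowToBoolean(x)) to x,
 * working bottom-up, and returns the root of the simplified tree. */
static Node *simplify(Node *node) {
    if (node == NULL) return NULL;
    node->operand = simplify(node->operand);
    if (node->kind == BOOLEAN_TO_CONTROL_FLOW &&
        node->operand != NULL &&
        node->operand->kind == CONTROL_FLOW_TO_BOOLEAN) {
        return node->operand->operand;  /* the two conversions cancel */
    }
    return node;
}
```

After the rewrite, the comparison feeds the conditional jump directly, so no true or false object is materialised on the fast path.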

Bryce

Guillermo Adrián Molina

6 May 6 May

10:42 a.m.

...

Guillermo Adrián Molina writes:

...
...
In Exupery the SmallInteger addition sequence is:

    bitTest arg1
    jumpIfSet failureBlock
    bitTest arg2
    jumpIfSet failureBlock
    clearTagBit arg1
    add arg1 arg2
    jumpOverflow failureBlock

The failure case is a full message send.

The problem with the above code is that you have 3 branches. That is why I need jump tables; there are cases where cmov really doesn't help.

There are only 3 branches, and I'm hoping that they will never be taken, so they should be easy to predict. That said, the branches do use branch predictor resources, which could cause other branches not to be predicted as well.

Yes, I agree. I am really not an expert in these matters, but I think it is not so uncommon to send #+ to objects other than SmallIntegers; in that case, maybe one of the first 2 branches would be mispredicted. Maybe you could test that both of them are SmallIntegers with just one branch (I am doing that right now). But maybe I will try to do it without branching at all.
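Testing both operands with a single branch comes down to combining the two tag bits before testing. Which combination is right depends on the tag encoding, which the thread does not spell out, so the sketch below shows both possibilities; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoding A (assumption): SmallIntegers have the low bit set.
 * Both are SmallIntegers only if the AND of the two still has it set. */
static bool both_small_ints_tag1(int32_t a, int32_t b) {
    return (a & b & 1) != 0;
}

/* Encoding B (assumption): SmallIntegers have the low bit clear.
 * Both are SmallIntegers only if the OR of the two still has it clear. */
static bool both_small_ints_tag0(int32_t a, int32_t b) {
    return ((a | b) & 1) == 0;
}
```

Either form replaces two test-and-branch pairs with one extra logical instruction and a single branch.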

...

...
Before I started using Exupery, I called special methods in C that implemented faster code. Every special method (and primitive) returned 1 in case of an error and, on success, returned the result object. One of these special methods was +. This is part of the code:

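    if (areIntegers(rcvr, arg)) {
        int result;
        asm(
            "movl $1,%%edx\n\t"              /* error value, selected on overflow */
            "movl %[rcvr],%[result]\n\t"
            "addl %[arg],%[result]\n\t"
            "cmovol %%edx,%[result]"         /* result := 1 if the add overflowed */
            : [result] "=&r" (result)        /* early clobber: written before arg is read */
            : [rcvr] "r" (rcvr), [arg] "r" (arg)
            : "edx", "cc"                    /* addl changes the flags */
        );
        return result;
    }

With this code, I got up to 10% faster code in + intensive tests.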

Do you have conditionals inside areIntegers, and another to check whether the result is 1, indicating an error?

As I don't use this code as often as before (because I inline that with Exupery at compile time), I don't worry about it any more. But areIntegers() is just an "or" and an "and"; the branch is represented in the C "if" statement. I wrote the addition that way because I wanted to test if cmov was really that fast. It was better, but not THAT much better.

Guille

exupery@lists.squeakfoundation.org

participants (3)

bryce＠kampjes.demon.co.uk
Guillermo Adrián Molina
Sebastian Sastre