The primary goal for the next few releases is to make the following benchmarks more compelling. I've also added a compile-time benchmark, as there are a few performance bugs in the compiler that should be fixed.
arithmaticLoopBenchmark 1396 compiled  128 ratio: 10.906
bytecodeBenchmark       2111 compiled  460 ratio:  4.589
sendBenchmark           1637 compiled  668 ratio:  2.451
doLoopsBenchmark        1081 compiled  715 ratio:  1.512
pointCreation           1245 compiled 1317 ratio:  0.945
largeExplorers           728 compiled  715 ratio:  1.018
compilerBenchmark        483 compiled  489 ratio:  0.988
Cumulative Time         1125 compiled  537 ratio   2.093
ExuperyBenchmarks>>arithmeticLoop 249ms
SmallInteger>>benchmark 1112ms
InstructionStream>>interpretExtension:in:for: 113460ms
Average 3155.360
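For anyone checking the numbers, here is a small Python sketch of how the figures above seem to fit together. It assumes the first number in each row is the interpreted time and the second is the compiled time, so the ratio is interpreted divided by compiled. The "Cumulative Time" and "Average" rows look consistent with geometric means of the rows above them. That is an inference from the numbers, not something stated here, and the function name `geometric_mean` is just for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# (benchmark, first column, "compiled" column), copied from the listing above.
# Assumption: the first column is the interpreted time.
runs = [
    ("arithmaticLoopBenchmark", 1396, 128),
    ("bytecodeBenchmark", 2111, 460),
    ("sendBenchmark", 1637, 668),
    ("doLoopsBenchmark", 1081, 715),
    ("pointCreation", 1245, 1317),
    ("largeExplorers", 728, 715),
    ("compilerBenchmark", 483, 489),
]

# Per-benchmark ratio: interpreted time / compiled time.
ratios = {name: interp / comp for name, interp, comp in runs}
for name, ratio in ratios.items():
    print(f"{name:24} ratio: {ratio:.3f}")

# Summary row, assuming it is a geometric mean of each column.
interp_mean = geometric_mean([interp for _, interp, _ in runs])
comp_mean = geometric_mean([comp for _, _, comp in runs])
print(f"{'geometric mean':24} {interp_mean:.0f} compiled {comp_mean:.0f} "
      f"ratio {interp_mean / comp_mean:.3f}")

# Compile-time listing, same assumption for the "Average" row.
compile_times_ms = [249, 1112, 113460]
compile_mean = geometric_mean(compile_times_ms)
print(f"compile time geometric mean {compile_mean:.3f}")
```

A geometric mean keeps one very slow entry, such as the 113460ms compile, from swamping the summary the way an arithmetic mean would.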
First, I'll get the register allocator to allocate each section of a method separately. After that, I'll probably do further work on optimising the register allocator, though I might instead work on improving the generated native code.
Allocating registers for each section separately will allow both better and faster allocation. It makes it easy to avoid dealing with registers and interference from other sections of the code, and it reduces the size of the problem. A well-written colouring register allocator should run in n log n time on average, but the performance bugs probably raise that to n^2.
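To make the size argument concrete, here is a back-of-envelope cost model in Python. It is only a sketch: the section sizes are made up, the cost functions are idealised n^2 and n log n curves rather than measurements of the real allocator, and all names are hypothetical.

```python
import math

def quadratic_cost(n):
    """Idealised work for an allocator that behaves like n^2."""
    return n * n

def nlogn_cost(n):
    """Idealised work for an allocator that behaves like n log n."""
    return n * math.log2(n) if n > 1 else float(n)

def split_cost(section_sizes, cost):
    """Total work when each section is allocated on its own."""
    return sum(cost(n) for n in section_sizes)

# Hypothetical method: 10 sections of 100 live ranges each.
sections = [100] * 10
whole = sum(sections)

whole_quadratic = quadratic_cost(whole)
split_quadratic = split_cost(sections, quadratic_cost)
whole_nlogn = nlogn_cost(whole)
split_nlogn = split_cost(sections, nlogn_cost)

print("n^2     whole:", whole_quadratic, "split:", split_quadratic)
print("n log n whole:", round(whole_nlogn), "split:", round(split_nlogn))
```

Under this model the quadratic case gets ten times cheaper when split into ten equal sections, while the n log n case improves only modestly. That is why splitting matters most while the performance bugs are still there, and why small methods gain little.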
It's possible that just allocating each section of the method separately will be enough to bring allocation time down to a reasonable level. It should definitely help for the larger methods, but it is unlikely to do anything for arithmaticLoopBenchmark and will only help the bytecode benchmark slightly. Compiling more quickly will also make it easier to run more extensive tests.
Bryce