[Vm-dev] Cog status & FFI directions [was rearchitecting the FFI implementation for reentrancy]
Igor Stasenko
siguctua at gmail.com
Fri Aug 7 13:04:47 UTC 2009
2009/8/7 Andreas Raab <andreas.raab at gmx.de>:
>
> Eliot Miranda wrote:
>>
>> The first incarnation of the Cog JIT is complete (for x86 only) and in use
>> at Qwaq. We are gearing up for a new server release and the Cog VM is the
>> Vm beneath it. The next client release will include it also. This VM has a
>> naive code generator (every push or pop in the bytecode results in a push or
>> pop in machine code) but good inline caching. Performance is as high as 5x
>> the current interpreter for certain computer-language-shootout benchmarks.
>> The naive code generator means there is poor loop performance (1 to: n do:
>> ... style code can be 4 times slower than VisualWorks) and the object model
>> means there is no machine code instance creation and no machine code at:put:
>> primitive. But send performance is good and block activation almost as fast
>> as VisualWorks. In our real-world experience we were last week able to run
>> almost three times as many Qwaq Forums clients against a QF server running
>> on the Cog VM as we were able to atop the interpreter. So the Cog JIT
>> is providing significant speedups in real-world use.
>
> Indeed. Here some numbers that I took earlier this year:
>
> VM version       bc/sec       sends/sec    Macro1  Macro2   Macro5  Total
> Closure(3.11.2)  198,295,894   5,801,773   3124ms  79333ms  9935ms  92411ms
> Stack (2.0.10)   178,521,617   8,141,165   2136ms  43081ms  6874ms  52117ms
It was always confusing to me how it is possible to have a higher send
rate and a lower bytecode-execution rate at the same time.
The way tinyBenchmarks calculates these figures is a tricky one.
> Cog (current)    199,221,789  17,509,420    982ms  29392ms  4053ms  34445ms
>
> Stack vs. Closure  0.9   1.4   1.46  1.84  1.45  1.77
> Cog vs. Stack      1.12  2.16  2.17  1.46  1.69  1.51
> Cog vs. Closure    1.0   3.0   3.18  2.7   2.45  2.68
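The seemingly paradoxical Stack-vs.-Closure row (bytecodes slower, sends faster) is possible because the two tinyBenchmarks figures come from two independent microbenchmarks: a bytecode-heavy loop for bytecodes/sec and a send-heavy recursive fib for sends/sec. A VM change that makes message activation cheaper while adding a little per-bytecode overhead moves the two rates in opposite directions. A minimal sketch (Python rather than Squeak; the variable names are mine), just dividing the numbers from the table above:

```python
# The two tinyBenchmarks figures are measured by separate workloads,
# so they need not move together. Figures taken from the table above.
closure = {"bc_per_sec": 198_295_894, "sends_per_sec": 5_801_773}
stack   = {"bc_per_sec": 178_521_617, "sends_per_sec": 8_141_165}

# Ratios of the independently measured rates:
bc_ratio    = stack["bc_per_sec"]    / closure["bc_per_sec"]
sends_ratio = stack["sends_per_sec"] / closure["sends_per_sec"]

print(f"Stack vs. Closure, bytecodes: {bc_ratio:.2f}x")    # ~0.90: slightly slower
print(f"Stack vs. Closure, sends:     {sends_ratio:.2f}x") # ~1.40: much faster
```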
>
> As a total improvement in performance Cog ranks at approx. 2.7x faster in
> macro benchmarks than what we started from. That's a pretty decent bit of
> speedup for real-world applications.
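For the record, the ratio rows in the table are simply quotients of the corresponding times; the "Total" column can be checked directly from the macro totals (a trivial Python check, values in ms from the table above):

```python
# Macro-benchmark totals from the table (ms):
closure_total, stack_total, cog_total = 92_411, 52_117, 34_445

# speedup = old total / new total
print(round(closure_total / cog_total, 2))    # 2.68  (Cog vs. Closure)
print(round(stack_total / cog_total, 2))      # 1.51  (Cog vs. Stack)
print(round(closure_total / stack_total, 2))  # 1.77  (Stack vs. Closure)
```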
>
> Compare this (for example) with j3 [1] which despite a speedup of 6x in
> microbenchmarks only provided a 2x speedup in the macros.
>
> [1] http://aspn.activestate.com/ASPN/Mail/Message/squeak-list/2369033:
>
> "Of course, that was 2001. Revisiting the benchmarks is kind of
> interesting...
>
> Interp: '43805612 bytecodes/sec; 1325959 sends/sec'
> J3: '135665076 bytecodes/sec; 8100691 sends/sec'
>
> Today: (PowerBookG4 1.5GHz), interp:
>
> '114387846 bytecodes/sec; 5152891 sends/sec'
>
> But the microBenchmarks don't tell the whole story: Even with a speedup
> of factor 6 in sends, we only saw the performance doubled on real world
> benchmarks (e.g. the MacroBenchmarks)."
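The gap between micro and macro speedups is just Amdahl's law: if sends account for a fraction f of macro-benchmark runtime and get s times faster, the overall speedup is 1/((1-f) + f/s). A sketch (the 60% figure below is back-calculated from the quoted 6x/2x numbers, not a measured value):

```python
def overall_speedup(f, s):
    """Amdahl's law: fraction f of runtime sped up by factor s,
    the remaining (1 - f) left untouched."""
    return 1.0 / ((1.0 - f) + f / s)

# Back-solving: a 2x overall result with s = 6 implies f = 0.6,
# i.e. sends accounted for roughly 60% of macro runtime (inferred).
print(overall_speedup(0.6, 6))  # 2.0
```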
>
>
> Cheers,
> - Andreas
>
--
Best regards,
Igor Stasenko AKA sig.