On Thu, Jun 24, 2010 at 3:19 PM, Levente Uzonyi <leves@elte.hu> wrote:

On Tue, 22 Jun 2010, Eliot Miranda wrote:

<snip>

I can't say for sure without profiling (you'll find a good VM profiler,
QVMProfiler, in the image in the tarball; as yet it works on MacOS only).

This looks promising. I (or someone else :)) just have to implement #primitiveExecutableModulesAndOffsets on win32 (and un*x), but that doesn't seem easy (at least the win32 part).

If you look at platforms/win32/vm/sqWin32Backtrace.c you'll find code that extracts symbols from dlls for constructing a symbolic backtrace on crashes. The code also uses a Teleplace.map file, generated by the VM makefile, which contains the symbols for the VM. From this code you ought to be able to implement a QVMProfilerWin32SymbolsManager almost entirely out of primitives.
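
To make the symbol side of this concrete, here is a minimal sketch in C of the kind of lookup a symbols manager has to do: given a table of symbol start addresses sorted in ascending order, map a sampled program counter to the symbol that encloses it. This is not the code in sqWin32Backtrace.c and not how QVMProfiler is written; the Symbol struct and find_symbol function are hypothetical names invented for the illustration, and reading the table out of a dll or a .map file is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record (an assumption for this sketch, not a
   structure taken from the VM sources). */
typedef struct {
    uintptr_t   start;  /* address at which the symbol begins */
    const char *name;   /* symbol name */
} Symbol;

/* Return the symbol with the greatest start address <= pc, or NULL if
   pc lies below the first symbol. The table must be sorted by start
   in ascending order. */
const Symbol *find_symbol(const Symbol *symbols, size_t count, uintptr_t pc)
{
    size_t lo = 0;
    size_t hi = count;  /* the search window is [lo, hi) */

    assert(symbols != NULL || count == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (symbols[mid].start <= pc)
            lo = mid + 1;   /* symbols[mid] starts at or below pc */
        else
            hi = mid;       /* symbols[mid] starts above pc */
    }

    /* lo is now the number of symbols that start at or below pc. */
    return lo == 0 ? NULL : &symbols[lo - 1];
}
```

A profiler built this way would call find_symbol once per sample and count hits per symbol. The sketch does not check an end address, so a pc past the last symbol is attributed to that last symbol; a real implementation would presumably also need each module's size.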

But I expect that the reason is the cost of invoking interpreter primitives
from machine code. Cog only implements a few primitives in machine code
(arithmetic, at: & block value) and for all others (e.g. nextPut: above) it
executes the interpreter primitives. lcsFor:and: uses at:put: heavily and
Cog is using the interpreter version. But the cost of invoking an
interpreter primitive from machine code is higher than invoking it from the
interpreter because of the system-call-like glue between the machine-code
stack pages and the C stack on which the interpreter primitive runs.

Three primitives that are currently interpreter primitives but must be
implemented in machine code for better performance are new/basicNew,
new:/basicNew: and at:put:. I've avoided implementing these in machine code
because the object representation is so complex and am instead about to
start work on a simpler object representation. When I have that I'll
implement these primitives and then the speed difference should tilt the
other way.

This sounds reasonable. #lcsFor:and: uses #at:put: twice in the inner loop. One of them (lcss at: max + k + 1 put: lcs) can be eliminated without affecting the computation, because it just stores the results. So with only one #at:put: it took me 2423ms to run the benchmark, which is still a bit too high. I think only the profiler can help here.

Btw, is MessageTally less accurate with CogVM than with the SqueakVM?

I'm not sure. We use a variant written by Andreas that is more accurate than MessageTally but that may use different plumbing.

best

Eliot

Levente

Of course if anyone would like to implement these in the context of the
current object representation be my guest and report back asap...

best
Eliot

Levente