[squeak-dev] Cog performance

Thu Jun 24 22:30:20 UTC 2010

On Thu, Jun 24, 2010 at 3:19 PM, Levente Uzonyi <leves at elte.hu> wrote:

> On Tue, 22 Jun 2010, Eliot Miranda wrote:
>
> <snip>
>
>
>> I can't say for sure without profiling (you'll find a good VM profiler
>> QVMProfiler in the image in the tarball, which as yet works on MacOS
>> only).
>>
>
> This looks promising, I (or someone else :)) just have to implement
> #primitiveExecutableModulesAndOffsets under win32 (and un*x), but that
> doesn't seem to be easy (at least the win32 part).

If you look at platforms/win32/vm/sqWin32Backtrace.c you'll find code that
extracts symbols from dlls for constructing a symbolic backtrace on crashes.
The code also uses a Teleplace.map file generated by the VM makefile which
contains the symbols for the VM.  From this code you ought to be able to
implement a QVMProfilerWin32SymbolsManager almost entirely out of
primitives.
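
As a rough sketch of the lookup side of such a symbols manager: once the
symbols of a module are in a table sorted by address, mapping a sampled pc to
a symbol is a binary search.  This is not code from sqWin32Backtrace.c; the
Symbol struct and symbol_for_address are hypothetical names used only for
illustration, and the sketch assumes the symbols were already read from the
dll or the .map file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol record: start address of the symbol and its name. */
typedef struct {
    unsigned long address;
    const char *name;
} Symbol;

/* Return the symbol with the greatest start address <= pc, or NULL if pc
 * lies below the first symbol.  The table must be sorted by ascending
 * address. */
static const Symbol *
symbol_for_address(const Symbol *table, size_t count, unsigned long pc)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    /* Find the first entry whose address is greater than pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].address <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* The entry just before it, if any, contains pc. */
    return lo == 0 ? NULL : &table[lo - 1];
}
```

A profiler would call this once per sample and tally hits per symbol; a pc
that falls past the end of the last symbol is attributed to that last symbol
unless the module's end address is checked separately.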

>> But I expect that the reason is the cost of invoking interpreter primitives
>> from machine code.  Cog only implements a few primitives in machine code
>> (arithmetic, at: & block value) and for all others (e.g. nextPut: above) it
>> executes the interpreter primitives.  lcsFor:and: uses at:put: heavily and
>> Cog is using the interpreter version.  But the cost of invoking an
>> interpreter primitive from machine code is higher than invoking it from the
>> interpreter because of the system-call-like glue between the machine-code
>> stack pages and the C stack on which the interpreter primitive runs.
>>
>> Three primitives that are currently interpreter primitives but must be
>> implemented in machine code for better performance are new/basicNew,
>> new:/basicNew: and at:put:.  I've avoided implementing these in machine code
>> because the object representation is so complex and am instead about to
>> start work on a simpler object representation.  When I have that I'll
>> implement these primitives and then the speed difference should tilt the
>> other way.
>>
>
> This sounds reasonable. #lcsFor:and: uses #at:put: twice in the inner loop.
> One of them (lcss at: max + k + 1 put: lcs) can be eliminated without
> affecting the computation, because that just stores the results. So with
> only one #at:put: it took me 2423ms to run the benchmark, which is still a
> bit too high. I think only the profiler can help here.
>
> Btw, is MessageTally less accurate with CogVM than with the SqueakVM?
>

I'm not sure.  We use a variant written by Andreas that is more accurate
than MessageTally but that may use different plumbing.

best
Eliot

>
> Levente
>
>
>
>> Of course if anyone would like to implement these in the context of the
>> current object representation be my guest and report back asap...
>>
>> best
>> Eliot
>>
>>
>>>
>>> Levente
>>>
>>>
>>>
>>
>