[squeak-dev] Cog performance

Eliot Miranda eliot.miranda at gmail.com
Tue Jun 22 22:12:52 UTC 2010


Hi Levente,

On Tue, Jun 22, 2010 at 2:28 PM, Levente Uzonyi <leves at elte.hu> wrote:

> Hi,
>
> I was curious how much speedup Cog gives when the code has only a few
> message sends, so I ran the following "benchmark":
>
> | s1 s2 |
> Smalltalk garbageCollect.
> s1 := String streamContents: [ :stream |
>        1000 timesRepeat: [
>                'aab' do: [ :e | stream nextPut: e; cr ] ] ].
> s2 := String streamContents: [ :stream |
>        1000 timesRepeat: [
>                'abb' do: [ :e | stream nextPut: e; cr ] ] ].
> [ TextDiffBuilder from: s1 to: s2 ] timeToRun.
>
> The above pattern makes TextDiffBuilder >> #lcsFor:and: run for a while. My
> results are a bit surprising:
> CogVM: 2914
> SqueakVM: 1900
>
> MessageTally shows (I wonder if it's accurate with Cog at all) that CogVM's
> garbage collector is a bit better, but that it runs the code slower than
> SqueakVM:
>
> CogVM:
> **Leaves**
> 60.6% {1886ms} TextDiffBuilder>>lcsFor:and:
> 36.2% {1127ms} DiffElement>>=
> 1.8% {56ms} ByteString(String)>>=
>
> **GCs**
>        full                    1 totalling 153ms (5.0% uptime), avg 153.0ms
>        incr            21 totalling 76ms (2.0% uptime), avg 4.0ms
>        tenures         13 (avg 1 GCs/tenure)
>        root table      0 overflows
>
> SqueakVM:
> **Leaves**
> 46.8% {888ms} TextDiffBuilder>>lcsFor:and:
> 35.3% {670ms} DiffElement>>=
> 9.8% {186ms} ByteString(String)>>compare:with:collated:
> 6.9% {131ms} ByteString(String)>>=
>
> **GCs**
>        full                    3 totalling 254ms (13.0% uptime), avg 85.0ms
>        incr            301 totalling 110ms (6.0% uptime), avg 0.0ms
>        tenures         272 (avg 1 GCs/tenure)
>        root table      0 overflows
>
> Is Cog slower because #to:do: loops are not optimized, or is there some
> other reason for the slowdown?
>

I can't say for sure without profiling (you'll find a good VM profiler,
QVMProfiler, in the image in the tarball; as yet it works only on Mac OS).
But I expect that the reason is the cost of invoking interpreter primitives
from machine code.  Cog only implements a few primitives in machine code
(arithmetic, at: & block value) and for all others (e.g. nextPut: above) it
executes the interpreter primitives.  lcsFor:and: uses at:put: heavily and
Cog is using the interpreter version.  But the cost of invoking an
interpreter primitive from machine code is higher than invoking it from the
interpreter because of the system-call-like glue between the machine-code
stack pages and the C stack on which the interpreter primitive runs.

Three primitives that are currently interpreter primitives but must be
implemented in machine code for better performance are new/basicNew,
new:/basicNew: and at:put:.  I've avoided implementing these in machine code
because the object representation is so complex, and I am instead about to
start work on a simpler object representation.  Once I have that, I'll
implement these primitives, and then the speed difference should tilt the
other way.

Of course, if anyone would like to implement these in the context of the
current object representation, be my guest, and report back asap...

best
Eliot

>
>
> Levente
>
>

