[Vm-dev] Strongtalk and Exupery

Sun Sep 24 00:18:49 UTC 2006

Hi Bryce,

> -----Original Message-----
> From: vm-dev-bounces at lists.squeakfoundation.org
>
> Hi David,
> The bytecode benchmark is a prime number sieve. It uses #at: and
> #at:put:. The send benchmark is a simple recursive Fibonacci function.
> Both are just measures of how quickly they execute, neither really
> measures the actual bytecodes or sends performed. They are the old
> tinyBenchmarks. I'd guess everyone ran the same code for these
> benchmarks.

That's fine, it's just that we need to actually run these benchmarks
properly; with different architectures, clock speeds, etc., I don't think we
know the relative performance yet.
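
For readers without the benchmarks at hand, here is a minimal Python sketch of
the two kinds of workload described in the quoted text: a prime sieve that is
dominated by indexed reads and writes, and a naive recursive Fibonacci that is
dominated by calls. It is not the tinyBenchmarks source; the function names,
problem sizes, and the timing helper are assumptions made for illustration.

```python
import time


def sieve_count(size):
    """Count the primes <= size using a flag array (indexed reads and writes)."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(2, size + 1):
        if flags[i]:                      # indexed read, analogous to #at:
            count += 1
            for k in range(i * i, size + 1, i):
                flags[k] = False          # indexed write, analogous to #at:put:
    return count


def fib(n):
    """Naive recursive Fibonacci; the cost is almost entirely call overhead."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are arbitrary; timings only compare like with like on one machine.
    print("sieve:", timed(sieve_count, 8190))
    print("fib:  ", timed(fib, 24))
```

As the quoted text says, both only measure elapsed time, so numbers taken on
different machines are not directly comparable.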

> I 100% agree that inlining is the right way to optimise common sends
> and block execution. [...]

Ok, I was just trying to say that in Smalltalk, a mediocre compiler with
optimistic inlining is better than a great compiler without inlining.  As
long as you are headed in the direction of optimistic inlining, we are in
agreement.

I just want to re-emphasize the importance of "optimistic", which implies
the ability to deoptimize, not just the ability to inline.  Inlining the
common case non-optimistically (i.e. with an 'else' clause containing the
non-common case) is not nearly as good, since after those two cases merge
you can't assume anything.  With optimism, the rest of the code can assume
the common case was taken, which provides much more information for
optimization.  For example, if the common case returns a SmallInteger, that
is known in subsequent code; without deoptimization, the subsequent code
can't assume anything about the return value, regardless of inlining.
Sorry if you already understood this, I couldn't tell from your post.

The reason I am pointing this out is that the machinery for deoptimization
is the hard part.  That is really the big advantage of the Strongtalk VM:
it provides all that infrastructure.  I just want to make sure you are
taking that into consideration.
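
The difference between the two styles of inlining can be sketched in Python as
follows. This is a toy model under stated assumptions, not how Strongtalk or
Exupery is implemented: `send`, `is_small_int`, the `Deoptimize` exception, and
the `stats` counters are all hypothetical names, SmallInteger overflow is
ignored, and the guard sits at method entry so that "deoptimizing" can simply
rerun the unoptimized version. A real VM has to rebuild interpreter state in
the middle of a method, which is the hard machinery referred to above.

```python
class Deoptimize(Exception):
    """Raised when an optimistic assumption in compiled code turns out false."""


def send(receiver, selector, arg):
    """Hypothetical generic message send (the slow, fully dynamic path)."""
    return getattr(receiver, selector)(arg)


def is_small_int(x):
    """Stand-in for a SmallInteger tag check."""
    return type(x) is int


def unoptimized(x):
    """(x + 1) * 2 using generic sends only."""
    return send(send(x, "__add__", 1), "__mul__", 2)


def inlined_with_else(x, stats):
    """Non-optimistic inlining: common case inline, uncommon case in an else."""
    stats["checks"] += 1
    if is_small_int(x):
        t = x + 1
    else:
        t = send(x, "__add__", 1)
    # The two branches merge here, so nothing is known about t any more
    # and the multiplication needs its own check.
    stats["checks"] += 1
    if is_small_int(t):
        return t * 2
    return send(t, "__mul__", 2)


def optimistic_body(x, stats):
    """Optimistic inlining: one guard, then code that assumes the common case."""
    stats["checks"] += 1
    if not is_small_int(x):
        raise Deoptimize()
    t = x + 1       # t is known to be an integer from here on
    return t * 2    # so no second check is needed


def optimistic(x, stats):
    """Run the optimistic code, falling back to unoptimized code on failure."""
    try:
        return optimistic_body(x, stats)
    except Deoptimize:
        stats["deopts"] += 1
        return unoptimized(x)
```

Both versions give the same answers, but the version with an 'else' clause
pays for a check at every operation after the merge, while the optimistic
version pays once and then relies on the guard.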

> I'd also not be surprised if Strongtalk is faster than Exupery for
> bytecode performance. I'm guessing that Strongtalk's integer
> arithmetic and #at: performance are better. Squeak uses 1 for its
> integer tag so in general it takes 3 instructions to detag then retag
> and 2 clocks latency (this can often be optimised to 1 instruction
> and 1 clock latency). I'm guessing Strongtalk uses 0 for its integer
> tag.

Yes.
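
To make the tag arithmetic concrete, here is a small Python sketch of adding
two tagged integers under each scheme. It assumes a single low tag bit in both
cases and uses Python's unbounded integers, so overflow checks and the real
word layout of either VM are not modelled; all function names are made up for
the example.

```python
def tag1(v):
    """Encode v with a low tag bit of 1."""
    return (v << 1) | 1


def untag1(w):
    """Decode a word produced by tag1."""
    return w >> 1


def add_tag1_naive(a, b):
    """Detag both operands, add, then retag."""
    return (((a >> 1) + (b >> 1)) << 1) | 1


def add_tag1_fast(a, b):
    """Usual shortcut: (2x+1) + (2y+1) - 1 == 2(x+y) + 1."""
    return a + b - 1


def tag0(v):
    """Encode v with a low tag bit of 0."""
    return v << 1


def untag0(w):
    """Decode a word produced by tag0."""
    return w >> 1


def add_tag0(a, b):
    """With a 0 tag the tagged words add directly: 2x + 2y == 2(x+y)."""
    return a + b
```

With a tag of 0 the sum of two tagged words is already correctly tagged, which
is why no detag or retag step appears in `add_tag0`.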

> Squeak uses a remembered set for its write barrier which requires
> checking if the object is in the remembered set, and checking if the
> object is in new-space before adding it. Strongtalk might be using a
> card marking table just requiring a single store.

Yes, Strongtalk uses card marking; I think it is two instructions.  It is
Urs Holzle's write barrier, so it is probably the same as in Self.
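
A rough Python model of the two write barriers under discussion, to show where
the work goes. Everything concrete here is an assumption for illustration:
the class names, the card size, and a heap layout with new-space above
old-space are invented, and addresses are plain integers. It does not reproduce
the actual Squeak or Strongtalk code.

```python
CARD_SHIFT = 9  # hypothetical card size of 512 bytes


class RememberedSetBarrier:
    """Barrier that filters at store time and records old objects in a set."""

    def __init__(self, new_space_start):
        self.new_space_start = new_space_start  # assumed: new-space is above this
        self.remembered = set()

    def record_store(self, holder_addr, value_addr):
        if holder_addr >= self.new_space_start:
            return  # holder is itself young, nothing to remember
        if value_addr < self.new_space_start:
            return  # stored value is old, no old-to-new pointer created
        if holder_addr in self.remembered:
            return  # already remembered
        self.remembered.add(holder_addr)


class CardMarkingBarrier:
    """Barrier that unconditionally marks the card covering the holder."""

    def __init__(self, heap_size):
        self.cards = bytearray((heap_size >> CARD_SHIFT) + 1)

    def record_store(self, holder_addr, value_addr):
        # No tests at store time; the collector scans marked cards later.
        self.cards[holder_addr >> CARD_SHIFT] = 1
```

The card-marking version does no checking on the store path and defers the
filtering to collection time, which is the trade-off the quoted paragraph
points at.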

> Squeak stores the size of an object in one of two places. So to get
> the size to range check you first need to figure out where it's
> stored. I'm guessing that the size for an array is stored at a fixed
> location in Strongtalk.

Yes.

> My assumptions about Strongtalk's object memory are based on reading
> the papers from the Self project.
>
> None of these things really matters to Squeak while it's running as an
> interpreter because most of the time is spent recovering from branch
> mispredicts or waiting for memory leaving plenty of time available to
> hide the inefficiencies above.
>
>
> One way to get around a slow compiler would be to save the code cache
> beside the image. All relocation is done in Smalltalk, so doing this
> shouldn't be too hard. But figuring out how to get around a slow compiler
> can wait until after the compiler has become useful. There are several
> answers including writing a faster register allocator (2) or being the
> third compiler.

Yes, I have always wanted to be able to save the code.  We only have the
inlining DB right now, which doesn't avoid the compilation overhead on each
run.

-Dave