Adding loop primitives/optimizations

Lyndon Tremblay humasect at shaw.ca
Fri Dec 3 01:31:15 UTC 2004


----- Original Message ----- 
From: "Marcus Denker" <denker at iam.unibe.ch>
To: "The general-purpose Squeak developers list"
<squeak-dev at lists.squeakfoundation.org>
Sent: Thursday, December 02, 2004 12:30 PM
Subject: Re: Adding loop primitives/optimizations


>
> On 02.12.2004 at 20:34, Lyndon Tremblay wrote:
>
> >>
> >> VI4 was an experiment by Anthony Hannan; the result is the closure
> >> compiler on SqueakMap. (The changes to bytecodes, stack layout... have
> >> been abandoned.)
> >>
> >> The status of the Jitter is unknown.
> >>
> >>      Marcus
> >>
> >>
> >
> > VI4 is documented as also being an image format change. This means
> > either these changes are already in and the closure compiler is
> > optional, or the result of the experiment was just the closure
> > compiler itself. Am I close?
>
> The result is just the closure compiler. The changes were much more than
> what is minimally required for getting closures, and they had some bad
> consequences for JIT compiling.
>

Ahh, I see. Hmm.

> > I find if (useJit) and #ifdef JITTER in, I believe, both the Mac VM
> > and Unix VM sources (not in Win32).
>
> Yes, these were for J3. J3 actually got quite far: it worked on both G3
> and x86, and on both MacOS 9 and Linux (and I think Andreas made a Win
> version for testing, too).
>
> The Jitter was done by Ian Piumarta, originally for Linux/G3. With lots
> of help from Ian, I did the port to x86 and MacOS.

I won't assume that code is currently enabled by default in official
platform VM builds, but I will actually look into the Win32 version once I
can get it to compile to a VM that is as fast as the provided one.
(Hopefully I won't have to install gcc-2.95, though.) My builds with
cygwin (hacked, and without 'struct foo', I believe) are much slower.

>
> Of course, that was 2001. Revisiting the benchmarks is kind of
> interesting...
>
> Interp:     '43805612 bytecodes/sec; 1325959 sends/sec'
> J3:         '135665076 bytecodes/sec; 8100691 sends/sec'
>
> Today: (PowerBookG4 1.5GHz), interp:
>
>                '114387846 bytecodes/sec; 5152891 sends/sec'
>
> But the microbenchmarks don't tell the whole story: even with a speedup
> of a factor of 6 in sends, we only saw performance double on real-world
> benchmarks (e.g. the MacroBenchmarks). So even though it is slower on
> sends, I'd guess that my system today is faster than the JIT-based one
> of 2001.
>
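Marcus's "factor 6 in sends" can be checked directly against the 2001 figures he quotes. A small Python sketch of that arithmetic (the variable names are mine, for illustration only); it compares only the two 2001 runs, since the 2004 PowerBook number comes from different hardware:

```python
# 2001 throughput figures quoted above: interpreter vs. J3.
interp_2001 = {"bytecodes": 43_805_612, "sends": 1_325_959}
j3_2001 = {"bytecodes": 135_665_076, "sends": 8_100_691}

def speedup(new: dict, old: dict) -> dict:
    """Ratio of new/old throughput for each metric."""
    return {k: new[k] / old[k] for k in old}

ratios = speedup(j3_2001, interp_2001)
for metric, r in ratios.items():
    print(f"{metric}: {r:.2f}x")
# bytecodes: 3.10x
# sends: 6.11x
```

So J3 roughly tripled bytecode throughput and sped up sends about sixfold, which is the gap between microbenchmark and MacroBenchmark results the thread is discussing.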

Were these tested on the same system? I'm not sure I can agree with anyone
hinting that any beneficial change is not worth it. If there were a 0.1%
speedup for the entire system achievable with 20 hours of Smalltalk coding,
I would still do it. (Though, at that small amount, you are optimising the
wrong thing! Or there is nothing else left to optimise =)


> This is because Squeak was carefully optimized to run primitives most of
> the time. And then, as Tim pointed out, even if the JIT could optimize
> the code to run in zero seconds, you would only see performance double
> when the system spends 50% of its time in the primitives.
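Tim's point is Amdahl's law: if a fraction p of run time is sped up by a factor s and the rest (here, time in primitives) is untouched, the overall speedup is S = 1 / ((1 - p) + p/s). A minimal Python sketch, assuming run time splits cleanly into those two parts; the function names are hypothetical. The second call is only an illustration: it assumes the ~6x send speedup alone accounted for the 2x MacroBenchmark result, which nobody in the thread claims.

```python
import math

def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: fraction p of run time is accelerated by factor s;
    the remaining (1 - p), e.g. time spent in primitives, is unchanged.
    s = math.inf models code that runs in zero seconds."""
    accelerated = 0.0 if math.isinf(s) else p / s
    return 1.0 / ((1.0 - p) + accelerated)

def accelerated_fraction(S: float, s: float) -> float:
    """Invert Amdahl's law: given an observed overall speedup S and a local
    speedup s, return the fraction p of run time that was sped up."""
    return (1.0 - 1.0 / S) / (1.0 - 1.0 / s)

# Half the time outside primitives, and the JIT makes that half free: only 2x.
print(overall_speedup(0.5, math.inf))              # 2.0
# Hypothetical: what fraction a 6x speedup must cover to yield 2x overall.
print(round(accelerated_fraction(2.0, 6.0), 3))    # 0.6
```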
>
> Another, related problem was GC: with the faster VM, the percentage of
> time that the system spends in GC will grow. I'd guess that we would
> have to look closely at the GC to get more leverage from a good JIT.
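The GC effect follows from the same arithmetic: if GC time stays fixed while everything else gets faster, GC's share of the total grows. A sketch with made-up numbers (9 s of ordinary execution, 1 s of GC), assuming the JIT does not change GC cost at all; `gc_fraction` is a hypothetical helper:

```python
def gc_fraction(mutator_s: float, gc_s: float, jit_speedup: float) -> float:
    """Share of total run time spent in GC after the mutator (the code the
    JIT compiles) becomes jit_speedup times faster; GC time is held fixed."""
    new_mutator = mutator_s / jit_speedup
    return gc_s / (new_mutator + gc_s)

# Hypothetical workload: 9 s of mutator work, 1 s of GC (10% GC at baseline).
for s in (1, 2, 4):
    print(s, round(gc_fraction(9.0, 1.0, s), 3))
# 1 0.1
# 2 0.182
# 4 0.308
```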
>
> Of course, even if you don't get quite the performance that you'd like
> to see in the end, it's worthwhile, as a good JIT allows you to convert
> a lot of the Slang code to normal Smalltalk, making better designs
> possible and much easier to change, since they are no longer hardcoded
> in the VM.
>
> J5 then was an interesting design: real PICs. And all the complex
> bytecode was de-optimized to a simple form where every send really
> happened. Even without *any* inlining, Ian managed to get near the
> performance of the interpreter. And with the PIC data, the next step
> would be to start using it for optimizations based on the types that
> are recorded in the PICs... I don't know if that ever got implemented
> in J5.
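For readers unfamiliar with PICs: a polymorphic inline cache sits at a send site, remembers which method each receiver class resolved to, and records which classes it has actually seen. The Python toy below models only that bookkeeping; it is a hypothetical illustration, not J5's implementation (real PICs are generated machine code), and `SendSite`, `MAX_ENTRIES`, and the shape classes are invented names.

```python
from typing import Callable, Dict

class SendSite:
    """Toy polymorphic inline cache (PIC) for one send site (hypothetical model)."""

    MAX_ENTRIES = 8  # assumed size limit; past it the site counts as megamorphic

    def __init__(self, selector: str, lookup: Callable[[type, str], Callable]):
        self.selector = selector
        self.lookup = lookup                     # slow path: full method lookup
        self.entries: Dict[type, Callable] = {}  # cached receiver class -> method
        self.counts: Dict[type, int] = {}        # receiver classes seen (type feedback)
        self.megamorphic = False

    def send(self, receiver, *args):
        cls = type(receiver)
        self.counts[cls] = self.counts.get(cls, 0) + 1
        method = self.entries.get(cls)
        if method is None:                       # cache miss: fall back to lookup
            method = self.lookup(cls, self.selector)
            if len(self.entries) < self.MAX_ENTRIES:
                self.entries[cls] = method       # add a new case to the PIC
            else:
                self.megamorphic = True          # cache full: keep doing lookups
        return method(receiver, *args)

class Square:
    def __init__(self, side): self.side = side
    def area(self): return self.side * self.side

class Rect:
    def __init__(self, w, h): self.w, self.h = w, h
    def area(self): return self.w * self.h

site = SendSite("area", lambda cls, sel: getattr(cls, sel))
print([site.send(s) for s in (Square(2), Rect(2, 3), Square(3))])  # [4, 6, 9]
print({c.__name__: n for c, n in site.counts.items()})             # {'Square': 2, 'Rect': 1}
```

The `counts` table is the kind of recorded type information Marcus suggests an optimizer could exploit next.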
>
> Speaking of runtime compilers for Squeak, there are two other projects:
> Exupery by Bryce Kampjes, a runtime translator that is written in
> Squeak, not C++. This is on SqueakMap. Bryce has already reported some
> good speedups.
>
> And then there is AOStA, a project started by Eliot Miranda. The idea
> here is to add type-feedback optimization to an existing JIT-based
> Smalltalk (e.g. VisualWorks, or J5 without inlining) using a
> bytecode-to-bytecode optimizer in the image (and a slightly modified VM
> with a bunch of additional bytecodes and access to PIC data). This
> project was extremely successful in the sense that I got my Dipl.
> Inform. (Master's degree) by hacking on it, but it has not yet resulted
> in anything practically useful (and so it has not yet proven to be a
> working design at all).
>
>      Marcus
>
>
>
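The type-feedback idea behind AOStA can be illustrated in the same toy terms: read the receiver-class counts a PIC has recorded and, when one class dominates, rewrite the send into a class-guarded inlined body that falls back to a normal send. This is a hypothetical policy sketched in Python, not AOStA's actual optimizer; the 90% threshold and the function names are assumptions.

```python
from typing import Dict, Optional

def inline_candidate(counts: Dict[str, int], threshold: float = 0.9) -> Optional[str]:
    """Return the receiver class worth specializing for, or None.
    Hypothetical policy: specialize when one class accounts for at least
    `threshold` of the sends recorded at this site."""
    total = sum(counts.values())
    if total == 0:
        return None
    cls, n = max(counts.items(), key=lambda kv: kv[1])
    return cls if n / total >= threshold else None

def optimize_site(selector: str, counts: Dict[str, int]) -> str:
    """Describe the rewrite a bytecode-to-bytecode optimizer might apply:
    a class-guarded inlined body, keeping the ordinary send as the fallback."""
    cls = inline_candidate(counts)
    if cls is None:
        return f"send #{selector}"
    return f"if receiver class == {cls}: inline {cls}>>{selector} else: send #{selector}"

print(optimize_site("area", {"Square": 98, "Rect": 2}))
# if receiver class == Square: inline Square>>area else: send #area
print(optimize_site("area", {"Square": 50, "Rect": 50}))
# send #area
```

Keeping the ordinary send as the fallback is what lets such a rewrite stay correct when a rarely seen class shows up.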



