Thinking about Exupery 0.14

Igor Stasenko siguctua at gmail.com
Tue Dec 18 21:23:50 UTC 2007


On 18/12/2007, bryce at kampjes.demon.co.uk <bryce at kampjes.demon.co.uk> wrote:
> Igor Stasenko writes:
>  > On 17/12/2007, bryce at kampjes.demon.co.uk <bryce at kampjes.demon.co.uk> wrote:
>  > >   arithmaticLoopBenchmark 1396 compiled  128 ratio: 10.906
>  > >   bytecodeBenchmark       2111 compiled  460 ratio:  4.589
>  > >   sendBenchmark           1637 compiled  668 ratio:  2.451
>  > >   doLoopsBenchmark        1081 compiled  715 ratio:  1.512
>  > >   pointCreation           1245 compiled 1317 ratio:  0.945
>  > >   largeExplorers           728 compiled  715 ratio:  1.018
>  > >   compilerBenchmark        483 compiled  489 ratio:  0.988
>  > >   Cumulative Time         1125 compiled  537 ratio   2.093
>  > >
>  > >   ExuperyBenchmarks>>arithmeticLoop                249ms
>  > >   SmallInteger>>benchmark                         1112ms
>  > >   InstructionStream>>interpretExtension:in:for: 113460ms
>  > >   Average                                         3155.360
>
> First, from the numbers above, I'd say that having a method that takes
> 2 minutes to compile is currently the biggest practical problem. The
> second set of numbers is a compilation time benchmark. The second
> biggest problem is that a 2.4 times increase in send speed is not
> transferring through to the two macro-benchmarks (largeExplorers and
> compilerBenchmark).
>

I suspect that the main bottleneck in largeExplorers is not
compiled/bytecode execution but memory allocation and GC.
So I doubt that you can gain much of a performance increase there.

>  >
>  > Do you make any difference between calling a compiled method and, for
>  > instance, a primitive function?
>
> The sender doesn't know if it's sending to a primitive or to a full
> method. If Exupery compiles a primitive then it executes in the
> sender's context, just like the interpreter.
>
>  > As I remember, you compile methods into some form of routine which
>  > can be called using the cdecl convention.
>  > But on top of that, knowing that you are calling a compiled method,
>  > you can use register optimizations such as passing arguments in
>  > registers, and in general, by knowing where registers are changed,
>  > you can predict which of them change across a call and which stay
>  > unchanged.
>  > And, of course, nothing stops you from using your own calling
>  > convention to make the code run faster. There are also the MMX/SSE
>  > registers, which can be used for different purposes.
>  > All of the above, depending on the choices made, can greatly improve
>  > send speed. I just want to know what you think about it.
>
> Currently Exupery uses C's calling conventions combined with the
> interpreter's handling of contexts; there's plenty of room to improve
> this, but I doubt that raw send speed is why the macro benchmarks
> aren't performing.
>
> Also, full method inlining will change the value of other send
> optimisations by removing most of the common sends. It's the best
> optimisation for common sends. 1.0 is a base to add full method
> inlining to.
>
>  > And a small trick when compiling SmallInteger methods: you already
>  > know that the receiver is a SmallInteger, so by using that knowledge
>  > some tests can be omitted.
>  > In the same manner you can deal with compiling methods for classes
>  > which have byte- or reference-indexed instances.
>
> Exupery compiles a method for each receiver, so this is possible but
> not done yet. It'll get even more interesting when combined with full
> method inlining; then common self sends will become completely free.
>
Yes, and to that end I started coding some classes to translate
Smalltalk method source into a set of lambda-sends.
Lambdas are a very simple and yet powerful way to represent an
abstract algorithm.
With only a few rules - substitution and reduction - I can transform
the method code into any form.
And that's very helpful for full method inlining.
A method can be represented as a lambda having a single free variable
<context>:

lambda method(context),

where any contextual parts/actions are represented as messages to the
context, like:

receiver: Lambda context receiver.
receiver class: Lambda context receiver class.
argument1: Lambda context argumentAt: 1.
push: Lambda context push: arg.
return: Lambda context return: expression.
etc.
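
As a rough sketch of the idea in plain Squeak (MockContext and the
example method are invented names for illustration, and an ordinary
one-argument block stands in for the lambda; the real Lambda machinery
is not shown):

Object subclass: #MockContext
    instanceVariableNames: 'receiver returnValue'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Exupery-Sketch'

MockContext >> receiver
    "answer the receiver of the simulated activation"
    ^ receiver

MockContext >> receiver: anObject
    receiver := anObject

MockContext >> return: anObject
    "record the value the method returns"
    returnValue := anObject

MockContext >> returnValue
    ^ returnValue

"the method  double  ^ self + self  expressed as a lambda over <context>"
| methodLambda ctx |
methodLambda := [:context |
    context return: context receiver + context receiver].
ctx := MockContext new receiver: 21.
methodLambda value: ctx.
ctx returnValue   "=> 42"

(Here a BlockClosure merely simulates the execution at runtime;
substitution and reduction would operate on a tree of lambda-sends
rather than on live blocks.)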

But now, when you are going to inline a method, things become
interesting, because if you know the receiver's class, you can reduce
some lambdas at compile time - for example any accesses to the
receiver's class, the results of method lookups on the receiver, etc.

And even more: if you have a way to determine whether methods are
referentially transparent, then you can reduce sends of some methods
to their returned results at compile time.

Like having:

isInteger
  ^ true

and then, in some method, when you encounter
  self isInteger ifTrue: [ codeA ] ifFalse: [ codeB ]

you can reduce the whole expression to codeA or codeB, depending on
the class of the receiver.
So, in the given example, knowing the class of the receiver, you can
eliminate two sends: #isInteger and #ifTrue:ifFalse:.
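
To make that concrete, here is a hand-worked sketch (the selector and
the string literals are invented for illustration). Suppose the
receiver is known to be a SmallInteger and the method being compiled
is:

printKind
  ^ self isInteger
      ifTrue: [ 'integer' ]
      ifFalse: [ 'not an integer' ]

Because isInteger for this receiver is just ^ true and has no side
effects, the send reduces to the literal true at compile time, and
#ifTrue:ifFalse: on a literal true reduces to the value of its first
block, so the whole method collapses to:

printKind
  ^ 'integer'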
I think that making the compiler able to detect and perform such
reductions will render other kinds of optimizations much less
important.

As far as I know, you translate methods by taking their bytecode.
I'm not sure whether the above is possible when compiling from
bytecode. I think it would be much easier to operate on abstract parse
trees (with lambda-sends as nodes).

> Bryce
> _______________________________________________
> Exupery mailing list
> Exupery at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>


-- 
Best regards,
Igor Stasenko AKA sig.

