[Vm-dev] [SqueakJS] Faster JIT ideas

Vanessa Freudenberg vanessa at codefrau.net
Tue Mar 16 08:38:33 UTC 2021


On Mon, Mar 15, 2021 at 7:57 PM Florin Mateoc <florin.mateoc at gmail.com>
wrote:

> Perhaps better expressed: since benchFiby is a generator function, the call
> benchFiby() does not execute its contents (so it cannot reach any potential
> yields inside it), it just returns a generator, so we have to make all the
> calls generator calls


Ah, that's a JS thing I had not understood yet, that all calls to generator
functions need to be made using yield*, and you can only use yield* inside
of a generator function.
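
A minimal sketch of that behavior (the names inner, outerPlain and
outerDelegating are made up for illustration): a plain call to a generator
function only creates a generator object, while yield* inside another
generator actually runs the callee and evaluates to its return value.

```javascript
const log = [];

function* inner() {
  log.push("inner ran");
  return 42;
}

function* outerPlain() {
  // A plain call only creates a generator object; inner's body never runs.
  const result = inner();
  return typeof result.next === "function" ? "got a generator" : result;
}

function* outerDelegating() {
  // yield* runs inner to completion and evaluates to its return value.
  return yield* inner();
}

const plain = outerPlain().next();
const logAfterPlain = log.slice();   // snapshot: still empty at this point
const delegated = outerDelegating().next();

console.log(plain.value, logAfterPlain);
console.log(delegated.value, log);
```

Only the delegating variant ever records "inner ran".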


> Well, who makes the generator call inside GlobalCheckForInterrupts if the
> method hosting the GlobalCheckForInterrupts call was not called via yield*?
>

I think that would be the trampoline function that goes from interpreted
code to compiled code. That same function needs to handle grabbing function
args off the context's stack, passing them as arguments to the jitted
method, and once that returns, popping the stack + pushing the result.
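
A rough sketch of such a trampoline, under heavy assumptions: the context is
modeled as an object with a plain stack array, the jitted method is a
generator function taking the receiver and arguments, and callJitted,
jittedAdd and checkForInterrupts are hypothetical names, not SqueakJS API.
The trampoline is the one place that drives the generator with next(), so a
yield from anywhere below it surfaces here.

```javascript
// Hypothetical stand-in for an interrupt check that requests a process switch.
function* checkForInterrupts() {
  yield "switch";
}

// Hypothetical jitted method: receiver and argument arrive as JS arguments.
function* jittedAdd(receiver, arg) {
  yield* checkForInterrupts();
  return receiver + arg;
}

// Trampoline from interpreted code to compiled code (assumed shape).
function callJitted(context, method, argCount, onYield) {
  const stack = context.stack;
  // Read the arguments and the receiver off the context's stack.
  const args = stack.slice(stack.length - argCount);
  const receiver = stack[stack.length - argCount - 1];
  const gen = method(receiver, ...args);
  let step = gen.next();
  while (!step.done) {
    onYield(step.value);   // a real VM would suspend this process here
    step = gen.next();     // this sketch simply resumes right away
  }
  // Pop receiver and arguments, then push the result.
  stack.length -= argCount + 1;
  stack.push(step.value);
}

const context = { stack: [3, 4] };
const yields = [];
callJitted(context, jittedAdd, 1, (value) => yields.push(value));
console.log(context.stack, yields);
```

In this sketch the onYield callback only records the request; deciding which
process runs next is left out.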

> yield* needs to traverse the whole execution stack. If you do a top-level
> next(), no inner invocations will occur if they are not generator calls.
>

Okay. I'll have to think about that. My hope was that all the regular
execution can happen using regular function invocations and only process
switching would use yield. But you're right, we need to use yield* all the
way down. Hope that's not too expensive ... or maybe unwinding to process
switch is the better tradeoff after all.
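
To illustrate "yield* all the way down", here is a small sketch with made-up
names (leaf, middle, top): a yield deep in the call chain suspends every
frame above it, and a later next() resumes exactly where it left off. This
only works because each intermediate call is itself a yield*.

```javascript
const trace = [];

function* leaf() {
  trace.push("leaf: before switch");
  yield "switch";                 // request a process switch from deep in the stack
  trace.push("leaf: resumed");
  return 1;
}

function* middle() {
  return (yield* leaf()) + 1;
}

function* top() {
  return (yield* middle()) + 1;
}

const proc = top();
const first = proc.next();        // runs down to the yield in leaf, suspending all three frames
trace.push("scheduler: another process could run here");
const second = proc.next();       // resumes inside leaf and returns normally
console.log(first, second, trace);
```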

> Sure, you can print the stack, but have you checked that you can switch and
> then come back and resume an interrupted one?
>

Not yet, I'm still designing it and trying things.


> Anyway, thank you for the good news with Chrome. I will also check with
> Node
>

Meanwhile, I made a little scheduler just to learn how to use yield*. Works
nicely:
https://codepen.io/codefrau/pen/jOVRmxm
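
A scheduler of that kind can be sketched roughly as follows. This is an
illustrative round-robin sketch, not the code from the pen; step, worker and
runAll are made-up names.

```javascript
const output = [];

// A "send" that does some work and then offers to switch processes.
function* step(name, i) {
  output.push(`${name}${i}`);
  yield;                          // give other processes a turn
}

function* worker(name, count) {
  for (let i = 1; i <= count; i++) {
    yield* step(name, i);         // delegation lets the inner yield suspend the worker
  }
  return `${name} done`;
}

// Round-robin: resume each process until it yields or finishes.
function runAll(processes) {
  const queue = [...processes];
  const finished = [];
  while (queue.length > 0) {
    const proc = queue.shift();
    const { done, value } = proc.next();
    if (done) finished.push(value);
    else queue.push(proc);
  }
  return finished;
}

const results = runAll([worker("A", 2), worker("B", 3)]);
console.log(output.join(" "), results);
```

The two workers interleave at every yield, and each finishes with its own
return value.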

Vanessa


> On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <vanessa at codefrau.net>
> wrote:


>> Thanks for that Florin, very helpful!
>>
>> I'm curious why you need to make every send into a generator call, and
>> not just rely on the one in your GlobalCheckForInterrupts()?
>>
>> My implementation is intended to allow context switching that way:
>>
>> if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;
>>
>>
>> This is the same technique as in other VMs, where the actual check for
>> context switch is not done for every send, but only when
>> the interruptCheckCounter drops to zero or below. In the first mockup on codepen,
>> vm.handleDepthAndInterrupts prints the contents of all the contexts,
>> proving that the information is accessible in case we need to context
>> switch or reify the stack. My hope was that inside of
>> handleDepthAndInterrupts() I could use yield* to do the actual context
>> switching.
>>
>> My second, exception based mockup on codepen uses the same approach,
>> except that when a context switch is needed it would throw, unwinding the
>> stack fully, and creating actual context objects along the way. I have not
>> mocked up that part yet (because it would need a mockup interpreter too, to
>> continue after unwind), but you can see the unwind working correctly by
>> uncommenting the line with
>> // throw Error("unwind")
>>
>> I tried your benchFib vs benchFiby on Chrome, which seems to
>> optimize generators a lot better than Firefox. On Chrome the overhead is
>> just 50% or so, vs 300% on Firefox. Safari appears to be between the two.
>>
>> I will need to make more complete mockups before deciding on a design.
>> Especially how closures would be handled. Does anyone have a tiny
>> one-method benchmark that would highlight closure performance?
>>
>> Vanessa
>>
>> On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <florin.mateoc at gmail.com>
>> wrote:
>>
>>>
>>> I don't think the numbers could be meaningfully compared. The whole
>>> purpose of the yield* invocations is to enable process switching, which I
>>> don't think your jitted methods allow for. But then are you merely
>>> comparing yield* invocations against direct invocations? For sure, the
>>> yield* ones will be much slower.
>>> Leaving aside my implementation, which also uses yield* for #<, #- and #+
>>> (and so would be much slower than the code below, since operators are
>>> surely well optimized by JS), we can just measure direct vs yield* for the
>>> main recursive invocation:
>>>
>>> Number.prototype.benchFib = function benchFib() {
>>>   return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
>>> }
>>>
>>> var t = performance.now();
>>> (30).benchFib();
>>> performance.now() - t
>>>
>>> gives on my laptop in Firefox (in ms): 920, 911, 919, 898
>>>
>>> Versus
>>>
>>> Number.prototype.benchFiby = function* benchFiby() {
>>>    return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
>>> }
>>>
>>> var t = performance.now();
>>> (30).benchFiby().next();
>>> performance.now() - t
>>>
>>> gives (in ms): 2998, 3125, 3116, 3140
>>>
>>>
>>> On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <
>>> vanessa at codefrau.net> wrote:
>>>
>>>>
>>>> Hi Florin,
>>>>
>>>> wow, that looks exciting!
>>>>
>>>> This is indeed a much more thorough Squeak-to-JS mapping, where mine is
>>>> more of a "traditional" VM. I love it!
>>>>
>>>> Since my original post I implemented a mockup of "my" new JIT scheme:
>>>>
>>>>
>>>> https://squeak.js.org/docs/jit.md.html#sketch:contextproxieswithintrospectionandinlinecaching/newjitsketch/performanceestimate
>>>> [image: image.png]
>>>> You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw
>>>>
>>>> Could you share the performance numbers you are seeing for your
>>>> benchFib, in comparison to SqueakJS or my mockup? I am curious if
>>>> yield* is the way to go.
>>>>
>>>> Thanks for sharing! And congrats on your new job. My progress is slow
>>>> too, I only work on it some weekends. But then, I'm not in a hurry :)
>>>>
>>>> Vanessa
>>>>
>>>> On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <florin.mateoc at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Vanessa,
>>>>>
>>>>> Sorry for the delay in responding - as somebody who has been inspired
>>>>> by your SqueakJS project, I think I should mention that I am working on a
>>>>> related project, for now tentatively called JsSqueak.
>>>>> In addition to the inspiration provided by SqueakJS, it also scratches
>>>>> my longstanding itch about compiling (transpiling) Squeak.
>>>>>
>>>>> I hesitated to talk about it, as it is still a work in progress -
>>>>> after small bits and pieces that I worked on over a long period, I had the
>>>>> opportunity to spend a significant and uninterrupted chunk of time on it
>>>>> last summer, when I was unemployed for 3 months, and I was able to make
>>>>> good progress. I was optimistically thinking of releasing a first version
>>>>> before the end of last year, but after I started working at my new job,
>>>>> progress on JsSqueak has slowed down significantly. I must confess that I
>>>>> (and especially my wife) hesitate to recreate that productive unemployed
>>>>> situation :)
>>>>>
>>>>> I started with Squeak 4.5 - I already had code transforming Smalltalk
>>>>> code to a form more suitable for translation - and I also started with
>>>>> VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to
>>>>> heavily modify it, since I have to get rid of the stack usage for
>>>>> arguments/receiver and returns.
>>>>> Both of these big parts are working. I also implemented most numbered
>>>>> primitives by hand - they are inlined at generation time in the methods
>>>>> calling them.
>>>>> I am also taking advantage of the latest and greatest additions to
>>>>> JavaScript. I am, of course, using classes, and the parallel class-side
>>>>> hierarchy is implemented using statics. To implement green threads/process
>>>>> switching, all translated methods are implemented as generator functions,
>>>>> and all calls are through yield* expressions. The preemption/interrupt
>>>>> check points are inlined. With this, a process switch is achieved by simply
>>>>> yield-ing (in the process/semaphore primitives).
>>>>> With this, the Integer>>#benchFib method is translated (as a method in
>>>>> Number.prototype, there is one more, simpler, implementation in BigInt) as:
>>>>>
>>>>> *_benchFib() {
>>>>>    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
>>>>>       /*Handy send-heavy benchmark*/
>>>>>       /*(result // seconds to run) = approx calls per second*/
>>>>>       /* | r t |
>>>>>         t := Time millisecondsToRun: [r := 26 benchFib].
>>>>>         (r // 1000) // t*/
>>>>>       /*138000 on a Mac 8100//100*/
>>>>>       if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();
>>>>>
>>>>>       return (yield* this._lt( 2)).booleanValueOf("questionMark:colon:") ? (1) : (yield* (yield* (yield* (yield* this._sub( 1))._benchFib())._add( yield* (yield* this._sub( 2))._benchFib()))._add( 1));
>>>>>    } else // No implementation for #benchFib in Float hierarchy, trigger a DNU
>>>>>       return yield* super._benchFib()
>>>>> }
>>>>>
>>>>> The top-level check for smallIntegers is because both SmallInteger and Float are mapped to Number.
>>>>>
>>>>> The booleanValueOf call is for implementing the mustBeBoolean machinery (it actually translates directly to DNU, like it is done nowadays in Squeak).
>>>>>
>>>>> Of course, in Boolean, booleanValueOf is just an alias for valueOf
>>>>>
>>>>> As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized,
>>>>> although not completely, since their result can morph into LargeInteger (mapped to BigInt).
>>>>>
>>>>> As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
>>>>>
>>>>> BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef.
>>>>> I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
>>>>>
>>>>> The image is serialized sort of like a JavaScript storeString. No processes or contexts though, or rather they are not brought back in on the JavaScript side. Blocks are both stored and loaded.
>>>>>
>>>>> Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
>>>>>
>>>>> The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.
>>>>>
>>>>> One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code in order to implement its numbered primitives, but I still have to work through squeak.js from the same to initialize
>>>>> and hook up the display.
>>>>>
>>>>> Florin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <
>>>>> vanessa at codefrau.net> wrote:
>>>>>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> ideas for a faster SqueakJS JIT have been swirling around in my head
>>>>>> for many years. This weekend I took the time to write some of them down:
>>>>>>
>>>>>> https://squeak.js.org/docs/jit.md.html
>>>>>>
>>>>>> Feedback welcome!
>>>>>>
>>>>>> Vanessa
>>>>>>
>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 42124 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20210316/b5475096/attachment-0001.png>

