[Vm-dev] [SqueakJS] Faster JIT ideas

Vanessa Freudenberg vanessa at codefrau.net
Tue Mar 16 02:28:01 UTC 2021

Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not
just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 &&
    vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for
a context switch is not done on every send, but only when
interruptCheckCounter drops to zero or below. In the first mockup on codepen,
vm.handleDepthAndInterrupts prints the contents of all the contexts,
proving that the information is accessible in case we need to context
switch or reify the stack. My hope was that inside of
handleDepthAndInterrupts() I could use yield* to do the actual context
switch.
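
To illustrate the counter technique, here is a minimal sketch. It is not the
codepen mockup: the vm object, its interval of 1000, and the slowChecks field
are assumptions made up for this example, and the slow path never actually
requests a switch.

```javascript
// Hypothetical stand-in for the VM; only the counter logic matters here.
const vm = {
  interruptCheckInterval: 1000, // assumed value, for illustration only
  interruptCheckCounter: 1000,
  slowChecks: 0,                // counts how often the slow path ran
  // Slow path: reached once per interval. Returns true if a switch is needed.
  handleDepthAndInterrupts(depth, proxy) {
    this.interruptCheckCounter = this.interruptCheckInterval;
    this.slowChecks++;
    return false; // this sketch never requests a context switch
  },
};

// Shape of a jitted method: a cheap decrement on every activation,
// the expensive check only when the counter runs out.
function benchFib(n, depth) {
  if (--vm.interruptCheckCounter <= 0 &&
      vm.handleDepthAndInterrupts(depth, null) === true) return false;
  return n < 2 ? 1 : benchFib(n - 1, depth + 1) + benchFib(n - 2, depth + 1) + 1;
}

const result = benchFib(20, 0);
console.log(result, vm.slowChecks);
```

Since benchFib returns the number of activations, the ratio of result to
vm.slowChecks shows how rarely the slow path is taken.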

My second, exception based mockup on codepen uses the same approach, except
that when a context switch is needed it would throw, unwinding the stack
fully, and creating actual context objects along the way. I have not mocked
up that part yet (because it would need a mockup interpreter too, to
continue after unwind), but you can see the unwind working correctly by
uncommenting the line with
// throw Error("unwind")
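
The unwinding idea can be sketched roughly as follows. This is an
illustration under assumptions, not the code from the codepen: the Unwind
class, the countdown method and the shape of the recorded "context" objects
are all hypothetical, and nothing resumes execution afterwards.

```javascript
// Hypothetical marker exception that carries the frames collected so far.
class Unwind extends Error {
  constructor() {
    super("unwind");
    this.contexts = [];
  }
}

// Shape of a jitted method in this sketch: when an unwind passes through,
// each activation records its own state (innermost first) and rethrows.
function countdown(n, triggerAt) {
  try {
    if (n === triggerAt) throw new Unwind();
    return n === 0 ? 0 : 1 + countdown(n - 1, triggerAt);
  } catch (e) {
    if (!(e instanceof Unwind)) throw e;          // unrelated errors pass through
    e.contexts.push({ method: "countdown", n });  // stand-in for a real context object
    throw e;
  }
}

// Top level: catch the unwind once the JS stack is fully unwound.
function run() {
  try {
    return { value: countdown(3, 1), contexts: [] };
  } catch (e) {
    if (!(e instanceof Unwind)) throw e;
    return { value: undefined, contexts: e.contexts };
  }
}

const outcome = run();
console.log(outcome.contexts);
```

After the throw, outcome.contexts holds one entry per activation that was on
the stack, which is the information an interpreter would need to continue.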

I tried your benchFib vs benchFiby on Chrome, which seems to
optimize generators a lot better than Firefox. On Chrome the overhead is
just 50% or so, vs 300% on Firefox. Safari appears to be between the two.
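
For anyone who wants to repeat the comparison, here is a self-contained
variant. Note that it is an assumption-laden simplification: it uses plain
functions (fibDirect, fibGen and the time helper are names invented here)
instead of methods on Number.prototype, so absolute numbers will differ from
the ones quoted below.

```javascript
// Direct recursion, same arithmetic as benchFib.
function fibDirect(n) {
  return n < 2 ? 1 : fibDirect(n - 1) + fibDirect(n - 2) + 1;
}

// Same computation, but every call is a generator delegated to via yield*.
function* fibGen(n) {
  return n < 2 ? 1 : (yield* fibGen(n - 1)) + (yield* fibGen(n - 2)) + 1;
}

// Small timing helper: returns the computed value and elapsed milliseconds.
function time(fn) {
  const start = performance.now();
  const value = fn();
  return { value, ms: performance.now() - start };
}

const direct = time(() => fibDirect(25));
// Nothing ever yields, so the first next() runs to completion.
const viaGen = time(() => fibGen(25).next().value);
console.log("direct:", direct.ms, "ms; yield*:", viaGen.ms, "ms");
```

Running it several times per browser gives a better picture, since the first
run includes JIT warm-up.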

I will need to make more complete mockups before deciding on a design,
especially for how closures would be handled. Does anyone have a tiny
one-method benchmark that would highlight closure performance?


On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <florin.mateoc at gmail.com> wrote:

> I don't think the numbers could be meaningfully compared. The whole
> purpose of the yield* invocations is to enable process switching, which I
> don't think your jitted methods allow for. But then are you merely
> comparing yield* invocations against direct invocations? For sure, the
> yield* ones will be much slower.
> Leaving aside my implementation, which also uses yield* for #<, #- and #+
> and would therefore be much slower than the code below (operators are
> surely well optimized by JS), we can just measure direct vs yield* for the
> main recursive invocation:
> Number.prototype.benchFib = function benchFib() {
>   return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
> }
> var t = performance.now();
> (30).benchFib();
> performance.now() - t
> gives on my laptop in Firefox 920, 911, 919, 898
> Versus
> Number.prototype.benchFiby = function* benchFiby() {
>    return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
> }
> var t = performance.now();
> (30).benchFiby().next();
> performance.now() - t
> gives 2998, 3125, 3116, 3140
> On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <vanessa at codefrau.net>
> wrote:
>> Hi Florin,
>> wow, that looks exciting!
>> This is indeed a much more thorough Squeak-to-JS mapping, where mine is
>> more of a "traditional" VM. I love it!
>> Since my original post I implemented a mockup of "my" new JIT scheme:
>> https://squeak.js.org/docs/jit.md.html#sketch:contextproxieswithintrospectionandinlinecaching/newjitsketch/performanceestimate
>> [image: image.png]
>> You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw
>> Could you share the performance numbers you are seeing for your benchFib,
>> in comparison to SqueakJS or my mockup? I am curious if yield* is the
>> way to go.
>> Thanks for sharing! And congrats on your new job. My progress is slow
>> too, I only work on it some weekends. But then, I'm not in a hurry :)
>> Vanessa
>> On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <florin.mateoc at gmail.com>
>> wrote:
>>> Hi Vanessa,
>>> Sorry for the delay in responding - as somebody who has been inspired by
>>> your SqueakJS project, I think I should mention that I am working on a
>>> related project, for now tentatively called JsSqueak.
>>> In addition to the inspiration provided by SqueakJS, it also scratches
>>> my longstanding itch about compiling (transpiling) Squeak.
>>> I hesitated to talk about it, as it is still a work in progress - after
>>> small bits and pieces that I worked on over a long period, I had the
>>> opportunity to spend a significant and uninterrupted chunk of time on it
>>> last summer, when I was unemployed for 3 months, and I was able to make
>>> good progress. I was optimistically thinking of releasing a first version
>>> before the end of last year, but after I started working on my new job,
>>> progress on JsSqueak has slowed down significantly. I must confess that I
>>> (and especially my wife) hesitate to recreate that productive unemployed
>>> situation :)
>>> I started with Squeak 4.5 - I already had code transforming Smalltalk
>>> code to a form more suitable for translation - and I also started with
>>> VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to
>>> heavily modify it, since I have to get rid of the stack usage for
>>> arguments/receiver and returns.
>>> Both of these big parts are working. I also implemented most numbered
>>> primitives by hand - they are inlined at generation time in the methods
>>> calling them.
>>> I am also taking advantage of the latest and greatest additions to
>>> JavaScript. I am, of course, using classes, and the parallel class-side
>>> hierarchy is implemented using statics. To implement green threads/process
>>> switching, all translated methods are implemented as generator functions,
>>> and all calls are through yield* expressions. The preemption/interrupt
>>> check points are inlined. With this, a process switch is achieved by simply
>>> yield-ing (in the process/semaphore primitives).
>>> With this, the Integer>>#benchFib method is translated (as a method in
>>> Number.prototype, there is one more, simpler, implementation in BigInt) as:
>>> *_benchFib() {
>>>    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
>>>       /*Handy send-heavy benchmark*/
>>>       /*(result // seconds to run) = approx calls per second*/
>>>       /* | r t |
>>>         t := Time millisecondsToRun: [r := 26 benchFib].
>>>         (r // 1000) // t*/
>>>       /*138000 on a Mac 8100//100*/
>>>       if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();
>>>       return (yield* this._lt( 2)).booleanValueOf("questionMark:colon:") ? (1) : (yield* (yield* (yield* (yield* this._sub( 1))._benchFib())._add( yield* (yield* this._sub( 2))._benchFib()))._add( 1));
>>>    } else // No implementation for #benchFib in Float hierarchy, trigger a DNU
>>>       return yield* super._benchFib()
>>> }
>>> The top-level check for smallIntegers is because both SmallInteger and Float are mapped to Number.
>>> The booleanValueOf call is for implementing the mustBeBoolean machinery (it actually translates directly to DNU, like it is done nowadays in Squeak).
>>> Of course, in Boolean, booleanValueOf is just an alias for valueOf
>>> As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized,
>>> although not completely, since their result can morph into LargeInteger (mapped to BigInt).
>>> As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
>>> BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef.
>>> I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
>>> The image is serialized sort of like a JavaScript storeString. No processes or contexts though, or rather they are not brought back in on the JavaScript side. Blocks are both stored and loaded.
>>> Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
>>> The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.
>>> One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code in order to implement its numbered primitives, but I still have to work through squeak.js from the same to initialize
>>> and hook up the display.
>>> Florin
>>> On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <
>>> vanessa at codefrau.net> wrote:
>>>> Hi all,
>>>> ideas for a faster SqueakJS JIT have been swirling around in my head
>>>> for many years. This weekend I took the time to write some of them down:
>>>> https://squeak.js.org/docs/jit.md.html
>>>> Feedback welcome!
>>>> Vanessa
