[Vm-dev] [SqueakJS] Faster JIT ideas

Florin Mateoc florin.mateoc at gmail.com
Thu Apr 8 02:15:41 UTC 2021


Blush. I did miss the - 1, sorry for the noise.

With regards to yield*, I am sure the engines will continue to improve the
performance - it is relatively new, so I assume there might be some low
hanging fruit.
We'll see. I am trying to concentrate on correctness first, I also have a
lot of room for optimizations

Florin

On Wed, Apr 7, 2021 at 8:46 PM Vanessa Freudenberg <vanessa at codefrau.net>
wrote:

>
> On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <florin.mateoc at gmail.com>
> wrote:
>
>>
>> Small update: I can now see the UI, and even interact with it a little
>> (before it breaks with various bugs :)) - woohoo!
>>
>
> Awesome!
>
> And thanks for the bug report! I think you might have missed that SqueakJS
> does in fact subtract 1 from the frexp() result:
>
> case 53: return this.popNandPushIntIfOK(argCount+1,
> this.frexp_exponent(this.stackFloat(0)) - 1);
>
> I'm almost certain it is correct. I wrote a whole
> blockpost about emulating frexp in JavaScript 7 years ago:
>
> https://croquetweak.blogspot.com/2014/08/deconstructing-floats-frexp-and-ldexp.html
>
> Btw, I determined that yield* is incredibly slow, because (as you were
> saying) every send now needs to return a generator function. So I had to
> scrap my initial plan for the new SqueakJS JIT. I'm going with regular
> function calls now, and reifying the stack via exception unwinding when
> needed, rather than trying to keep multiple call stacks alive. See this
> twitter thread:
> https://twitter.com/codefrau/status/1378455847055941635
>
> Vanessa
>
> On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <florin.mateoc at gmail.com>
> wrote:
>
>>
>> Small update: I can now see the UI, and even interact with it a little
>> (before it breaks with various bugs :)) - woohoo!
>>
>> Anyway, while debugging my various bugs, I found a small bug in the
>> SqueakJS primitive 53 - I do consult them regularly :)
>> Squeak uses the IEEE 754 specification for exponent (significand between
>> 1 and 2), frexp does not (significand between 0.5 and 1), so the result is
>> off by one.
>> It should not matter much, as the significand is computed based on the
>> exponent, but e.g. the code executed when the primitive fails gives a
>> different result from the primitive itself
>>
>> On Wed, Mar 17, 2021 at 5:01 AM Vanessa Freudenberg <vanessa at codefrau.net>
>> wrote:
>>
>>>
>>> On Tue, Mar 16, 2021 at 1:38 AM Vanessa Freudenberg <
>>> vanessa at codefrau.net> wrote:
>>>
>>>> Meanwhile, I made a little scheduler just to learn how to use yield*.
>>>> Works nicely:
>>>> https://codepen.io/codefrau/pen/jOVRmxm
>>>>
>>>
>>> And here is a JIT mockup using that scheduler:
>>> https://codepen.io/codefrau/pen/LYbopBV
>>>
>>> It's executing "5 benchFib", "4 benchFib", and "3 benchFib" in three
>>> separate processes, switching every 3 sends just for testing, and printing
>>> all contexts.
>>>
>>> No benchmarks yet.
>>>
>>> Vanessa
>>>
>>>
>>>> Vanessa
>>>>
>>>>
>>>>> On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <
>>>>> vanessa at codefrau.net> wrote:
>>>>
>>>>
>>>>>> Thanks for that Florin, very helpful!
>>>>>>
>>>>>> I'm curious why you need to make every send into a generator call,
>>>>>> and not just rely on the one in your GlobalCheckForInterrupts()?
>>>>>>
>>>>>> My implementation is intended to allow context switching that way:
>>>>>>
>>>>>> if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;
>>>>>>
>>>>>>
>>>>>> This is the same technique as in other VMs, where the actual check
>>>>>> for context switch is not done for every send, but only when
>>>>>> the interruptCheckCounter goes below zero. In the first mockup on codepen,
>>>>>> vm.handleDepthAndInterrupts prints the contents of all the contexts,
>>>>>> proving that the information is accessible in case we need to context
>>>>>> switch or reify the stack. My hope was that inside of
>>>>>> handleDepthAndInterrupts() I could use *yield to do the actual context
>>>>>> switching.
>>>>>>
>>>>>> My second, exception based mockup on codepen uses the same approach,
>>>>>> except that when a context switch is needed it would throw, unwinding the
>>>>>> stack fully, and creating actual context objects along the way. I have not
>>>>>> mocked up that part yet (because it would need a mockup interpreter too, to
>>>>>> continue after unwind), but you can see the unwind working correctly by
>>>>>> uncommenting the line with
>>>>>> // throw Error("unwind")
>>>>>>
>>>>>> I tried your benchFib vs benchFiby on Chrome, which seems to
>>>>>> optimize generators a lot better than Firefox. On Chrome the overhead is
>>>>>> just 50% or so, vs 300% on Firefox. Safari appears to be between the two.
>>>>>>
>>>>>> I will need to make more complete mockups before deciding on a
>>>>>> design. Especially how closures would be handled. Does anyone have a tiny
>>>>>> one-method benchmark that would highlight closure performance?
>>>>>>
>>>>>> Vanessa
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <
>>>>>> florin.mateoc at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I don't think the numbers could be meaningfully compared. The whole
>>>>>>> purpose of the yield* invocations is to enable process switching, which I
>>>>>>> don't think your jitted methods allow for. But then are you merely
>>>>>>> comparing yield* invocations against direct invocations? For sure, the
>>>>>>> yield* ones will be  much slower.
>>>>>>> Let alone my implementation, which also uses yield* for #< , #- and
>>>>>>> #+, so it would be much slower than below as operators are surely well
>>>>>>> optimized by JS, we can just measure direct vs yield* for the main
>>>>>>> recursive invocation:
>>>>>>>
>>>>>>> Number.prototype.benchFib = function benchFib() {
>>>>>>>   return this < 2 ? 1 : (this - 1).benchFib() + (this -
>>>>>>> 2).benchFib() + 1
>>>>>>> }
>>>>>>>
>>>>>>> var t = performance.now();
>>>>>>> (30).benchFib();
>>>>>>> performance.now() - t
>>>>>>>
>>>>>>> gives on my laptop in Firefox 920, 911, 919, 898
>>>>>>>
>>>>>>> Versus
>>>>>>>
>>>>>>> Number.prototype.benchFiby = function* benchFiby() {
>>>>>>>    return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (
>>>>>>> this - 2).benchFiby()) + 1
>>>>>>> }
>>>>>>>
>>>>>>> var t = performance.now();
>>>>>>> (30).benchFiby().next();
>>>>>>> performance.now() - t
>>>>>>>
>>>>>>> gives 2998, 3125, 3116, 3140
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <
>>>>>>> vanessa at codefrau.net> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Florin,
>>>>>>>>
>>>>>>>> wow, that looks exciting!
>>>>>>>>
>>>>>>>> This is indeed a much more thorough Squeak-to-JS mapping,
>>>>>>>> where mine is more of a "traditional" VM. I love it!
>>>>>>>>
>>>>>>>> Since my original post I implemented a mockup of "my" new JIT
>>>>>>>> scheme:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://squeak.js.org/docs/jit.md.html#sketch:contextproxieswithintrospectionandinlinecaching/newjitsketch/performanceestimate
>>>>>>>> [image: image.png]
>>>>>>>> You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw
>>>>>>>>
>>>>>>>> Could you share the performance numbers you are seeing for your
>>>>>>>> benchFib, in comparison to SqueakJS or my mockup? I am curious if
>>>>>>>> yield* is the way to go.
>>>>>>>>
>>>>>>>> Thanks for sharing! And congrats on your new job. My progress is
>>>>>>>> slow too, I only work on it some weekends. But then, I'm not in a hurry :)
>>>>>>>>
>>>>>>>> Vanessa
>>>>>>>>
>>>>>>>> On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <
>>>>>>>> florin.mateoc at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Vanessa,
>>>>>>>>>
>>>>>>>>> Sorry for the delay in responding - as somebody who has been
>>>>>>>>> inspired by your SqueakJS project, I think I should mention that I am
>>>>>>>>> working on a related project, for now tentatively called JsSqueak.
>>>>>>>>> In addition to the inspiration provided by SqueakJS, it also
>>>>>>>>> scratches my longstanding itch about compiling (transpiling) Squeak.
>>>>>>>>>
>>>>>>>>> I hesitated to talk about it, as it is still a work in progress -
>>>>>>>>> after small bits and pieces that I worked on over a long period, I had the
>>>>>>>>> opportunity to spend a significant and uninterrupted chunk of time on it
>>>>>>>>> last summer, when I was unemployed for 3 months, and I was able to make
>>>>>>>>> good progress. I was optimistically thinking of releasing a first version
>>>>>>>>> before the end of last year, but after I started working on my new job,
>>>>>>>>> progress on JsSqueak has slowed down significantly. I must confess that I
>>>>>>>>> (and especially my wife) hesitate in recreating that productive unemployed
>>>>>>>>> situation :)
>>>>>>>>>
>>>>>>>>> I started with Squeak 4.5 - I already had code transforming
>>>>>>>>> Smalltalk code to a form more suitable for translation - and I also started
>>>>>>>>> with VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to
>>>>>>>>> heavily modify it, since I have to get rid of the stack usage for
>>>>>>>>> arguments/receiver and returns.
>>>>>>>>> Both of these big parts are working. I also implemented most
>>>>>>>>> numbered primitives by hand - they are inlined at generation time in the
>>>>>>>>> methods calling them.
>>>>>>>>> I am also taking advantage of the latest and greatest additions to
>>>>>>>>> JavaScript. I am, of course, using classes, and the parallel class-side
>>>>>>>>> hierarchy is implemented using statics. To implement green threads/process
>>>>>>>>> switching, all translated methods are implemented as generator functions,
>>>>>>>>> and all calls are through yield* expressions. The preemption/interrupt
>>>>>>>>> check points are inlined. With this, a process switch is achieved by simply
>>>>>>>>> yield-ing (in the process/semaphore primitives).
>>>>>>>>> With this, the Integer>>#benchFib method is translated (as a
>>>>>>>>> method in Number.prototype, there is one more, simpler, implementation in
>>>>>>>>> BigInt) as:
>>>>>>>>>
>>>>>>>>> *_benchFib() {
>>>>>>>>>    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
>>>>>>>>>       /*Handy send-heavy benchmark*/
>>>>>>>>>    /*(result // seconds to run) = approx calls per second*/
>>>>>>>>>    /* | r t |
>>>>>>>>>      t := Time millisecondsToRun: [r := 26 benchFib].
>>>>>>>>>      (r // 1000) // t*/
>>>>>>>>>    /*138000 on a Mac 8100//100*/
>>>>>>>>>    if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();
>>>>>>>>>
>>>>>>>>>    return (yield* this._lt( 2)).booleanValueOf("questionMark:colon:") ? (1) : (yield* (yield* (yield* (yield* this._sub( 1))._benchFib())._add( yield* (yield* this._sub( 2))._benchFib()))._add( 1));
>>>>>>>>> } else // No implementation for #benchFib in Float hierarchy, trigger a DNU
>>>>>>>>>       return yield* super._benchFib()
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> The top-level check for smallIntegers is because both SmallInteger and Float are mapped to Number.
>>>>>>>>>
>>>>>>>>> The booleanValueOf call is for implementing the mustBeBoolean machinery (it actually translates directly to DNU, like it is done nowadays in Squeak).
>>>>>>>>>
>>>>>>>>> Of course, in Boolean, booleanValueOf is just an alias for valueOf
>>>>>>>>>
>>>>>>>>> As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized,
>>>>>>>>> although not completely, since their result can morph into LargeInteger (mapped to BigInt).
>>>>>>>>>
>>>>>>>>> As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
>>>>>>>>>
>>>>>>>>> BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef.
>>>>>>>>> I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
>>>>>>>>>
>>>>>>>>> The image is serialized sort of like a JavaScript storeString. No processes or contexts though, or rather they are not brought back in on the JavaScript side. Blocks are both stored and loaded.
>>>>>>>>>
>>>>>>>>> Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
>>>>>>>>>
>>>>>>>>> The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.
>>>>>>>>>
>>>>>>>>> One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code in order to implement its numbered primitives, but I still have to work through squeak.js from the same to initialize
>>>>>>>>> and hook up the display.
>>>>>>>>>
>>>>>>>>> Florin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <
>>>>>>>>> vanessa at codefrau.net> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> ideas for a faster SqueakJS JIT have been swirling around in my
>>>>>>>>>> head for many years. This weekend I took the time to write some of them
>>>>>>>>>> down:
>>>>>>>>>>
>>>>>>>>>> https://squeak.js.org/docs/jit.md.html
>>>>>>>>>>
>>>>>>>>>> Feedback welcome!
>>>>>>>>>>
>>>>>>>>>> Vanessa
>>>>>>>>>>
>>>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20210407/da14d8e0/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 42124 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20210407/da14d8e0/attachment-0001.png>


More information about the Vm-dev mailing list