[squeak-dev] The Inbox: Kernel-cmm.1198.mcz

Sun Nov 25 18:09:05 UTC 2018

Hi Chris,

On Sun, 25 Nov 2018, Chris Muller wrote:

> Hi Levente,
>
>>> But what do you mean make all senders of #class use the primitive?
>>
>> Currently, when you compile a method containing a send of #class, the
>> compiler will generate a special bytecode for it (199).
>> When the interpreter/jit sees this bytecode, it will not perform a send
>> nor a primitive; it'll just look up the class of the receiver and place it
>> on top of the stack.
>
> Great!  Does that mean this can be accomplished solely in the image by
> making the compiler generate 199 when #basicClass is sent, and just
> the normal "send" bytecode for sends to #class?
>
>>> Do you think the system would be noticeably slower if all the sends to
>>> #class became a message send?  I'm skeptical that it would, but I have
>>
>> Yes, the bytecode is way quicker than the primitive or a primitive + a
>> send which is exactly what you suggested.
>
> It saves one send.  One.  That's only infinitesimally quicker:
> _________
> { [1 xxxClass] bench.
> [ 1 class ] bench.  }
>
>   ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
> '126,000,000 per second. 7.93 nanoseconds per run.')
> ________
>
> 2 nanoseconds per send faster.  Inconsequential in any real-world
> sense.  Furthermore, as soon as the message sent to the class does
> *any work* whatsoever, that good-sounding 27% improvement is quickly
> wiped out.  Look how much of the gain is lost doing as little as
> creating one single Rectangle from another one:
>
> ___________
> "Compare creating a single Rectangle with inlined #class vs. a
> (proposed) message-send of #class."
> | someRectangle |   someRectangle := 100@50 corner: 320@200.
> {  [someRectangle xxxClass origin: someRectangle topLeft corner:
> someRectangle bottomRight ] bench.
>   [someRectangle class      origin: someRectangle topLeft corner:
> someRectangle bottomRight ] bench.   }
>
>     --->  #('37,200,000 per second. 26.9 nanoseconds per run.'
> '38,000,000 per second. 26.3 nanoseconds per run.')
> ____________
>
> Real-world gain by the inlined send was reduced to...  whew!  I just
> had to go learn about "Picosecond" because nanoseconds aren't even
> small enough to measure the improvement.
>
> So, amplify.  Crank it up to 100K:
> __________
> "Compare creating 100,000 Rectangles with inlined #class vs. a
> message-send of #class."
> | someRectangle |   someRectangle := 100@50 corner: 320@200.
> {  [ 100000 timesRepeat: [someRectangle xxxClass origin: someRectangle
> topLeft corner: someRectangle bottomRight] ] bench.
>   [ 100000 timesRepeat: [someRectangle class origin: someRectangle
> topLeft corner: someRectangle bottomRight] ] bench.   }
>
>     ---> #('364 per second. 2.75 milliseconds per run.' '369 per
> second. 2.71 milliseconds per run.')
> _________
>
> Nothing times 100K is still nothing.

That's not the right way to measure things that are so quick, because the 
overhead of block activation is comparable to the runtime of the code 
inside the block. Also, #timesRepeat: is not a good choice for 
measurements for the very same reason: block creation + lots of block 
activation.
Also, the nearby bytecodes affect what the JIT does. When more things can 
be executed without performing a send, the overall performance gains 
will be higher.
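
A minimal sketch of a measurement that avoids most of that overhead. It
assumes a hypothetical Object>>xxxClass implemented as "^self class" (the
selector comes from the benchmark quoted above; its definition is a guess).
A #to:do: loop with a literal block is inlined by the compiler, so no block
is activated per iteration, and an otherwise empty loop is timed so the
loop's own cost can be subtracted. The iteration count is arbitrary and the
numbers will vary by VM and machine.

```smalltalk
"Assumption: Object>>xxxClass is defined as  ^self class  (hypothetical)."
| r n c loopOnly inlined sent |
r := 100@50 corner: 320@200.
n := 100000000. "arbitrary; large enough to run for a measurable time"

"Cost of the loop and the store alone."
loopOnly := Time millisecondsToRun: [ 1 to: n do: [ :i | c := r ] ].

"#class compiled to the special bytecode."
inlined := Time millisecondsToRun: [ 1 to: n do: [ :i | c := r class ] ].

"A real message send to the hypothetical method."
sent := Time millisecondsToRun: [ 1 to: n do: [ :i | c := r xxxClass ] ].

"Net milliseconds per variant after subtracting the loop itself."
{ inlined - loopOnly. sent - loopOnly }
```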

>
>> Also, removing the bytecode will make #class lose its atomicity. Any code
>> that relies on that behavior will silently break.
>
> If THAT exists it needs a more intention-revealing selector than
> #class that would let his peers know atomicity mattered there.
> #basicClass is his friend.

All special selectors do the same, e.g. #==, #ifNil:, #ifTrue:. Do you 
think all of those need #basicXXX methods?

>
>>> ...  I am surprised to see we have so many senders of #class in
>>> trunk, but I have a feeling most are rarely ever called.
>>
>> I doubt that. People don't sprinkle #class sends for no reason, do they?
>
> Sorry, I should not have said "ever".  I was trying to say the system
> probably spends more of its time sending to instance-side methods than
> to class-side methods.

It's a common pattern to have instance-independent code on the class side. 
Quick access to that is always a good thing.

>
>>> Not remove it, redirect it to #basicClass.
>>
>> Right, but while the bytecode is in effect, you just can't redirect
>> it.
>
> I'm racking my brain trying to understand this -- sorry...   By
> "redirect" I just meant change the Compiler to generate bytecode 199
> for sends to #basicClass, and just the regular "send" bytecode for
> sends to #class.  Then, recompile all methods.  Would that work?

It might work, but you would need to identify and rewrite the senders of 
#class which rely on the presence of the bytecode. In my image there are 
2174 senders, which is simply too many to review in my opinion.
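
A sender count like this can be reproduced with something along these
lines. It is a sketch; the figure depends on the image and on which
packages are loaded.

```smalltalk
"Count the methods in the current image that send #class."
(SystemNavigation default allCallsOn: #class) size
```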

I did some measurements and found that the JIT makes the numbered 
primitive almost as quick as the bytecode. The slowdown is only about 10%. 
Your suggestion, which is a send + bytecode, is about 85% slower and loses 
the atomicity of the message. So, you'd better leave the implementation of 
#class as it is right now, because that would be quicker and would 
preserve the atomicity as long as nothing overrides it.
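
For reference, a sketch of the two method definitions being compared. The
first is Object>>class roughly as it appears in current images (reproduced
from memory, so check your own image). The second is a hypothetical
Object>>class under the proposal, and it assumes the compiler has been
changed to emit bytecode 199 for #basicClass, which it does not do today.

```smalltalk
"Object>>class today: the numbered primitive, used when #class is actually sent
 (e.g. via #perform:) instead of being compiled to the special bytecode."
class
	<primitive: 111>
	self primitiveFailed

"Hypothetical Object>>class under the proposal: a real send of #class,
 followed by the bytecode inside the method (send + bytecode)."
class
	^self basicClass
```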

>
>>> This is a reasonable and familiar pattern, right?  It provides users
>>> full control and WYSIWYG between source and bytecodes due to a crystal
>>> clear selector name.  No magic.
>
> So, if
>   performance is not really hurt, and
>   we can keep sending #class if so insisted, and
>   we still have #basicClass, just in case, together
>   delineating an elegant seam between system-level vs. user-level access
>   in a classic Smalltalky way that even *I* can understand and use,
>   and give Squeak better Proxy support that helps Magma
> then
>   would you let me have this?

As I wrote a few emails earlier, I'd rather have a "switch" for this 
than force it on everyone who doesn't use proxies at all (I presume that's 
the current majority of Squeak users).

Levente

>
> You have a skill of making performance-considerations to such degrees
> that I never even would have fathomed, and this has resulted in
> immense performance benefits for Squeak.  I do wish you liked Magma,
> because I'm sure you could _obliterate_ many inefficiencies in the
> code and design.  But if not, I hope you can at least appreciate the
> value proposition of this proposal is worth it.
>
> - Chris
>