[squeak-dev] The Inbox: Kernel-cmm.1198.mcz

Sun Nov 25 23:56:40 UTC 2018

Hi Chris,

This conversation is getting off track, so let's take a step back and
try something different.
I had suggested a solution to you, the "switch", but you never mentioned how
it worked for you. Perhaps my explanation wasn't clear.
Let me just give you a snippet which does exactly what I suggested.
Please try it in your image (one without Kernel-cmm.1198 loaded) and let
me know whether it solves your problem:

 	(ParseNode classPool at: #StdSelectors) removeKey: #class.
 	Compiler recompileAll.

Levente

P.S.: Here's the benchmark I used to get my numbers:

runs := (1 to: 5) collect: [ :e |
	{
		"bytecode"
		[ 1 to: 50000000 do: [ :i |
			i class class class class class class class class class class ] ] timeToRun.
		"numbered primitive"
		[ 1 to: 50000000 do: [ :i |
			i classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive
				classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive ] ] timeToRun.
		"send + bytecode"
		[ 1 to: 50000000 do: [ :i |
			i classSend classSend classSend classSend classSend
				classSend classSend classSend classSend classSend ] ] timeToRun.
		"empty loop, used as the baseline"
		[ 1 to: 50000000 do: [ :i | i ] ] timeToRun } ].
"Subtract the empty-loop time from each run, then drop that last entry."
cleanRuns := runs collect: [ :e | (e - e last) allButLast ].
primitiveVsByteCode := (cleanRuns collect: [ :e | e second / e first ]) average
	printShowingMaxDecimalPlaces: 2.
sendVsByteCode := (cleanRuns collect: [ :e | e third / e first ]) average
	printShowingMaxDecimalPlaces: 2.

Where Object >> #classPrimitive is

classPrimitive
 	"Primitive. Answer the object which is the receiver's class. Essential. See
 	Object documentation whatIsAPrimitive."

 	<primitive: 111>
 	self primitiveFailed

And Object >> #classSend is

classSend
 	"Answer the object which is the receiver's class. Not a primitive: this is
 	an ordinary method that wraps #class, i.e. send + bytecode."

 	^self class
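
To make the post-processing in the benchmark above concrete, here is a
small sketch of the same arithmetic on one run. The four timings are
made-up values chosen for easy arithmetic, not measurements, and the
variable names run and clean are only for illustration.

```smalltalk
"Made-up timings in milliseconds for a single run, NOT measurements:
 bytecode, primitive, send, empty loop."
| run clean primitiveVsByteCode sendVsByteCode |
run := #(500 560 900 100).
"Subtract the empty-loop time from every entry, then drop the last entry."
clean := (run - run last) allButLast.               "#(400 460 800)"
"Ratios relative to the bytecode variant."
primitiveVsByteCode := clean second / clean first.  "23/20, i.e. 1.15"
sendVsByteCode := clean third / clean first.        "2"
```

A ratio of 1.15 would read as "15% slower than the bytecode". The real
benchmark averages these ratios over five runs before printing them.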

On Sun, 25 Nov 2018, Chris Muller wrote:

> Hi Levente,
>
> Just a reminder, the original question I asked was:
>
>>>>> Do you think the system would be noticeably slower if all the sends to
>>>>> #class became a message send?  ...
>
> and your response:
>
>>>> Yes, the bytecode is way quicker than the primitive or a primitive + a
>>>> send which is exactly what you suggested.
>
> So even though you answered a different question, I was still curious
> by your claim, and remembered that you're one who has liked to communicate
> with benchmarks.  That's why I ran and presented them to you, but I'm
> not sure if we're interpreting the results relative to my question or
> some other question...
>
>>> It saves one send.  One.  That's only infinitesimally quicker:
>>> _________
>>> { [1 xxxClass] bench.
>>> [ 1 class ] bench.  }
>>>
>>>   ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
>>> '126,000,000 per second. 7.93 nanoseconds per run.')
>>> ________
>>>
>>> 2 nanoseconds per send faster.  Inconsequential in any real-world
>>> sense.  Furthermore, as soon as the message sent to the class does
>>> *any work* whatsoever, that good-sounding 27% improvement is quickly
>>> wiped out.  Look how much of the gain is lost doing as little as
>>> creating one single Rectangle from another one:
>>>
>>> ___________
>>> "Compare creating a single Rectangle with inlined #class vs. a
>>> (proposed) message-send of #class."
>>> | someRectangle |   someRectangle := 100 @ 50 corner: 320 @ 200.
>>> {  [someRectangle xxxClass origin: someRectangle topLeft corner:
>>> someRectangle bottomRight ] bench.
>>>   [someRectangle class      origin: someRectangle topLeft corner:
>>> someRectangle bottomRight ] bench.   }
>>>
>>>     --->  #('37,200,000 per second. 26.9 nanoseconds per run.'
>>> '38,000,000 per second. 26.3 nanoseconds per run.')
>>> ____________
>>>
>>> Real-world gain by the inlined send was reduced to...  whew!  I just
>>> had to go learn about "Picosecond" because nanoseconds aren't even
>>> small enough to measure the improvement.
>>>
>>> So, amplify.  Crank it up to 100K:
>>> __________
>>> "Compare creating 100,000 Rectangles with inlined #class vs. a
>>> message-send of #class."
>>> | someRectangle |   someRectangle := 100 @ 50 corner: 320 @ 200.
>>> {  [ 100000 timesRepeat: [someRectangle xxxClass origin: someRectangle
>>> topLeft corner: someRectangle bottomRight] ] bench.
>>>   [ 100000 timesRepeat: [someRectangle class origin: someRectangle
>>> topLeft corner: someRectangle bottomRight] ] bench.   }
>>>
>>>     ---> #('364 per second. 2.75 milliseconds per run.' '369 per
>>> second. 2.71 milliseconds per run.')
>>> _________
>>>
>>> Nothing times 100K is still nothing.
>>
>>
>> That's not the right way to measure things that are so quick, because the
>> overhead of block activation is comparable to the runtime of the code
>> inside the block. Also, #timesRepeat: is not a good choice for
>> measurements for the very same reason: block creation + lots of block
>> activation.
>> Also, the nearby bytecodes affect what the JIT does. When more things can
>> be executed without performing a send, the overall performance gains
>> will be higher.
>
> There are three benchmarks, did you notice the first two?
>
>    - The first one measures the single-unit cost of #xxxClass over
> #class.  This captures your theoretical maximum benefit of 27%, which
> is terrible, because it can't come close to that in real code.
>
>    - The second demonstrates how 90% of that 27% benefit is wiped out
> with no more than a single simple allocation -- what the vast majority
> of class methods are responsible for.
>
>    - The third one measures "real world impact", and shows that this
> particular in-line doesn't help the system in any way that helps any
> human anywhere.
>
>>>> Also, removing the bytecode will make #class lose its atomicity. Any code
>>>> that relies on that behavior will silently break.
>>>
>>> If THAT exists it needs a more intention-revealing selector than
>>> #class that would let his peers know atomicity mattered there.
>>> #basicClass is his friend.
>>
>> All special selectors do the same e.g. #==, #ifNil:, #ifTrue:. Do you
>> think all of those need #basicXXX methods?
>
> No just #class.  An identity-check should be an identity-check, even
> against a Proxy.  And does that example help illustrate how using #==
> when you DON'T need an identity-check is a breakage of encapsulation?
> It makes false assumptions and enforces type-conformance in a system
> that wants to be empowered by messaging.
>
>>>>> ...  I am surprised to see we have so many senders of #class in
>>>>> trunk, but I have a feeling most rarely ever called.
>>>>
>>>> I doubt that. People don't sprinkle #class sends for no reason, do they?
>>>
>>> Sorry, I should not have said "ever".  I was trying to say the system
>>> probably spends more of its time sending to instance-side methods than
>>> to class-side methods.
>>
>> It's a common pattern to have instance-independent code on the class side.
>> Quick access to that is always a good thing.
>
> It's still quick!  Levente, I challenge you to back up your claim by
> identifying any one single method in the image which reports even only
> a meaningfully better *bench* performance (much less real-world) by
> calling it via #class instead of #xxxClass.
>
> Anything whose performance matters at a level of one send is going to
> use #basicClass anyway, just like we may have a few that we send
> #basicNew instead of #new to.
>
>>>>> Not remove it, redirect it to #basicClass.
>>>>
>>>> Right, but while the bytecode is in effect, you just can't redirect
>>>> it.
>>>
>>> I'm racking my brain trying to understand this -- sorry...   By
>>> "redirect" I just meant change the Compiler to generate bytecode 199
>>> for sends to #basicClass, and just the regular "send" bytecode for
>>> sends to #class.  Then, recompile all methods.  Would that work?
>>
>> It might work, but you would need to identify and rewrite senders of
>> #class which rely on the presence of the bytecode. In my image there are
>> 2174 senders, which is simply too much review in my opinion.
>
> I repeat my challenge above!
>
>> I did some measurements and found that the JIT makes the numbered
>> primitive almost as quick as the bytecode. The slowdown is only about 10%.
>> Your suggestion, which is send + bytecode is about 85% slower and loses
>> the atomicity of the message. So, you'd better leave the implementation of
>> #class as it is right now, because that would be quicker and would
>> preserve the atomicity as long as nothing overrides it.
>
> Huh?  No, you're only 27% faster in the *benchmark*, but near zero in
> anything real-world.
>
> My challenge above, stands.  I would love to be wrong, so I could shed
> my suspicion of whether this is about something else not mentioned...
> :(
>
>>>>> This is a reasonable and familiar pattern, right?  It provides users
>>>>> full control and WYSIWYG between source and bytecodes due to a crystal
>>>>> clear selector name.  No magic.
>>>
>>> So, if
>>>   performance is not really hurt, and
>>>   we can keep sending #class if so insisted, and
>>>   we still have #basicClass, just in case, together
>>>   delineating an elegant seam between system-level vs. user-level access
>>>   in a classic Smalltalky way that even *I* can understand and use,
>>>   and give Squeak better Proxy support that helps Magma
>>> then
>>>   would you let me have this?
>>
>> As I wrote a few emails earlier, I'd rather have a "switch" for this
>> than forcing it on everyone who doesn't use proxies at all (I presume that's
>> the current majority of Squeak users).
>
> Whoa, hold on there.  You only ever made one argument -- "performance"
> -- which was obliterated by the benchmarks.  Squeezing 27% more out of
> a microbench of something called 0.0001% of the time results in no
> benefit to anyone anywhere.
>
> I see MY position as the pro-user position, and yours as the... pro
> fastest-lab-result position, which hurts this Squeak user.  I'm sad that
> that alone isn't enough to support this.    :(
> _______
> Do you remember when Behavior>>#new didn't always make a call to
> #initialize?  But at a time when Squeak was 10X slower than it is now,
> the people then had the wisdom to understand that the computer and
> software exist to eventually serve _users_, and that spiting users to
> save one single send, even when it was a much greater percentage of
> impact back then, was still way worth it.
>