[squeak-dev] The Inbox: Kernel-cmm.1198.mcz
Levente Uzonyi
leves at caesar.elte.hu
Sun Nov 25 23:56:40 UTC 2018
Hi Chris,
This conversation is getting off track, so let's take a step back and
try something different.
I had suggested a solution to you: the "switch", but you never mentioned how
it worked for you. Perhaps my explanation wasn't clear.
Let me just give you a snippet which does exactly what I suggested.
Please try it in your image (one without Kernel-cmm.1198 loaded) and let
me know if it solved your problem or not:
(ParseNode classPool at: #StdSelectors) removeKey: #class.
Compiler recompileAll.
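A quick way to confirm the switch took effect could look like the check below. It is only a sketch, and it assumes StdSelectors is the dictionary the compiler consults for specially encoded selectors, as the removeKey: above implies:

```smalltalk
"Should answer false once #class has been removed from the special selectors."
(ParseNode classPool at: #StdSelectors) includesKey: #class
```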
Levente
P.S.: Here's the benchmark I used to get my numbers:
runs := (1 to: 5) collect: [ :e |
    {
        [ 1 to: 50000000 do: [ :i |
            i class class class class class
                class class class class class ] ] timeToRun.
        [ 1 to: 50000000 do: [ :i |
            i classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive
                classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive ] ] timeToRun.
        [ 1 to: 50000000 do: [ :i |
            i classSend classSend classSend classSend classSend
                classSend classSend classSend classSend classSend ] ] timeToRun.
        [ 1 to: 50000000 do: [ :i | i ] ] timeToRun } ].
cleanRuns := runs collect: [ :e | (e - e last) allButLast ].
primitiveVsByteCode := (cleanRuns collect: [ :e | e second / e first ]) average
    printShowingMaxDecimalPlaces: 2.
sendVsByteCode := (cleanRuns collect: [ :e | e third / e first ]) average
    printShowingMaxDecimalPlaces: 2.
Where Object >> #classPrimitive is
    classPrimitive
        "Primitive. Answer the object which is the receiver's class.
        Essential. See Object documentation whatIsAPrimitive."
        <primitive: 111>
        self primitiveFailed
And Object >> #classSend is
    classSend
        "Primitive. Answer the object which is the receiver's class.
        Essential. See Object documentation whatIsAPrimitive."
        ^self class
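To make the cleanRuns step above concrete, here is a small worked example with made-up timings. The numbers are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements:

```smalltalk
"Hypothetical timings in milliseconds, in the same order as one element of runs:
 bytecode, primitive, send, empty loop. Not measured values."
| run clean |
run := #(300 320 500 100).
"Subtract the empty-loop time from every entry, then drop the baseline itself."
clean := (run - run last) allButLast.	"#(200 220 400)"
{ clean second / clean first.	"(11/10): primitive vs. bytecode"
  clean third / clean first }	"2: send vs. bytecode"
```

With these invented numbers the primitive variant would come out 10% slower than the bytecode and the send variant 100% slower. The point is only that the loop overhead is removed before the ratios are taken.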
On Sun, 25 Nov 2018, Chris Muller wrote:
> Hi Levente,
>
> Just a reminder, the original question I asked was:
>
>>>>> Do you think the system would be noticeably slower if all the sends to
>>>>> #class became a message send? ...
>
> and your response:
>
>>>> Yes, the bytecode is way quicker than the primitive or a primitive + a
>>>> send which is exactly what you suggested.
>
> So even though you answered a different question, I was still curious
> by your claim, and remembered that you're one who has liked to communicate
> with benchmarks. That's why I ran and presented them to you, but I'm
> not sure if we're interpreting the results relative to my question or
> some other question...
>
>>> It saves one send. One. That's only infinitesimally quicker:
>>> _________
>>> { [1 xxxClass] bench.
>>> [ 1 class ] bench. }
>>>
>>> ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
>>> '126,000,000 per second. 7.93 nanoseconds per run.')
>>> ________
>>>
>>> 2 nanoseconds per send faster. Inconsequential in any real-world
>>> sense. Furthermore, as soon as the message sent to the class does
>>> *any work* whatsoever, that good-sounding 27% improvement is quickly
>>> wiped out. Look how much of the gain is lost doing as little as
>>> creating one single Rectangle from another one:
>>>
>>> ___________
>>> "Compare creating a single Rectangle with inlined #class vs. a
>>> (proposed) message-send of #class."
>>> | someRectangle | someRectangle := 100@50 corner: 320@200.
>>> { [someRectangle xxxClass origin: someRectangle topLeft corner:
>>> someRectangle bottomRight ] bench.
>>> [someRectangle class origin: someRectangle topLeft corner:
>>> someRectangle bottomRight ] bench. }
>>>
>>> ---> #('37,200,000 per second. 26.9 nanoseconds per run.'
>>> '38,000,000 per second. 26.3 nanoseconds per run.')
>>> ____________
>>>
>>> Real-world gain by the inlined send was reduced to... whew! I just
>>> had to go learn about "Picosecond" because nanoseconds aren't even
>>> small enough to measure the improvement.
>>>
>>> So, amplify. Crank it up to 100K:
>>> __________
>>> "Compare creating 100,000 Rectangles with inlined #class vs. a
>>> message-send of #class."
>>> | someRectangle | someRectangle := 100@50 corner: 320@200.
>>> { [ 100000 timesRepeat: [someRectangle xxxClass origin: someRectangle
>>> topLeft corner: someRectangle bottomRight] ] bench.
>>> [ 100000 timesRepeat: [someRectangle class origin: someRectangle
>>> topLeft corner: someRectangle bottomRight] ] bench. }
>>>
>>> ---> #('364 per second. 2.75 milliseconds per run.' '369 per
>>> second. 2.71 milliseconds per run.')
>>> _________
>>>
>>> Nothing times 100K is still nothing.
>>
>>
>> That's not the right way to measure things that are so quick, because the
>> overhead of block activation is comparable to the runtime of the code
>> inside the block. Also, #timesRepeat: is not a good choice for
>> measurements for the very same reason: block creation + lots of block
>> activation.
>> Also, the nearby bytecodes affect what the JIT does. When more things can
>> be executed without performing a send, the overall performance gains
>> will be higher.
>
> There are three benchmarks, did you notice the first two?
>
> - The first one measures the single-unit cost of #xxxClass over
> #class. This captures your theoretical maximum benefit of 27%, which
> is terrible, because it can't come close to that in real code.
>
> - The second demonstrates how 90% of that 27% benefit is wiped out
> with no more than a single simple allocation -- what the vast majority
> of class methods are responsible for.
>
> - The third one measures "real world impact", and shows that this
> particular in-line doesn't help the system in any way that helps any
> human anywhere.
>
>>>> Also, removing the bytecode will make #class lose its atomicity. Any code
>>>> that relies on that behavior will silently break.
>>>
>>> If THAT exists it needs a more intention-revealing selector than
>>> #class that would let his peers know atomicity mattered there.
>>> #basicClass is his friend.
>>
>> All special selectors do the same e.g. #==, #ifNil:, #ifTrue:. Do you
>> think all of those need #basicXXX methods?
>
> No just #class. An identity-check should be an identity-check, even
> against a Proxy. And does that example help illustrate how using #==
> when you DON'T need an identity-check is a breakage of encapsulation?
> It makes false assumptions and enforces type-conformance in a system
> that wants to be empowered by messaging.
>
>>>>> ... I am surprised to see we have so many senders of #class in
>>>>> trunk, but I have a feeling most rarely ever called.
>>>>
>>>> I doubt that. People don't sprinkle #class sends for no reason, do they?
>>>
>>> Sorry, I should not have said "ever". I was trying to say the system
>>> probably spends more of its time sending to instance-side methods than
>>> class-side methods.
>>
>> It's a common pattern to have instance-independent code on the class side.
>> Quick access to that is always a good thing.
>
> It's still quick! Levente, I challenge you to back up your claim by
> identifying any one single method in the image which reports even only
> a meaningfully better *bench* performance (much less real-world) by
> calling it via #class instead of #xxxClass.
>
> Anything whose performance matters at a level of one send is going to
> use #basicClass anyway, just like we may have a few that we send
> #basicNew instead of #new to.
>
>>>>> Not remove it, redirect it to #basicClass.
>>>>
>>>> Right, but while the bytecode is in effect, you just can't redirect
>>>> it.
>>>
>>> I'm racking my brain trying to understand this -- sorry... By
>>> "redirect" I just meant change the Compiler to generate bytecode 199
>>> for sends to #basicClass, and just the regular "send" bytecode for
>>> sends to #class. Then, recompile all methods. Would that work?
>>
>> It might work, but you would need to identify and rewrite senders of
>> #class which rely on the presence of the bytecode. In my image there are
>> 2174 senders, which is simply too much review in my opinion.
>
> I repeat my challenge above!
>
>> I did some measurements and found that the JIT makes the numbered
>> primitive almost as quick as the bytecode. The slowdown is only about 10%.
>> Your suggestion, which is send + bytecode is about 85% slower and loses
>> the atomicity of the message. So, you'd better leave the implementation of
>> #class as it is right now, because that would be quicker and would
>> preserve the atomicity as long as nothing overrides it.
>
> Huh? No, you're only 27% faster in the *benchmark*, but near zero in
> anything real-world.
>
> My challenge above, stands. I would love to be wrong, so I could shed
> my suspicion of whether this is about something else not mentioned...
> :(
>
>>>>> This is a reasonable and familiar pattern, right? It provides users
>>>>> full control and WYSIWYG between source and bytecodes due to a crystal
>>>>> clear selector name. No magic.
>>>
>>> So, if
>>> performance is not really hurt, and
>>> we can keep sending #class if so insisted, and
>>> we still have #basicClass, just in case, together
>>> delineating an elegant seam between system-level vs. user-level access
>>> in a classic Smalltalky way that even *I* can understand and use,
>>> and give Squeak better Proxy support that helps Magma
>>> then
>>> would you let me have this?
>>
>> As I wrote it a few emails earlier, I'd rather have a "switch" for this
>> than forcing it on everyone who doesn't use proxies at all (I presume that's
>> the current majority of Squeak users).
>
> Whoa, hold on there. You only ever made one argument -- "performance"
> -- which was obliterated by the benchmarks. Squeezing 27% more out of
> a microbench of something called 0.0001% of the time results in no
> benefit to anyone anywhere.
>
> I see MY position as the pro-user position, and yours as the... pro
> fastest-lab-result position, which hurts this Squeak user. I'm sad that
> that alone isn't enough to support this. :(
> _______
> Do you remember when Behavior>>#new didn't always make a call to
> #initialize? But at a time when Squeak was 10X slower than it is now,
> the people then had the wisdom to understand that the computer and
> software exists to eventually serve _users_, and that spiting users to
> save one single send, even when it was a much greater percentage of
> impact back then, was still way worth it.
>