Hi Levente,
Just a reminder, the original question I asked was:
Do you think the system would be noticeably slower if all the sends to #class became a message send? ...
and your response:
Yes, the bytecode is way quicker than the primitive or a primitive + a send which is exactly what you suggested.
So even though you answered a different question, I was still curious about your claim, and remembered that you're one who likes to communicate with benchmarks. That's why I ran and presented them to you, but I'm not sure if we're interpreting the results relative to my question or some other question...
It saves one send. One. That's only infinitesimally quicker:
_________
    { [1 xxxClass] bench.
      [1 class] bench. }
    ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
            '126,000,000 per second. 7.93 nanoseconds per run.')
________
2 nanoseconds per send faster. Inconsequential in any real-world sense. Furthermore, as soon as the message sent to the class does *any work* whatsoever, that good-sounding 27% improvement is quickly wiped out. Look how much of the gain is lost doing as little as creating one single Rectangle from another one:
    "Compare creating a single Rectangle with inlined #class vs. a (proposed) message-send of #class."
    | someRectangle |
    someRectangle := 100@50 corner: 320@200.
    { [someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight] bench.
      [someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight] bench. }
    ---> #('37,200,000 per second. 26.9 nanoseconds per run.'
           '38,000,000 per second. 26.3 nanoseconds per run.')
____________
Real-world gain by the inlined send was reduced to... whew! I just had to go learn about "Picosecond" because nanoseconds aren't even small enough to measure the improvement.
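For anyone who wants to check my arithmetic, here it is as workspace expressions. Nothing here is measured, it only re-uses the numbers quoted above, and the temp names are just mine:

```smalltalk
"Plain arithmetic on the two results quoted above; no benchmarking happens here."
| bareGapNs rectangleGapNs bareGainPct rectangleGainPct |
bareGapNs := 10.1 - 7.93.                       "about 2.17 ns per run"
rectangleGapNs := 26.9 - 26.3.                  "about 0.6 ns per run"
bareGainPct := (126 / 99 - 1) * 100.0.          "about 27% more runs per second"
rectangleGainPct := (38.0 / 37.2 - 1) * 100.    "about 2% more runs per second"
{ bareGapNs. rectangleGapNs. rectangleGapNs * 1000. bareGainPct. rectangleGainPct }
```

The third element is the Rectangle gap in picoseconds, roughly 600.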
So, amplify. Crank it up to 100K:
__________
    "Compare creating 100,000 Rectangles with inlined #class vs. a message-send of #class."
    | someRectangle |
    someRectangle := 100@50 corner: 320@200.
    { [100000 timesRepeat: [someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight]] bench.
      [100000 timesRepeat: [someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight]] bench. }
    ---> #('364 per second. 2.75 milliseconds per run.'
           '369 per second. 2.71 milliseconds per run.')
_________
Nothing times 100K is still nothing.
That's not the right way to measure things that are so quick, because the overhead of block activation is comparable to the runtime of the code inside the block. Also, #timesRepeat: is not a good choice for measurements for the very same reason: block creation + lots of block activation. Also, the nearby bytecodes affect what the JIT does. When more things can be executed without performing a send, the overall performance gains will be higher.
There are three benchmarks, did you notice the first two?
- The first one measures the single-unit cost of #xxxClass over #class. This captures your theoretical maximum benefit of 27%, which is terrible, because it can't come close to that in real code.
- The second demonstrates how 90% of that 27% benefit is wiped out with no more than a single simple allocation -- what the vast majority of class methods are responsible for.
- The third one measures "real world impact", and shows that this particular in-line doesn't help the system in any way that helps any human anywhere.
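On the block-activation objection: a rough sketch of how one might take the loop overhead out of the picture, if that's the worry. It times an otherwise empty inlined loop and subtracts it. Assumptions: #xxxClass is the same stand-in selector used in the benchmarks above and is already defined in the image, and the iteration count is arbitrary. I'm not claiming any particular result from it.

```smalltalk
"Sketch only. #xxxClass is the stand-in selector from the benchmarks above and
 must already exist in the image. n is an arbitrary iteration count."
| r n loopOnly viaSend viaBytecode |
r := 100@50 corner: 320@200.
n := 10000000.
"#to:do: with a literal block is inlined by the compiler, so there is no
 block activation per iteration, unlike #timesRepeat:."
loopOnly := Time millisecondsToRun: [1 to: n do: [:i | r]].
viaSend := Time millisecondsToRun: [1 to: n do: [:i | r xxxClass]].
viaBytecode := Time millisecondsToRun: [1 to: n do: [:i | r class]].
"Milliseconds attributable to each variant, loop overhead subtracted."
{ viaSend - loopOnly. viaBytecode - loopOnly }
```

It still measures the bare send in isolation, so it says nothing about real-world impact, which is the point of the second and third benchmarks.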
Also, removing the bytecode will make #class lose its atomicity. Any code that relies on that behavior will silently break.
If THAT exists it needs a more intention-revealing selector than #class that would let his peers know atomicity mattered there. #basicClass is his friend.
All special selectors do the same e.g. #==, #ifNil:, #ifTrue:. Do you think all of those need #basicXXX methods?
No, just #class. An identity-check should be an identity-check, even against a Proxy. And does that example help illustrate how using #== when you DON'T need an identity-check is a breakage of encapsulation? It makes false assumptions and enforces type-conformance in a system that wants to be empowered by messaging.
... I am surprised to see we have so many senders of #class in trunk, but I have a feeling most are rarely ever called.
I doubt that. People don't sprinkle #class sends for no reason, do they?
Sorry, I should not have said "ever". I was trying to say the system probably spends more of its time sending to instance-side methods than to class-side methods.
It's a common pattern to have instance-independent code on the class side. Quick access to that is always a good thing.
It's still quick! Levente, I challenge you to back up your claim by identifying any one single method in the image which reports even only a meaningfully better *bench* performance (much less real-world) by calling it via #class instead of #xxxClass.
Anything whose performance matters at the level of one send is going to use #basicClass anyway, just like the few places where we send #basicNew instead of #new.
Not remove it, redirect it to #basicClass.
Right, but while the bytecode is in effect, you just can't redirect it.
I'm racking my brain trying to understand this -- sorry... By "redirect" I just meant change the Compiler to generate bytecode 199 for sends to #basicClass, and just the regular "send" bytecode for sends to #class. Then, recompile all methods. Would that work?
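To make concrete what I mean, here is a sketch in the usual Class >> selector notation (not workspace code). MyProxy and its 'target' instance variable are made-up names for illustration only, and this assumes the Compiler change described above has been made:

```smalltalk
"Hypothetical proxy class; MyProxy and 'target' are illustrative names only."
MyProxy >> class
	"Only reached if the sender's #class was compiled as a regular send
	 rather than as the special bytecode."
	^ target class
```

With that in place, the idea is that 'aProxy class' would answer the class of the proxied object, while 'aProxy basicClass' would still be compiled to the bytecode and answer MyProxy itself.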
It might work, but you would need to identify and rewrite senders of #class which rely on the presence of the bytecode. In my image there are 2174 senders, which is simply too many to review in my opinion.
I repeat my challenge above!
I did some measurements and found that the JIT makes the numbered primitive almost as quick as the bytecode. The slowdown is only about 10%. Your suggestion, which is send + bytecode, is about 85% slower and loses the atomicity of the message. So, you'd better leave the implementation of #class as it is right now, because that would be quicker and would preserve the atomicity as long as nothing overrides it.
Huh? No, you're only 27% faster in the *benchmark*, but near zero in anything real-world.
My challenge above stands. I would love to be wrong, so I could shed my suspicion of whether this is about something else not mentioned... :(
This is a reasonable and familiar pattern, right? It provides users full control and WYSIWYG between source and bytecodes due to a crystal clear selector name. No magic.
So, if performance is not really hurt, and we can keep sending #class if so insisted, and we still have #basicClass, just in case, together delineating an elegant seam between system-level vs. user-level access in a classic Smalltalky way that even *I* can understand and use, and give Squeak better Proxy support that helps Magma then would you let me have this?
As I wrote a few emails earlier, I'd rather have a "switch" for this than forcing it on everyone who doesn't use proxies at all (I presume that's the current majority of Squeak users).
Whoa, hold on there. You only ever made one argument -- "performance" -- which was obliterated by the benchmarks. Squeezing 27% more out of a microbench of something called 0.0001% of the time results in no benefit to anyone anywhere.
I see MY position as the pro-user position, and yours as the... pro-fastest-lab-result position, which hurts this Squeak user. I'm sad that that alone isn't enough to support this. :(
_______
Do you remember when Behavior>>#new didn't always make a call to #initialize? But at a time when Squeak was 10X slower than it is now, the people then had the wisdom to understand that the computer and software exist to eventually serve _users_, and that serving them at the cost of one single send, even when that was a much greater percentage of impact back then, was still way worth it.