[squeak-dev] The Inbox: Kernel-cmm.1198.mcz
Chris Muller
asqueaker at gmail.com
Sun Nov 25 07:13:20 UTC 2018
Hi Levente,
> > But what do you mean make all senders of #class use the primitive?
>
> Currently, when you compile a method containing a send of #class, the
> compiler will generate a special bytecode for it (199).
> When the interpreter/jit sees this bytecode, it will not perform a send
> nor a primitive; it'll just look up the class of the receiver and place it
> on top of the stack.
Great! Does that mean this can be accomplished solely in the image by
making the compiler generate 199 when #basicClass is sent, and just
the normal "send" bytecode for sends to #class?
> > Do you think the system would be noticeably slower if all the sends to
> > #class became a message send? I'm skeptical that it would, but I have
>
> Yes, the bytecode is way quicker than the primitive or a primitive + a
> send which is exactly what you suggested.
It saves one send. One. That's only infinitesimally quicker:
_________
{ [1 xxxClass] bench.
[ 1 class ] bench. }
----> #('99,000,000 per second. 10.1 nanoseconds per run.'
'126,000,000 per second. 7.93 nanoseconds per run.')
________
2 nanoseconds per send faster. Inconsequential in any real-world
sense. Furthermore, as soon as the message sent to the class does
*any work* whatsoever, that good-sounding 27% improvement is quickly
wiped out. Look how much of the gain is lost doing as little as
creating one single Rectangle from another one:
___________
"Compare creating a single Rectangle with inlined #class vs. a
(proposed) message-send of #class."
| someRectangle | someRectangle := 100@50 corner: 320@200.
{ [someRectangle xxxClass origin: someRectangle topLeft corner:
someRectangle bottomRight ] bench.
[someRectangle class origin: someRectangle topLeft corner:
someRectangle bottomRight ] bench. }
---> #('37,200,000 per second. 26.9 nanoseconds per run.'
'38,000,000 per second. 26.3 nanoseconds per run.')
____________
Real-world gain by the inlined send was reduced to... whew! I just
had to go learn about "Picosecond" because nanoseconds aren't even
small enough to measure the improvement.
So, amplify. Crank it up to 100K:
__________
"Compare creating 100,000 Rectangles with inlined #class vs. a
message-send of #class."
| someRectangle | someRectangle := 100@50 corner: 320@200.
{ [ 100000 timesRepeat: [someRectangle xxxClass origin: someRectangle
topLeft corner: someRectangle bottomRight] ] bench.
[ 100000 timesRepeat: [someRectangle class origin: someRectangle
topLeft corner: someRectangle bottomRight] ] bench. }
---> #('364 per second. 2.75 milliseconds per run.' '369 per
second. 2.71 milliseconds per run.')
_________
Nothing times 100K is still nothing.
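
To reproduce the measurements above, #xxxClass has to exist in the image.
Its definition is not shown in this message; a minimal sketch, assuming it
is simply a copy of Object>>class (primitive 111) under a selector the
compiler does not special-case, so that calling it costs an ordinary send:

```smalltalk
"Assumption: #xxxClass is not defined anywhere in this message. This installs
 a stand-in on Object that uses the same primitive as Object>>class, so a call
 goes through a real message send instead of the inlined bytecode."
Object
	compile: 'xxxClass
	"Answer the receiver''s class via a normal send plus primitive 111."
	<primitive: 111>
	self primitiveFailed'
	classified: 'benchmarking'.

"Then compare the two paths as in the snippets above."
{ [1 xxxClass] bench.
  [1 class] bench }
```

Timings will differ by machine and VM, so treat the figures quoted above as
one sample rather than fixed values.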
> Also, removing the bytecode will make #class lose its atomicity. Any code
> that relies on that behavior will silently break.
If code like THAT exists, it needs a more intention-revealing selector
than #class, one that tells the author's peers that atomicity matters
there. #basicClass is that author's friend.
> > ... I am surprised to see we have so many senders of #class in
> > trunk, but I have a feeling most are rarely ever called.
>
> I doubt that. People don't sprinkle #class sends for no reason, do they?
Sorry, I should not have said "ever". I was trying to say the system
probably spends more of its time sending to instance-side methods than
to class-side methods.
> > Not remove it, redirect it to #basicClass.
>
> Right, but while the bytecode is in effect, you just can't redirect
> it.
I'm racking my brain trying to understand this -- sorry... By
"redirect" I just meant change the Compiler to generate bytecode 199
for sends to #basicClass, and just the regular "send" bytecode for
sends to #class. Then, recompile all methods. Would that work?
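
One way to see what the compiler currently emits for a send of #class is to
print the bytecodes of an existing method. A small sketch, assuming
Object>>species is still implemented as "^self class" in your image and that
the image uses the bytecode set in which 199 is the special #class bytecode:

```smalltalk
"Print a symbolic listing of the bytecodes of a method that sends #class.
 Under the proposal, the same listing after recompilation would show a
 regular send for #class, and the special bytecode only where #basicClass
 is sent."
(Object >> #species) symbolic
```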
> > This is a reasonable and familiar pattern, right? It provides users
> > full control and WYSIWYG between source and bytecodes due to a crystal
> > clear selector name. No magic.
So, if
performance is not really hurt, and
we can keep sending #class if so insisted, and
we still have #basicClass, just in case, together
delineating an elegant seam between system-level vs. user-level access
in a classic Smalltalky way that even *I* can understand and use,
and give Squeak better Proxy support that helps Magma
then
would you let me have this?
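
To make the Proxy case concrete, here is a hypothetical sketch. SketchProxy
and its selectors are invented for illustration and are not Magma's actual
code. The proxy forwards everything to a target object, including #class,
which only works if #class reaches the receiver as a real message send:

```smalltalk
"Hypothetical forwarding proxy, for illustration only."
ProtoObject subclass: #SketchProxy
	instanceVariableNames: 'target'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Sketch-Proxies'.

SketchProxy compile: 'setTarget: anObject
	target := anObject' classified: 'private'.

SketchProxy compile: 'class
	"Only reached when #class is an ordinary send, as proposed."
	^ target class' classified: 'forwarding'.

SketchProxy compile: 'doesNotUnderstand: aMessage
	"Forward anything the proxy does not implement to the target."
	^ aMessage sendTo: target' classified: 'forwarding'.

| proxy |
proxy := SketchProxy basicNew.
proxy setTarget: 3 @ 4.
proxy x.      "forwarded to the Point through #doesNotUnderstand:"
proxy class.  "with the inlined bytecode, the override above is bypassed"
```

With the proposal in place, system-level code that needs the proxy's real
class would send #basicClass instead.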
You have a skill for taking performance considerations to degrees
that I never would have fathomed, and this has resulted in
immense performance benefits for Squeak. I do wish you liked Magma,
because I'm sure you could _obliterate_ many inefficiencies in the
code and design. But if not, I hope you can at least appreciate that
the value proposition of this proposal is worth it.
- Chris