[squeak-dev] The Inbox: Kernel-cmm.1198.mcz

Sun Nov 25 21:37:34 UTC 2018

Hi Levente,

Just a reminder, the original question I asked was:

> >>> Do you think the system would be noticeably slower if all the sends to
> >>> #class became a message send?  ...

and your response:

> >> Yes, the bytecode is way quicker than the primitive or a primitive + a
> >> send which is exactly what you suggested.

So even though you answered a different question, I was still curious
about your claim, and remembered that you're one who likes to communicate
with benchmarks.  That's why I ran them and presented them to you, but I'm
not sure whether we're interpreting the results relative to my question or
some other question...

> > It saves one send.  One.  That's only infinitesimally quicker:
> > _________
> > { [1 xxxClass] bench.
> > [ 1 class ] bench.  }
> >
> >   ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
> > '126,000,000 per second. 7.93 nanoseconds per run.')
> > ________
> >
> > About 2 nanoseconds faster per send.  Inconsequential in any real-world
> > sense.  Furthermore, as soon as the message sent to the class does
> > *any work* whatsoever, that good-sounding 27% improvement is quickly
> > wiped out.  Look how much of the gain is lost doing as little as
> > creating one single Rectangle from another one:
> >
> > ___________
> > "Compare creating a single Rectangle with inlined #class vs. a
> > (proposed) message-send of #class."
> > | someRectangle |   someRectangle := 100@50 corner: 320@200.
> > {  [someRectangle xxxClass origin: someRectangle topLeft corner:
> > someRectangle bottomRight ] bench.
> >   [someRectangle class      origin: someRectangle topLeft corner:
> > someRectangle bottomRight ] bench.   }
> >
> >     --->  #('37,200,000 per second. 26.9 nanoseconds per run.'
> > '38,000,000 per second. 26.3 nanoseconds per run.')
> > ____________
> >
> > The real-world gain from the inlined send was reduced to...  whew!  I just
> > had to go learn about "picoseconds" because nanoseconds aren't even
> > small enough to measure the improvement.
> >
> > So, amplify.  Crank it up to 100K:
> > __________
> > "Compare creating 100,000 Rectangles with inlined #class vs. a
> > message-send of #class."
> > | someRectangle |   someRectangle := 100@50 corner: 320@200.
> > {  [ 100000 timesRepeat: [someRectangle xxxClass origin: someRectangle
> > topLeft corner: someRectangle bottomRight] ] bench.
> >   [ 100000 timesRepeat: [someRectangle class origin: someRectangle
> > topLeft corner: someRectangle bottomRight] ] bench.   }
> >
> >     ---> #('364 per second. 2.75 milliseconds per run.' '369 per
> > second. 2.71 milliseconds per run.')
> > _________
> >
> > Nothing times 100K is still nothing.
>
>
> That's not the right way to measure things that are so quick, because the
> overhead of block activation is comparable to the runtime of the code
> inside the block. Also, #timesRepeat: is not a good choice for
> measurements for the very same reason: block creation + lots of block
> activation.
> Also, the nearby bytecodes affect what the JIT does. When more things can
> be executed without performing a send, the overall performance gains
> will be higher.

There are three benchmarks; did you notice the first two?

    - The first one measures the single-unit cost of #xxxClass over
#class.  This captures your theoretical maximum benefit of 27%, which
is a poor guide, because real code can't come anywhere close to it.

    - The second demonstrates how 90% of that 27% benefit is wiped out
with no more than a single simple allocation -- what the vast majority
of class methods are responsible for.

    - The third one measures "real-world impact", and shows that this
particular inlining doesn't benefit the system in any way that helps
any human anywhere.

> >> Also, removing the bytecode will make #class lose its atomicity. Any code
> >> that relies on that behavior will silently break.
> >
> > If THAT exists it needs a more intention-revealing selector than
> > #class that would let his peers know atomicity mattered there.
> > #basicClass is his friend.
>
> All special selectors do the same e.g. #==, #ifNil:, #ifTrue:. Do you
> think all of those need #basicXXX methods?

No, just #class.  An identity-check should be an identity-check, even
against a Proxy.  And does that example help illustrate how using #==
when you DON'T need an identity-check is a breakage of encapsulation?
It makes false assumptions and enforces type-conformance in a system
that wants to be empowered by messaging.
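To make the Proxy point concrete, here is a hypothetical sketch; MyProxy and realObject are made-up names, not Magma's classes, and a real proxy would more likely subclass ProtoObject and forward through #doesNotUnderstand:.  It assumes the current Compiler, which emits the special #class bytecode for ordinary sends, and a VM that answers that bytecode without a method lookup, so the override below is only reached through a real send such as #perform:.

```smalltalk
"Hypothetical proxy sketch. MyProxy and realObject are illustrative names."
| proxyClass |
proxyClass := Object subclass: #MyProxy
	instanceVariableNames: 'realObject'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Example-Proxies'.

proxyClass compile: 'realObject: anObject
	"Remember the object this proxy stands for; answers self."
	realObject := anObject'.

proxyClass compile: 'class
	"Answer the class of the proxied object. Only reached when #class
	 is looked up as a real message, not via the special bytecode."
	^ realObject class'.
```

With proxy := MyProxy new realObject: 3, evaluating "proxy class" answers MyProxy because the bytecode skips the override, while "proxy perform: #class" answers SmallInteger.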

> >>> ...  I am surprised to see we have so many senders of #class in
> >>> trunk, but I have a feeling most are rarely ever called.
> >>
> >> I doubt that. People don't sprinkle #class sends for no reason, do they?
> >
> > Sorry, I should not have said "ever".  I was trying to say the system
> > probably spends more of its time sending to instance-side methods than
> > to class-side methods.
>
> It's a common pattern to have instance-independent code on the class side.
> Quick access to that is always a good thing.

It's still quick!  Levente, I challenge you to back up your claim by
identifying a single method in the image that shows even a meaningfully
better *bench* performance (let alone real-world performance) when it
is called via #class instead of #xxxClass.

Anything whose performance matters at the level of one send is going to
use #basicClass anyway, just as a few places send #basicNew instead of
#new.
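A minimal sketch of the split being proposed, assuming the Compiler is also changed to emit the special #class bytecode (199) for #basicClass and an ordinary send for #class; without that Compiler change, ordinary sends of #class still bypass the new method.  The method bodies are illustrative assumptions, not the contents of Kernel-cmm.1198, and evaluating them recompiles Object>>class, so try it only in a scratch image.

```smalltalk
"Hypothetical sketch of the #class / #basicClass split.
 Primitive 111 answers the receiver's class."
Object compile: 'basicClass
	"Answer the receiver''s class. Not meant to be overridden."
	<primitive: 111>
	self primitiveFailed'.

Object compile: 'class
	"An ordinary, overridable message; proxies may answer something else."
	^ self basicClass'.
```

With this in place, "3 basicClass" answers SmallInteger, and "3 perform: #class" reaches the new Object>>class and also answers SmallInteger.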

> >>> Not remove it, redirect it to #basicClass.
> >>
> >> Right, but while the bytecode is in effect, you just can't redirect
> >> it.
> >
> > I'm racking my brain trying to understand this -- sorry...   By
> > "redirect" I just meant change the Compiler to generate bytecode 199
> > for sends to #basicClass, and just the regular "send" bytecode for
> > sends to #class.  Then, recompile all methods.  Would that work?
>
> It might work, but you would need to identify and rewrite senders of
> #class which rely on the presence of the bytecode. In my image there are
> 2174 senders, which is simply too much review in my opinion.

I repeat my challenge above!

> I did some measurements and found that the JIT makes the numbered
> primitive almost as quick as the bytecode. The slowdown is only about 10%.
> Your suggestion, which is send + bytecode, is about 85% slower and loses
> the atomicity of the message. So, you'd better leave the implementation of
> #class as it is right now, because that would be quicker and would
> preserve the atomicity as long as nothing overrides it.

Huh?  No, you're only 27% faster in the *benchmark*, but near zero in
anything real-world.

My challenge above stands.  I would love to be wrong, so I could shed
my suspicion that this is about something else not mentioned...
 :(

> >>> This is a reasonable and familiar pattern, right?  It provides users
> >>> full control and WYSIWYG between source and bytecodes due to a crystal
> >>> clear selector name.  No magic.
> >
> > So, if
> >   performance is not really hurt, and
> >   we can keep sending #class if so insisted, and
> >   we still have #basicClass, just in case, together
> >   delineating an elegant seam between system-level vs. user-level access
> >   in a classic Smalltalky way that even *I* can understand and use,
> >   and give Squeak better Proxy support that helps Magma
> > then
> >   would you let me have this?
>
> As I wrote it a few emails earlier, I'd rather have a "switch" for this
> than forcing it on everyone who doesn't use proxies at all (I presume that's
> the current majority of Squeak users).

Whoa, hold on there.  You only ever made one argument -- "performance"
-- which was obliterated by the benchmarks.  Squeezing 27% more out of
a microbench of something called 0.0001% of the time yields no
benefit to anyone anywhere.

I see MY position as the pro-user position, and yours as the...
pro-fastest-lab-result position, which hurts this Squeak user.  I'm sad
that being pro-user alone isn't enough to win support for this.    :(
_______
Do you remember when Behavior>>#new didn't always send
#initialize?  At a time when Squeak was 10X slower than it is now,
the people then had the wisdom to understand that computers and
software ultimately exist to serve _users_, and that spending one
extra send to serve users, even when it cost a much greater percentage
back then, was well worth it.