[squeak-dev] The Inbox: Collections-cmm.874.mcz

David T. Lewis lewis at mail.msen.com
Thu Feb 6 01:47:23 UTC 2020


This conversation might well get a prize for lowest signal to noise ratio
in the entire history of the Internet.

It began with a proposed change that would have required toggling two bits
in the source code, which is close to a theoretical minimum for information
content:

  - Current code: '^ self basicNew initialize: 5'
  - Proposed change: '^ self basicNew initialize: 3'

This requires toggling two bits in a single ASCII character:
    ($5 asciiValue bitXor: $3 asciiValue) bitCount ==> 2
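
Spelled out step by step, as a sketch for a Squeak workspace (the temporary
variable names are only illustrative, and the values in comments are what
each expression should evaluate to):

```smalltalk
| old new diff |
old := $5 asciiValue.        "53, i.e. 2r110101"
new := $3 asciiValue.        "51, i.e. 2r110011"
diff := old bitXor: new.     "6, i.e. 2r000110: only two bit positions differ"
diff printStringBase: 2.     "'110'"
diff bitCount.               "2"
```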

Changing two bits in the source code of Squeak comes impressively close
to being the smallest theoretically possible change, and these two
bit-twiddles now form the basis of a vast email thread that has evolved
to include an impressive mix of faulty assumptions, untested metrics,
passionate opinions, and invalid conclusions.

It might seem easy to suggest moving Collections-cmm.874 to the treated
inbox - after all, everyone who reviewed it gave it a solid thumbs
down - but that would spoil the fun. Let's keep this thing going as
long as we can, and in the end we can submit the thread for inclusion
in the Guinness Book of World Records.

;-)

Dave


On Wed, Feb 05, 2020 at 04:39:09PM -0600, Chris Muller wrote:
> >
> > try writing a manpage-quality comment for #new: explaining all the
> >> nuances...  when to use it, when not to, etc...  My hope that exercise
> >> would illuminate the issue I have.
> >>
> >
> > "Allocate a new instance of me with the given initial capacity. Use new:
> > instead of new if you want to avoid either unused capacity or repeated
> > growing (and thus, copying) of the collection's memory."
> >
> 
> The goal of using #new: is efficiency in general, space OR time (or,
> both).  So w.r.t. the case of optimizing for time, you forgot to mention
> the 40% performance hit they would take if the size happened to be 1 or 2...
> 
> 
> > Not quite man page length, but seems adequate to cover my expectations.
> >
> 
> Make a spreadsheet with the cartesian product of the following, one per
> row, and then manually fill out an extra column whether you'd write #new or
> #new:.
> 
>     (knows size | doesn't know size)
>     * (knows at runtime | knows at compile time)
>     * (will possibly grow | won't possibly grow)
>     * ( ... | ...)
> 
> When designing a class library, please think in these general terms and
> consider *all cases*, instead of only your own expectations.
> 
> 
> > Also I don't expect that initial allocation with new: will always be
> >>> faster. I expect that I can save reallocations later on.
> >>>
> >>
> >> As I said before, not if you know your Dictionary will never grow.   This
> >> is not app-specific.
> >>
> >
> > If I use new: to optimize performance I am fairly certain that it won't
> > grow afterwards. My original statement still stands.
> >
> 
> You said "fairly" certain, now please consider the case where it's
> *certain* never
> to grow (perhaps row 187 on the spreadsheet, above).  I gave you a common,
> concrete example in an earlier post.  That's the case where your incomplete
> manpage comment (under Collections-ul.871) could easily trick the developer
> into hurting her app's performance when she thought she was helping it.  You
> argued (with flawed logic) that people would have to "think about the sizes
> all the time" but that's exactly what this would introduce, except in a
> much more insidious way!  Because it affects performance negatively
> depending on the *runtime parameter* value passed to #new:.
> 
> 
> > That is where the time is supposed to be saved. If new is faster than new:
> >>> 1, I am fine with that.
> >>>
> >>
> You say you're "fine" with that *certain* performance hit, but you're not
> fine with this uncertain one whose worst-case possibility was measured to
> be an equal performance cost.  *And*, you're even willing to trade away
> certainty of space-efficiency, too.
> 
> This thread is about reducing the defaultSize.  The other thread is about
> >> #new:1 needing to be same speed as #new.  They're two separate
> >> discussions.  An API design that forces developers to trade one
> >> optimization for another is bad design, plain and simple.
> >>
> >
> > Then the only sensible solution would be to deprecate and remove new for
> > preallocating collections altogether. Forcing everyone to always think
> > ahead of their collection sizes.
> >
> 
> Sigh.  With statements like the above, I wonder whether you're actually
> interested in finding a solution or simply arguing...  :(
> 
> 
> > For collections that are supposed to grow automatically. Slowing down
> > development of even non-performance-critical parts. I don't want that either.
> >
> 
> This was your best argument, but it only takes about 10 seconds to
> understand its logical flaw -- if you're writing performance-critical code,
> you're going to write #new: anyway.  If you're not, an extra grow from
> writing #new won't affect performance.
> 
> Changing the current behavior can make it worse for existing applications
> > (again: we lack the numbers to be sure of the contrary).
> >
> 
> The worst-case scenario was benched and provided for you, and shown to be a
> similar effect as Collections-ul.871 on #new: 1.
> 
> 
> > Keeping the current behavior is bad with regards to your concerns.
> >
> 
> As I said multiple times, the "current behavior" is *fine* to my concerns,
> Jakob, it was Levente's proposal which introduced them.  This was just one
> of two different solutions proposed, but you handily rejected them both,
> even while offering zero alternatives or even acknowledgement that my
> concerns are valid.  I'm going to cut my losses from this discussion now.
> Your arguments seem like you either aren't understanding mine, or aren't
> willing to look for consensus.
> 
> Levente has committed Collections-ul.872 with only his other changes, e.g.,
> excluding the one which introduced my concern.  He's a gentleman.  It's sad
> that our inability to resolve this petty squabble kept back his other
> improvements, which would've included the ability to create minimally small
> Dictionaries with a minimum internal array size of 3 instead of 5.  :(
> 
>  - Chris

> 


