[squeak-dev] The Inbox: Collections-cmm.874.mcz

Chris Muller ma.chris.m at gmail.com
Sat Jan 25 05:54:12 UTC 2020


>
> >
> >       I don't think we would want to do that without measuring the
> effects of
> >       the change somehow.
> >
> >
> > Performance measuring is already part of every system that cares,
> right?   :)
>
> Of course not. That's why "somehow" is there. I don't know how it could be
> measured, but I think that without measurements, it's risky to change
> something like that.
>

You should be careful, Levente, Collections-ul.871 does it, too.
Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array
size from 5 to 3.  And, yes, it did indeed introduce this unexpected
anomaly!


>
> > None of those systems are depending on any particular default initial
> size for performance.  If they are, they should be fixed.
>
> Here is an example scenario:
> I write my code. I use [OrderedCollection new]. I see that my code is fast
> enough.
> You change the default capacity from 10 to 3. My code is now too slow. I
> have to profile it to see why. It turns out that I store 7-9 elements most
> of the time, and the capacity of 10 was a good fit, but 3 is not, because
> it means growing twice (first to 6, then to 12), and my code ends up
> being slower and using more memory than before.
>

This example makes the case for this proposal, by showing that it was
*depending on knowing the private, internal initial size*, for its
performance.  By having written #new instead of #new: in
performance-critical code, it was and still is less efficient than it could
be.  No amount of "guessing" of an initial size will help execution
performance, but could at least guarantee space efficiency.

More importantly, to me, it brings clear, definitive responsibility to
#new: over #new.  *Efficiency*.



>
> >
> >
> >       Same applies to OrderedCollections.
> >
> >       Following your reasoning, the optimal initial capacity for any
> dynamic
> >       collection created with #new should be 0, but that doesn't feel
> right...
> >
> >
> > Not 0, 3.  It's "reasoning", not extremism.  :)
> >
> > 3 would be a great default for OrderedCollection.  Ultra-minimal, but
> with a purpose.   I mean, really, with 10, we have 92% of all instances
> wasting space.
>
> "Wasting space" is a must when you implement dynamic arrays and
> hash tables. It's a space-time tradeoff[1].
>

OrderedCollection isn't hashed.

If we want have a sliding-scale trade-off between space and performance, so
that "Dictionary new" everywhere in the code could be treated to run time
or space-efficiently, that'd be something I could respect.  We'd need to
introduce some kind a variable or PoolDictionary constant to allow a
developer to customize, for the entire image, all cases of "Dictionary new"
to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.

But we should not break #new: 1 and #new: 2 for HashedCollections, please.

 - Chris


>
>
> Levente
>
> [1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200124/a57fa5e2/attachment.html>


More information about the Squeak-dev mailing list