[squeak-dev] The Inbox: Collections-cmm.874.mcz

Levente Uzonyi leves at caesar.elte.hu
Mon Jan 27 03:34:24 UTC 2020


Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

>       >  
>       >       I don't think we would want to do that without measuring the effects of
>       >       the change somehow.
>       >
>       >
>       > Performance measuring is already part of every system that cares, right?   :)
>
>       Of course not. That's why "somehow" is there. I don't know how it could be
>       measured, but I think that without measurements, it's risky to change
>       something like that.
> 
> 
> You should be careful, Levente, Collections-ul.871 does it, too.  Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3.  And, yes, it did indeed introduce this unexpected anomaly!

When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to 
get a dictionary that can hold one or two elements, respectively.
Collections-ul.871, just like the former version, creates dictionaries 
matching those expectations, but Dictionaries returned by the new version 
use less memory.
So, what's the problem?

>  
>
>       > None of those systems are depending on any particular default initial size for performance.  If they are, they should be fixed.
>
>       Here is an example scenario:
>       I write my code. I use [OrderedCollection new]. I see that my code is fast
>       enough.
>       You change the default capacity from 10 to 3. My code is now too slow. I
>       have to profile it to see why. It turns out that I store 7-9 elements most
>       of the time, and the capacity of 10 was a good fit, but 3 is not, because
>       it means growing twice (first to 6, then to 12), and my code ends up
>       being slower and using more memory than before.
> 
> 
> This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance.  By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be.  No amount of "guessing" of an initial
> size will help execution performance, but could at least guarantee space efficiency.

No, the example assumes that the user knows nothing about the 
internals of Dictionary. The performance of the example depends on the 
inital capacity.
How often do you investigate the internals of classes your code uses to 
avoid relying on the default values coded into them when there are no 
performance problems with your code? I guess never...

> 
> More importantly, to me, it brings clear, definitive responsibility to #new: over #new.  Efficiency.

As I wrote it many times in this thread, #new: already gives you 
efficiency over #new, because it lets you avoid growing.
The optimization in HashedCollection class >> #new is what makes you think 
otherwise.
The problem stems from two things:
- you doing microbenchmarks on #new: and #new, ignoring the costs of 
filling up the collections
- the simple optimization in #new

> 
>  
>
>       >  
>       >
>       >       Same applies to OrderedCollections.
>       >
>       >       Following your reasoning, the optimal initial capacity for any dynamic
>       >       collection created with #new should be 0, but that doesn't feel right...
>       >
>       >
>       > Not 0, 3.  It's "reasoning", not extremism.  :)
>       >
>       > 3 would be a great default for OrderedCollection.  Ultra-minimal, but with a purpose.   I mean, really, with 10, we have 92% of all instances wasting space.
>
>       "Wasting space" is a must when you implement dynamic arrays and
>       hash tables. It's a space-time tradeoff[1].
> 
> 
> OrderedCollection isn't hashed.

OrderedCollection is a dynamic array[1].

> 
> If we want have a sliding-scale trade-off between space and performance, so that "Dictionary new" everywhere in the code could be treated to run time or space-efficiently, that'd be something I could respect.  We'd need to introduce some kind a variable or PoolDictionary constant to allow a developer to

I don't think we want anything like that.

> customize, for the entire image, all cases of "Dictionary new" to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.
> 
> But we should not break #new: 1 and #new: 2 for HashedCollections, please.

The performance difference between #new, and #new: X where X is <= 3, 
didn't bother you (or anyone else) until recently.
I proposed all kinds of solutions to minimize the difference between #new 
and #new:, and we seemed to agree on one of those.


Levente

[1] https://en.wikipedia.org/wiki/Dynamic_array

> 
>  - Chris
>  
> 
>
>       Levente
>
>       [1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
> 
> 
> 
>


More information about the Squeak-dev mailing list