[squeak-dev] The Inbox: Collections-cmm.874.mcz

Chris Muller asqueaker at gmail.com
Wed Jan 29 09:00:56 UTC 2020


Hi Jakob,

My main reply is in my other message to you and Levente, but here are some
quick responses for minor embellishment.  :)

On Sat, Jan 25, 2020 at 3:21 AM Jakob Reschke <forums.jakob at resfarm.de>
wrote:

> On Sat, Jan 25, 2020 at 06:55, Chris Muller <ma.chris.m at gmail.com>
> wrote:
>
>>> Here is an example scenario:
>>>
>>> I write my code. I use [OrderedCollection new]. I see that my code is
>>> fast enough.
>>> You change the default capacity from 10 to 3. My code is now too slow. I
>>> have to profile it to see why. It turns out that I store 7-9 elements most
>>> of the time, and the capacity of 10 was a good fit, but 3 is not, because
>>> it means growing twice (first to 6, then to 12), and my code ends up
>>> being slower and using more memory than before.
>>>
>>
>> This example makes the case for this proposal by showing that it was
>> *depending on knowing the private, internal initial size* for its
>> performance.  By having written #new instead of #new: in
>> performance-critical code, it was and still is less efficient than it could
>> be.  No amount of "guessing" an initial size will help execution
>> performance, but a small default could at least guarantee space efficiency.
>>
>
> If you optimize the default for space instead of sticking with a
> reasonable tradeoff,
>

The "reasonable tradeoff" is what I'm trying to convince you is completely
baseless.


> you might force people to use #new: and think about the very implementation
> details of those collections to get back to reasonable results.
>

It's no different from what we have now.  Thinking about the size wherever
you can is a good thing.

Your fear of changing #new stems from the fuzziness of its definition.
What you're calling "reasonable" is actually just "random".  If #new were
definitive (e.g., space-efficient), the impact of changing it would be
definitive, too.

> You might turn a piece of code into a bottleneck even though it was not
> considered performance-critical before.
>

Or it might rescue a suffering application because it's no longer paging
RAM out to disk...  :)

> On the other hand, who else was bothered by too sparse hashed or ordered
> collections until now?
>

It's about designing the most efficient system and the best API, not about
who has been bothered so far.


> Is it a problem that bothers many, in comparison to the group which the
> change could bother?
>

What happens to that group when they move their code to another Smalltalk
which uses a different default?


> I suppose this is premature optimization. If people have identified
> compactness as a requirement,
>

When all else is equal, more compact is *always* better than less.


> they should use #new: with (domain-specific) expected numbers or patch #new
> for their application. But don't force it on everyone.
>

Patching #new and then using it because you patched it is a ridiculous
suggestion.  That's what #new: is for.  This is about Squeak, not any one
app.  A default of 10 is currently "forced" on everyone, and with 92% of
OrderedCollections in trunk over-allocated, a smaller default might be
better...
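
To make the growth arithmetic in this exchange concrete, here is a rough
model. It is Python, not Squeak code, and the helper grow_profile is
hypothetical. It assumes every grow doubles the capacity, matching the
quoted 3 -> 6 -> 12 scenario; Squeak's actual OrderedCollection growth
policy may differ.

```python
def grow_profile(initial_capacity: int, n_elements: int, factor: int = 2):
    """Hypothetical model (not Squeak's implementation): append n_elements
    to a collection that starts with initial_capacity slots and multiplies
    its capacity by `factor` whenever it is full (assumed doubling).
    Returns (number_of_grows, final_capacity, unused_slots)."""
    if initial_capacity < 0 or n_elements < 0:
        raise ValueError("sizes must be non-negative")
    capacity = initial_capacity
    grows = 0
    for size in range(1, n_elements + 1):
        # Grow until the next element fits; max() avoids a stuck capacity of 0.
        while size > capacity:
            capacity = max(1, capacity * factor)
            grows += 1
    return grows, capacity, capacity - n_elements


print(grow_profile(10, 8))  # (0, 10, 2)
print(grow_profile(3, 8))   # (2, 12, 4): grows 3 -> 6 -> 12
print(grow_profile(10, 2))  # (0, 10, 8)
print(grow_profile(3, 2))   # (0, 3, 1)
```

Under this model, a default of 10 leaves 8 of 10 slots unused for a
2-element collection, while a default of 3 leaves 1 unused but needs two
grows to hold 8 elements. The model only counts slots and grows; it does
not measure the time spent copying elements on each grow.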


> You wouldn't like me to submit a "performance optimization" that changes
> the new default capacity to 100 because my application happened to deal
> with collections of that size frequently and because memory is comparably
> cheap and large nowadays, would you?
>

It's no less arbitrary than 10.  Both guarantee nothing.  A default of 1, 2,
or 3 at least guarantees space efficiency and makes the API more
definitive.

#new is fuzzy.  The only reason you're worried about uses of #new being
affected at all is that fuzziness.  We should give it clarity and
make it definitively space-efficient...

The core library cannot *possibly* guess the shape of people's domains.
Our attempts to do so are causing more harm than good.


> We can only really know the impact if we have a benchmark or even an idea
> of realistic average collection usage. Maybe someone wrote a paper about
> that...
>

The beginning of a dev cycle is where such a change can be implemented,
leaving plenty of time for testing.

> Couldn't it be faster to use an OrderedCollection instead of a hashed one
> for such small numbers of elements? If the hash computation outweighs the
> linear search...

As mentioned, this is about Squeak system efficiency and API design, not
any one specific app.

 - Chris

