[squeak-dev] The Inbox: Collections-cmm.874.mcz

Chris Muller ma.chris.m at gmail.com
Wed Jan 29 21:58:20 UTC 2020


Hi Nicolas,

>> Thanks for the interesting discussion.  I'd like you to know, I'm not
>> gung-ho about this change, but do think we should seriously consider it for
>> the benefit of Squeak.  I think the benefit is real, but deceptive.
>>
>> It seems there are two dimensions to the decision:
>>
>>   - legacy / compatibility
>>   - API design / user-expectations
>>
>> I do respect your point about legacy, that writing #new has always meant
>> you get something that can hold up to 10 elements before needing to grow,
>> instead of only 3.
>> It *"sounds"* reasonable, but...
>>
>> Here are some *certainties*:
>>
>>     - Allocating a 3 element array is quicker than allocating a 10
>> element one.
>>
>
> Hmm not sure about that. Not for single or few objects.
> Allocating many larger short lived objects will increase the rate of
> scavenging statistically, but this will be measurable only if allocating
> massively this kind of objects IMO.
>
>>     - 3 element Arrays take up less memory than 10 element ones
>>     - consuming more RAM can lead to slowdowns due to paging or GC
>> overhead
>>
>>     - In the worst possible case (e.g., doing it over and over and nothing
>> else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed
>> of adding 9 elements to an (OrderedCollection new: 10). see [1]
>>
>>
> Typical Smalltalk objects are short lived.
> That's why we get a generational garbage collection.
> So typical usage is unlikely to generate paging.
> Case of paging can only occur if many of these objects are longer lived
> (and tenured), which again is not typical.
> For specific usage, there might be specific optimizations like growing a
> bit the Eden.
>
>> Here are some *uncertainties*:
>>
>>      - there may be some code somewhere creating many thousands of
>> OrderedCollections (if it were only a few, it wouldn't be noticed)
>>      - the many thousands are all created in a very short amount of time
>> (if it were spread out over time, it wouldn't be noticed)
>>      - it then stores 7-9 elements in most of the OrderedCollections
>>      - in spite of all of the above, the author still wrote #new instead
>> of #new:
>>
>>
> All your analysis is exclusively focused on a specific un-typical usage of
> the library...
> You are then trying to bend the general purpose library to this specific
> case.
> IMO this should be handled with a specific optimization.
>

My proposal to reduce the defaultSize was never motivated by
performance.  All of the above is simply a _rebuttal_ to Jakob's and
Levente's concerns about performance being adversely affected in some code
somewhere.  You and I are in agreement about the above -- it shows those
concerns are inflated.

No, MY motivation for this proposal has always been for a better API design
that can capture _something_, instead of, as you said, "trying to bend the
general purpose library" to something we think is "reasonable" with
arbitrary numbers like 10 for OrderedCollection.  I contend there is little
to no basis for that number, but there CAN be a basis for choosing a small
number: space efficiency and API clarity.  You mentioned "typical usage of
the library" above, which I think is impossible to define, but I would love
to hear your thoughts if there's something more tangibly obtainable than
space.


> I did propose using something like a SmallDictionary that is WAY more
> optimized for small size than hashed collection.
>

Which I ignored because it missed the point of the proposal.  See above.


>
>>
>>> Collections-ul.871, just like the former version, creates dictionaries
>>> matching those expectations, but Dictionaries returned by the new
>>> version
>>> use less memory.
>>> So, what's the problem?
>>>
>>
>> It slowed down (Dictionary new: 1) relative to trunk,
>> *by a comparable margin* as adding 9 elements to an (OrderedCollection
>> new: 3) relative to an (OrderedCollection new: 10)
>> (see [1])
>>
>
> IMO this is typical biased usage of percentages...
>
> Saving 30% of a short duration or 30% of a long duration is not at all the
> same thing!
> The former case is premature optimization presumably unless used in tight
> loops.
>

Right, again this was simply addressing Jakob and Levente's opposition to
the proposal.  They had concerns about performance; these benchmarks were
meant to allay them for the reason you stated.

As I said, all of *my* motivations are more outwardly focused; the API
presented by Squeak, and user expectations thereof.  I agree with you that
these concerns on performance are "premature", and shouldn't stop us from
seriously considering this.

Best,
  Chris