[squeak-dev] The Inbox: Collections-cmm.874.mcz

Thu Jan 30 17:35:11 UTC 2020

Hi Chris,

I still don't see a compelling reason to change #new.

In one paragraph you write about using new: "proactively" and in the next
you write this is not about performance.

My opinion on the API: if I start out with no specific knowledge about the
expected size of my collection or I don't think that is relevant, I use
new. That is 99% of the cases.

I expect that the standard library gives me a default that has proven to
work well in most cases. By "work well" I predominantly think of
time-efficiency, not space-efficiency.

We can look at other such libraries, OpenJDK also uses 10 for ArrayList:
http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/classes/java/util/ArrayList.java#l118

I only use new: when I know that it gives me a better result, after I
noticed that I need a better result. "Proactive" from the start implies
premature optimisation for me.

Also I don't expect that initial allocation with new: will always be
faster. I expect that I can save reallocations later on. That is where the
time is supposed to be saved. If new is faster than new: 1, I am fine with
that. Allocating an Array of 1 or not allocating a Collection at all would
be even better. I expect a benefit from new: mostly with large numbers.

If I needed new: with a small size because I have space problems, then I
wouldn't use hashed collections in the first place. Don't make new possibly
worse because it happens to suit your case, which is possibly special.

We have not seen in this thread any credible, real data on what "most
cases" and "works well" really is, and I have none to give. There is
probably some study or paper about it out there, I don't know. Not having
the numbers seems to imply to you that the choice is arbitrary, or even
random as you called it, but I beg to differ. If we make it worse in the
average but unrevealed case, it is still a regression. A symptom would be
if people after the change felt they have to use new: more often to achieve
acceptable results. Chances are you won't even be able to measure such
cognitive overhead for the users and that's really bad!

Patching new for a specific application is obviously only an option if the
application is deployed alone (Sqeuak only as runtime) as opposed to
installing the application into other images. So for your GraphQL engine it
is not an option, sure. But one can always add another constructor as an
extension method if one has special requirements.

Kind regards,
Jakob

Chris Muller <ma.chris.m at gmail.com> schrieb am Mi., 29. Jan. 2020, 22:59:

> Hi Nicolas,
>
> Thanks for the interesting discussion.  I'd like you to know, I'm not
>>> gung-ho about this change, but do think we should seriously consider it for
>>> the benefit of Squeak.  I think the benefit is real, but deceptive.
>>>
>>> It seems there are two dimensions to the decision:
>>>
>>>   - legacy / compatibility
>>>   - API design / user-expectations
>>>
>>> I do respect your point about legacy, that writing #new has always meant
>>> you get something that can hold up to 10 elements before needing to grow,
>>> instead of only 3.
>>> It "*sounds"* reasonable, but...
>>>
>>> Here are some *certainties*:
>>>
>>>     - Allocating a 3 element array is quicker than allocating 10 element
>>> one.
>>>
>>
>> Hmm not sure about that. Not for single or few objects.
>> Allocating many larger short lived objects will increase the rate of
>> scavenging statistically, but this will be measurable only is allocating
>> massively this kind of objects IMO.
>>
>>     - 3 element Array's take up less memory than 10 element ones
>>>     - consuming more RAM can lead to slowdowns due to paging or GC
>>> overhead
>>>
>>     - In the worst possible case (e.g., doing it over and over and
>>> nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72%
>>> the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]
>>>
>>>
>> Typical Smalltalk objects are short lived.
>> That's why we get a generational garbage collection.
>> So typical usage is unlikely to generate paging.
>> Case of paging can only occur if many of these objects are longer lived
>> (and tenured), which again is not typical.
>> For specific usage, there might be specific optimizations like growing a
>> bit the Eden.
>>
>> Here are some *uncertainties*:
>>>
>>>      - there may be some code somewhere creating many thousands of
>>> OrderedCollections (if it were only a few, it wouldn't be noticed)
>>>      - the many thousands are all created in a very short amount of time
>>> (if it were spread out over time, it wouldn't be noticed)
>>>      - it then stores 7-9 elements in most of the OrderedCollections
>>>      - in spite of all of the above, the author still wrote #new instead
>>> of #new:
>>>
>>>
>> All your analysis is exclusively focused on a specific un-typical usage
>> of the library...
>> You are then trying to bend the general purpose library tothis specific
>> case.
>> IMO this should be handled with a specific optimization.
>>
>
> My proposal to reduce the the defaultSize was never motivated by
> performance.  All of the above is simply a _rebuttal_ to Jakob's and
> Levente's concerns about performance being adversely affected in some code
> somewhere.  You and I are in agreement about the above -- it shows those
> concerns are inflated.
>
> No, MY motivation for this proposal has always been for a better API
> design that can capture _something_, instead of, as you said, "trying to
> bend the general purpose library" to something we think is "reasonable"
> with arbitrary numbers like 10 for OrderedCollection.  I contend there is
> little to no basis for that number, but there CAN be a basis for choosing a
> small number; space efficiency and API clarity.  You mentioned "typical
> usage of the library" above, which I think it's impossible to define, but
> would love to hear your thoughts if there's something more tangibly
> obtainable than space.
>
>
>> I did propose using something like a SmallDictionary that is WAY more
>> optimized for small size than hashed collection.
>>
>
> Which I ignored because it missed the point of the proposal.  See above.
>
>
>>
>>>
>>>> Collections-ul.871, just like the former version, creates dictionaries
>>>> matching those expectations, but Dictionaries returned by the new
>>>> version
>>>> use less memory.
>>>> So, what's the problem?
>>>>
>>>
>>> It slowed down (Dictionary new: 1) relative to trunk,
>>> *by a comparable margin* as adding 9 elements to an (OrderedCollection
>>> new: 3) relative to an (OrderedCollection new: 10
>>> (see [1])
>>>
>>
>> IMO this is typical biased usage of percentages...
>>
> Saving 30% of a short duration or 30% of a long duration is not at all the
>> same thing!
>> The former case is premature optimization presumably unless used in tight
>> loops.
>>
>
> Right, again this was simply addressing Jakob and Levente's opposition to
> the proposal.  They had concerns about performance, these benchmarks were
> meant to allay them for the reason you stated.
>
> As I said, all of *my* motivations are more outwardly focused; the API
> presented by Squeak, and user expectations thereof.  I agree with you that
> these concerns on performance are "premature", and shouldn't stop us from
> seriously considering this.
>
> Best,
>   Chris
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200130/d40392b3/attachment.html>