Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ====================
Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871
- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be. - Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.
=============== Diff against Collections-ul.871 ===============
Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new + ^ self basicNew initialize: 3! - ^ self basicNew initialize: 5!
Hi all,
In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.
((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat "0.11282051282051282"
A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new". The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.
This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
Best, Chris
On Fri, Jan 24, 2020 at 3:32 PM commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ====================
Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871
- Optimize for system compactness by ensuring the default internal array
size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform
comparably with #new even if the minimum size is specified.
=============== Diff against Collections-ul.871 ===============
Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new
^ self basicNew initialize: 3!
^ self basicNew initialize: 5!
Wow, the number of oversized OrderedCollection instances is much worse, 92%.
((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) / OrderedCollection allInstances size) asFloat.
On Fri, Jan 24, 2020 at 4:01 PM Chris Muller asqueaker@gmail.com wrote:
Hi all,
In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.
((Dictionary allInstances count: [ : e | e size < 3 and: [ e array
size >= 5 ] ]) / Dictionary allInstances size) asFloat "0.11282051282051282"
A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new". The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.
This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
Best, Chris
On Fri, Jan 24, 2020 at 3:32 PM commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ====================
Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871
- Optimize for system compactness by ensuring the default internal array
size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform
comparably with #new even if the minimum size is specified.
=============== Diff against Collections-ul.871 ===============
Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new
^ self basicNew initialize: 3!
^ self basicNew initialize: 5!
Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.
Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller <asqueaker@gmail.com
:
Wow, the number of oversized OrderedCollection instances is much worse, 92%.
((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) / OrderedCollection allInstances size) asFloat.
On Fri, Jan 24, 2020 at 4:01 PM Chris Muller asqueaker@gmail.com wrote:
Hi all,
In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.
((Dictionary allInstances count: [ : e | e size < 3 and: [ e array
size >= 5 ] ]) / Dictionary allInstances size) asFloat "0.11282051282051282"
A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new". The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.
This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
Best, Chris
On Fri, Jan 24, 2020 at 3:32 PM commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ====================
Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871
- Optimize for system compactness by ensuring the default internal array
size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform
comparably with #new even if the minimum size is specified.
=============== Diff against Collections-ul.871 ===============
Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new
^ self basicNew initialize: 3!
^ self basicNew initialize: 5!
Hi Jakob,
Optimal execution performance can *never* be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of *not* *over*-allocating in other places.
Although it's impossible to tweak the default initial size in #new to optimize for *performance*, we can at least optimize for *space*, and also for *clarity-of-usage. *#new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide *compactness *in that case. Places in the code that need optimization will correctly express that by writing #new: with a larger pre-allocation.
It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.
Best, Chris
On Fri, Jan 24, 2020 at 4:51 PM Jakob Reschke forums.jakob@resfarm.de wrote:
Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.
Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller < asqueaker@gmail.com>:
Wow, the number of oversized OrderedCollection instances is much worse, 92%.
((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) / OrderedCollection allInstances size) asFloat.
On Fri, Jan 24, 2020 at 4:01 PM Chris Muller asqueaker@gmail.com wrote:
Hi all,
In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.
((Dictionary allInstances count: [ : e | e size < 3 and: [ e array
size >= 5 ] ]) / Dictionary allInstances size) asFloat "0.11282051282051282"
A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new". The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.
This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
Best, Chris
On Fri, Jan 24, 2020 at 3:32 PM commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ====================
Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871
- Optimize for system compactness by ensuring the default internal
array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and
perform comparably with #new even if the minimum size is specified.
=============== Diff against Collections-ul.871 ===============
Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new
^ self basicNew initialize: 3!
^ self basicNew initialize: 5!
Hi Chris,
On Fri, 24 Jan 2020, Chris Muller wrote:
Hi Jakob,
Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places.
Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage. #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case. Places in the code that need optimization will correctly express that by writing #new: with a larger pre-allocation.
It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.
The way I understand it, Jakob says that your suggested change has a chance to slow down things just to save a few hundred kilobytes of memory. I don't think we would want to do that without measuring the effects of the change somehow.
Same applies to OrderedCollections.
Following your reasoning, the optimal initial capacity for any dynamic collection created with #new should be 0, but that doesn't feel right...
Levente
Best, Chris
On Fri, Jan 24, 2020 at 4:51 PM Jakob Reschke forums.jakob@resfarm.de wrote: Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.
Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller asqueaker@gmail.com: Wow, the number of oversized OrderedCollection instances is much worse, 92%. ((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) / OrderedCollection allInstances size) asFloat.
On Fri, Jan 24, 2020 at 4:01 PM Chris Muller asqueaker@gmail.com wrote: Hi all, In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew. ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat "0.11282051282051282"
A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new". The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.
This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
Best, Chris
On Fri, Jan 24, 2020 at 3:32 PM commits@source.squeak.org wrote: Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.874.mcz
==================== Summary ==================== Name: Collections-cmm.874 Author: cmm Time: 24 January 2020, 3:32:43.628249 pm UUID: 107f3a74-177c-4338-9ad3-648e427419a1 Ancestors: Collections-ul.871 - Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be. - Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified. =============== Diff against Collections-ul.871 =============== Item was changed: ----- Method: HashedCollection class>>new (in category 'instance creation') ----- new + ^ self basicNew initialize: 3! - ^ self basicNew initialize: 5!
On Fri, Jan 24, 2020 at 6:19 PM Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Chris,
On Fri, 24 Jan 2020, Chris Muller wrote:
Hi Jakob,
Optimal execution performance can never be assured with #new, since the
potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places.
Although it's impossible to tweak the default initial size in #new to
optimize for performance, we can at least optimize for space, and also for clarity-of-usage. #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case. Places in the code that
need optimization will correctly express that by writing #new: with a
larger pre-allocation.
It's true that system level changes of any kind could result in the need
for downstream changes but, in this case, it seems very unlikely.
The way I understand it, Jakob says that your suggested change has a chance to slow down things just to save a few hundred kilobytes of memory.
But your suggested change is 100% *guaranteed* to slow things down, with no space savings.
This avoids that particular slow down, while guarantee'ing to save space! All for only a very *teency-weency chance* of other areas being minorly affected (and, easily fixed if they are!).
I don't think we would want to do that without measuring the effects of the change somehow.
Performance measuring is already part of every system that cares, right? :) None of those systems are depending on any particular default initial size for performance. If they are, they should be fixed.
Same applies to OrderedCollections.
Following your reasoning, the optimal initial capacity for any dynamic collection created with #new should be 0, but that doesn't feel right...
Not 0, 3. It's "reasoning", not extremism. :)
3 would be a great default for OrderedCollection. Ultra-minimal, but with a purpose. I mean, really, with 10, we have 92% of all instances wasting space.
- Chris
Hi Chris,
On Fri, 24 Jan 2020, Chris Muller wrote:
On Fri, Jan 24, 2020 at 6:19 PM Levente Uzonyi leves@caesar.elte.hu wrote: Hi Chris,
On Fri, 24 Jan 2020, Chris Muller wrote: > Hi Jakob, > > Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places. > > Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage. #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case. Places in the code that > need optimization will correctly express that by writing #new: with a larger pre-allocation. > > It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely. The way I understand it, Jakob says that your suggested change has a chance to slow down things just to save a few hundred kilobytes of memory.
But your suggested change is 100% *guaranteed* to slow things down, with no space savings.
Which change? To remove the optimization from #new? That wasn't a serious suggestion just a possible solution to your problem (which I don't really consider to be a problem).
This avoids that particular slow down, while guarantee'ing to save space! All for only a very teency-weency chance of other areas being minorly affected (and, easily fixed if they are!).
It's funny that you call it a slowdown. You may perceive it as one but that's just a simple optimization applied to #new, which cannot be directly applied to #new:.
I don't think we would want to do that without measuring the effects of the change somehow.
Performance measuring is already part of every system that cares, right? :)
Of course not. That's why "somehow" is there. I don't know how it could be measured, but I think that without measurements, it's risky to change something like that.
None of those systems are depending on any particular default initial size for performance. If they are, they should be fixed.
Here is an example scenario: I write my code. I use [OrderedCollection new]. I see that my code is fast enough. You change the default capacity from 10 to 3. My code is now too slow. I have to profile it to see why. It turns out that I store 7-9 elements most of the time, and the capacity of 10 was a good fit, but 3 is not, because it means growing twice (first to 6, then to 12), and my code ends up being slower and using more memory than before.
Same applies to OrderedCollections. Following your reasoning, the optimal initial capacity for any dynamic collection created with #new should be 0, but that doesn't feel right...
Not 0, 3. It's "reasoning", not extremism. :)
3 would be a great default for OrderedCollection. Ultra-minimal, but with a purpose. I mean, really, with 10, we have 92% of all instances wasting space.
"Wasting space" is a must when you implement dynamic arrays and hash tables. It's a space-time tradeoff[1].
Levente
[1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
- Chris
I don't think we would want to do that without measuring the
effects of
the change somehow.
Performance measuring is already part of every system that cares,
right? :)
Of course not. That's why "somehow" is there. I don't know how it could be measured, but I think that without measurements, it's risky to change something like that.
You should be careful, Levente, Collections-ul.871 does it, too. Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3. And, yes, it did indeed introduce this unexpected anomaly!
None of those systems are depending on any particular default initial
size for performance. If they are, they should be fixed.
Here is an example scenario: I write my code. I use [OrderedCollection new]. I see that my code is fast enough. You change the default capacity from 10 to 3. My code is now too slow. I have to profile it to see why. It turns out that I store 7-9 elements most of the time, and the capacity of 10 was a good fit, but 3 is not, because it means growing twice (first to 6, then to 12), and my code ends up being slower and using more memory than before.
This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance. By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be. No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.
More importantly, to me, it brings clear, definitive responsibility to #new: over #new. *Efficiency*.
Same applies to OrderedCollections. Following your reasoning, the optimal initial capacity for any
dynamic
collection created with #new should be 0, but that doesn't feel
right...
Not 0, 3. It's "reasoning", not extremism. :)
3 would be a great default for OrderedCollection. Ultra-minimal, but
with a purpose. I mean, really, with 10, we have 92% of all instances wasting space.
"Wasting space" is a must when you implement dynamic arrays and hash tables. It's a space-time tradeoff[1].
OrderedCollection isn't hashed.
If we want have a sliding-scale trade-off between space and performance, so that "Dictionary new" everywhere in the code could be treated to run time or space-efficiently, that'd be something I could respect. We'd need to introduce some kind a variable or PoolDictionary constant to allow a developer to customize, for the entire image, all cases of "Dictionary new" to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.
But we should not break #new: 1 and #new: 2 for HashedCollections, please.
- Chris
Levente
[1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
Am Sa., 25. Jan. 2020 um 06:55 Uhr schrieb Chris Muller < ma.chris.m@gmail.com>:
Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is fast
enough. You change the default capacity from 10 to 3. My code is now too slow. I have to profile it to see why. It turns out that I store 7-9 elements most of the time, and the capacity of 10 was a good fit, but 3 is not, because it means growing twice (first to 6, then to 12), and my code ends up being slower and using more memory than before.
This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance. By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be. No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.
If you optimize the default for space instead of sticking with a reasonable tradeoff, you might force people to use new: and think about the very implementation details of those collections to get back to reasonable results. You might turn a piece of code into a bottleneck even though it was not considered performance-critical before. So you may break/brake existing applications in a non-obvious manner and cause maintenance cost. And free time is also precious...
On the other hand, who else was bothered by too sparse hashed or ordered collections until now? Is it a problem that bothers many, in comparison to the group which the change could bother?
I suppose this is premature optimization. If people have identified compactness as a requirement, they shall use #new: with (domain specific) expected numbers or patch #new for their application. But don't force it on everyone. You wouldn't like me to submit a "performance optimization" that changes the new default capacity to 100 because my application happened to deal with collections of that size frequently and because memory is comparably cheap and large nowadays, would you?
We can only really know the impact if we have a benchmark or even an idea of realistic average collection usage. Maybe someone wrote a paper about that...
Hi Jakob,
My main reply is in the other one to you and Levente, but some quick responses here for minor embellishment. :)
On Sat, Jan 25, 2020 at 3:21 AM Jakob Reschke forums.jakob@resfarm.de wrote:
Am Sa., 25. Jan. 2020 um 06:55 Uhr schrieb Chris Muller < ma.chris.m@gmail.com>:
Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is
fast enough. You change the default capacity from 10 to 3. My code is now too slow. I have to profile it to see why. It turns out that I store 7-9 elements most of the time, and the capacity of 10 was a good fit, but 3 is not, because it means growing twice (first to 6, then to 12), and my code ends up being slower and using more memory than before.
This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance. By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be. No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.
If you optimize the default for space instead of sticking with a reasonable tradeoff,
"reasonable tradeoff" is what I'm trying to convince you is completely baseless.
you might force people to use new: and think about the very implementation details of those collections to get back to reasonable results.
Its no different than we have now. Thinking about the size wherever you can is a good thing.
Your fear of changing #new is because of the fuzziness of its definition. What you're calling "reasonable" is actually just "random". If it were definitive (e.g., space-efficient), the impact of changing it would be, too.
You might turn a piece of code into a bottleneck even though it was not
considered performance-critical before.
Or it might rescue a suffering application because it's no longer paging RAM out to disk... :)
On the other hand, who else was bothered by too sparse hashed or ordered
collections until now?
It's about designing the most-efficient system and the best API, not who has been bothered yet.
Is it a problem that bothers many, in comparison to the group which the change could bother?
What happens to that group when they move their code to another Smalltalk which uses a different default?
I suppose this is premature optimization. If people have identified compactness as a requirement,
When all else is equal, more compact is *always* better than less.
they shall use #new: with (domain specific) expected numbers or patch #new for their application. But don't force it on everyone.
Patching #new and then using it because you patched it is a ridiculous suggestion. That's what #new: is for. This is about Squeak, not any one app.. 10 is currently "forced" on everyone, and with 92% of OrderedCollections in trunk over-allocated, a smaller choice might be better...
You wouldn't like me to submit a "performance optimization" that changes the new default capacity to 100 because my application happened to deal with collections of that size frequently and because memory is comparably cheap and large nowadays, would you?
It's no less arbitrary than 10. Both guarantee nothing. At least 1, 2, or 3 guarantees space efficiency, and guarantees to make the API more definitive.
#new is fuzzy. The whole reason you're worried about uses of #new being affected at all is because of that fuzziness. We should give it clarity, make it definitively space-efficient...
The core library cannot *possibly* guess the shape of people's domains. Our attempts to do so are causing more harm than good..
We can only really know the impact if we have a benchmark or even an idea of realistic average collection usage. Maybe someone wrote a paper about that...
The beginning of a dev cycle is where such a change can be implemented, leaving plenty of time for testing.
Couldn't it be faster to use an OrderedCollection instead of a hashed one
for such small numbers of elements? If the hash computation outweighs the linear search... As mentioned, this is about Squeak system efficiency and API design, not any one specific app.
- Chris
On 25.01.2020, at 10:23, Jakob Reschke forums.jakob@resfarm.de wrote:
But we should not break #new: 1 and #new: 2 for HashedCollections, please.
Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...
As I said. This is just too much premature optimization of everything. It worked well before. If you need really small key-value-mappings, you should make one yourself.
Let's just leave the current implementation alone, please. It is a-ok.
Best regards -Tobias
Let's just leave the current implementation alone, please. It is a-ok.
+1
/*—————————————————-*/ Sent from my iPhone https://boincstats.com/signature/-1/user/51616339056/sig.png See https://objectnets.net and https://objectnets.org https://datascilv.com and https://datascilv.org
On Jan 25, 2020, at 02:50, Tobias Pape Das.Linux@gmx.de wrote:
On 25.01.2020, at 10:23, Jakob Reschke forums.jakob@resfarm.de wrote:
But we should not break #new: 1 and #new: 2 for HashedCollections, please.
Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...
As I said. This is just too much premature optimization of everything. It worked well before. If you need really small key-value-mappings, you should make one yourself.
Let's just leave the current implementation alone, please. It is a-ok.
Best regards -Tobias
Hi Chris,
On Fri, 24 Jan 2020, Chris Muller wrote:
> > I don't think we would want to do that without measuring the effects of > the change somehow. > > > Performance measuring is already part of every system that cares, right? :) Of course not. That's why "somehow" is there. I don't know how it could be measured, but I think that without measurements, it's risky to change something like that.
You should be careful, Levente, Collections-ul.871 does it, too. Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3. And, yes, it did indeed introduce this unexpected anomaly!
When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to get a dictionary that can hold one or two elements, respectively. Collections-ul.871, just like the former version, creates dictionaries matching those expectations, but Dictionaries returned by the new version use less memory. So, what's the problem?
> None of those systems are depending on any particular default initial size for performance. If they are, they should be fixed. Here is an example scenario: I write my code. I use [OrderedCollection new]. I see that my code is fast enough. You change the default capacity from 10 to 3. My code is now too slow. I have to profile it to see why. It turns out that I store 7-9 elements most of the time, and the capacity of 10 was a good fit, but 3 is not, because it means growing twice (first to 6, then to 12), and my code ends up being slower and using more memory than before.
This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance. By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be. No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.
No, the example assumes that the user knows nothing about the internals of Dictionary. The performance of the example depends on the inital capacity. How often do you investigate the internals of classes your code uses to avoid relying on the default values coded into them when there are no performance problems with your code? I guess never...
More importantly, to me, it brings clear, definitive responsibility to #new: over #new. Efficiency.
As I wrote it many times in this thread, #new: already gives you efficiency over #new, because it lets you avoid growing. The optimization in HashedCollection class >> #new is what makes you think otherwise. The problem stems from two things: - you doing microbenchmarks on #new: and #new, ignoring the costs of filling up the collections - the simple optimization in #new
> > > Same applies to OrderedCollections. > > Following your reasoning, the optimal initial capacity for any dynamic > collection created with #new should be 0, but that doesn't feel right... > > > Not 0, 3. It's "reasoning", not extremism. :) > > 3 would be a great default for OrderedCollection. Ultra-minimal, but with a purpose. I mean, really, with 10, we have 92% of all instances wasting space. "Wasting space" is a must when you implement dynamic arrays and hash tables. It's a space-time tradeoff[1].
OrderedCollection isn't hashed.
OrderedCollection is a dynamic array[1].
If we want have a sliding-scale trade-off between space and performance, so that "Dictionary new" everywhere in the code could be treated to run time or space-efficiently, that'd be something I could respect. We'd need to introduce some kind a variable or PoolDictionary constant to allow a developer to
I don't think we want anything like that.
customize, for the entire image, all cases of "Dictionary new" to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.
But we should not break #new: 1 and #new: 2 for HashedCollections, please.
The performance difference between #new, and #new: X where X is <= 3, didn't bother you (or anyone else) until recently. I proposed all kinds of solutions to minimize the difference between #new and #new:, and we seemed to agree on one of those.
Levente
[1] https://en.wikipedia.org/wiki/Dynamic_array
- Chris
Levente [1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
Hi Levente, hi Jakob,
Thanks for the interesting discussion. I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak. I think the benefit is real, but deceptive.
It seems there are two dimensions to the decision:
- legacy / compatibility - API design / user-expectations
I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3. It "*sounds"* reasonable, but...
Here are some *certainties*:
- Allocating a 3 element array is quicker than allocating 10 element one. - 3 element Array's take up less memory than 10 element ones - consuming more RAM can lead to slowdowns due to paging or GC overhead - In the worst possible case (e.g., doing it over and over and nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]
Here are some *uncertainties*:
- there may be some code somewhere creating many thousands of OrderedCollections (if it were only a few, it wouldn't be noticed) - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed) - it then stores 7-9 elements in most of the OrderedCollections - in spite of all of the above, the author still wrote #new instead of #new:
Letting fear of such remote uncertainties dominate your decision even in light of those certainties brings paralysis. Squeak can hardly improve if such low-risk items can't even be attempted at the beginning of a development cycle, when applications will surely undergo testing before deploying to the next version of Squeak (5.4). Our community is small (low chance) and helpful (low impact). ___ That leaves the API discussion, which was mostly ignored, but is really the key. I would like to touch on this with some responses, below:
You should be careful, Levente, Collections-ul.871 does it, too.
Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3. And, yes, it did indeed introduce this unexpected anomaly!
When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to get a dictionary that can hold one or two elements, respectively.
How many should one "expect" to get with #new? Because one reason you gave for not wanting to reduce the default size seems to be based on some "expectation" you have for it.
There should be only one: that it shouldn't be significantly faster than #new: 1. *Nothing else*.
Collections-ul.871, just like the former version, creates dictionaries matching those expectations, but Dictionaries returned by the new version use less memory. So, what's the problem?
It slowed down (Dictionary new: 1) relative to trunk, *by a comparable margin* as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 (see [1])
How often do you investigate the internals of classes your code uses to
avoid relying on the default values coded into them
The answer is already "never" right here...
when there are no performance problems with your code? I guess never...
... but I still want to write "optimized" code even _before_ having any "performance problems." Proactivity. :)
But we should not break #new: 1 and #new: 2 for HashedCollections,
please.
The performance difference between #new, and #new: X where X is <= 3, didn't bother you (or anyone else) until recently.
Because the problem is in proposal Collections-ul.871, not trunk.
I proposed all kinds of solutions to minimize the difference between #new and #new:, and we seemed to agree on one of those.
Cool, let's do one of those, thanks!
- Chris
[1] -- (adding 9 elements to OrderedCollection new: 10) 100% of baseline rate, 3,860,000 per second. 259 nanoseconds per run. 2.88 % GC time.'
(adding 9 elements to OrderedCollection new: 3) 72% of baseline rate, 2,790,000 per second. 359 nanoseconds per run. 3.55929 % GC time.
(Dictionary new: 1) in trunk 21,800,000 per second. 45.9 nanoseconds per run. 13.8 % GC time.
(Dicitionary new: 1) with Collections-ul.871 '16,300,000 per second. 61.4 nanoseconds per run. 5.05899 % GC time.'
Hi Chris,
Le mer. 29 janv. 2020 à 09:39, Chris Muller asqueaker@gmail.com a écrit :
Hi Levente, hi Jakob,
Thanks for the interesting discussion. I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak. I think the benefit is real, but deceptive.
It seems there are two dimensions to the decision:
- legacy / compatibility
- API design / user-expectations
I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3. It "*sounds"* reasonable, but...
Here are some *certainties*:
- Allocating a 3 element array is quicker than allocating 10 element
one.
Hmm not sure about that. Not for single or few objects. Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.
- 3 element Array's take up less memory than 10 element ones
- consuming more RAM can lead to slowdowns due to paging or GC
overhead
- In the worst possible case (e.g., doing it over and over and nothing
else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]
Typical Smalltalk objects are short lived. That's why we get a generational garbage collection. So typical usage is unlikely to generate paging. Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical. For specific usage, there might be specific optimizations like growing a bit the Eden.
Here are some *uncertainties*:
- there may be some code somewhere creating many thousands of
OrderedCollections (if it were only a few, it wouldn't be noticed) - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed) - it then stores 7-9 elements in most of the OrderedCollections - in spite of all of the above, the author still wrote #new instead of #new:
All your analysis is exclusively focused on a specific un-typical usage of the library... You are then trying to bend the general purpose library tothis specific case. IMO this should be handled with a specific optimization.
I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection. If not all your dictionary are small, you could have a subclass that switch (becomeForward:) representation when needing to grow.
Letting fear of such remote uncertainties dominate your decision even in
light of those certainties brings paralysis. Squeak can hardly improve if such low-risk items can't even be attempted at the beginning of a development cycle, when applications will surely undergo testing before deploying to the next version of Squeak (5.4). Our community is small (low chance) and helpful (low impact). ___ That leaves the API discussion, which was mostly ignored, but is really the key. I would like to touch on this with some responses, below:
You should be careful, Levente, Collections-ul.871 does it, too.
Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3. And, yes, it did indeed introduce this unexpected anomaly!
When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to get a dictionary that can hold one or two elements, respectively.
How many should one "expect" to get with #new? Because one reason you gave for not wanting to reduce the default size seems to be based on some "expectation" you have for it.
There should be only one: that it shouldn't be significantly faster than #new: 1. *Nothing else*.
Collections-ul.871, just like the former version, creates dictionaries matching those expectations, but Dictionaries returned by the new version use less memory. So, what's the problem?
It slowed down (Dictionary new: 1) relative to trunk, *by a comparable margin* as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 (see [1])
IMO this is typical biased usage of percentages... Saving 30% of a short duration or 30% of a long duration is not at all the same thing! The former case is premature optimization presumably unless used in tight loops.
How often do you investigate the internals of classes your code uses to
avoid relying on the default values coded into them
The answer is already "never" right here...
when there are no performance problems with your code? I guess never...
... but I still want to write "optimized" code even _before_ having any "performance problems." Proactivity. :)
But we should not break #new: 1 and #new: 2 for HashedCollections,
please.
The performance difference between #new, and #new: X where X is <= 3, didn't bother you (or anyone else) until recently.
Because the problem is in proposal Collections-ul.871, not trunk.
I proposed all kinds of solutions to minimize the difference between #new and #new:, and we seemed to agree on one of those.
Cool, let's do one of those, thanks!
- Chris
[1] -- (adding 9 elements to OrderedCollection new: 10) 100% of baseline rate, 3,860,000 per second. 259 nanoseconds per run. 2.88 % GC time.'
(adding 9 elements to OrderedCollection new: 3) 72% of baseline rate, 2,790,000 per second. 359
nanoseconds per run. 3.55929 % GC time.
(Dictionary new: 1) in trunk 21,800,000 per second. 45.9 nanoseconds per run. 13.8 % GC
time.
(Dicitionary new: 1) with Collections-ul.871 '16,300,000 per second. 61.4 nanoseconds per run. 5.05899
% GC time.'
Hi Nicolas,
Thanks for the interesting discussion. I'd like you to know, I'm not
gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak. I think the benefit is real, but deceptive.
It seems there are two dimensions to the decision:
- legacy / compatibility
- API design / user-expectations
I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3. It "*sounds"* reasonable, but...
Here are some *certainties*:
- Allocating a 3 element array is quicker than allocating 10 element
one.
Hmm not sure about that. Not for single or few objects. Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.
- 3 element Array's take up less memory than 10 element ones
- consuming more RAM can lead to slowdowns due to paging or GC
overhead
- In the worst possible case (e.g., doing it over and over and nothing
else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]
Typical Smalltalk objects are short lived. That's why we get a generational garbage collection. So typical usage is unlikely to generate paging. Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical. For specific usage, there might be specific optimizations like growing a bit the Eden.
Here are some *uncertainties*:
- there may be some code somewhere creating many thousands of
OrderedCollections (if it were only a few, it wouldn't be noticed) - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed) - it then stores 7-9 elements in most of the OrderedCollections - in spite of all of the above, the author still wrote #new instead of #new:
All your analysis is exclusively focused on a specific un-typical usage of the library... You are then trying to bend the general purpose library tothis specific case. IMO this should be handled with a specific optimization.
My proposal to reduce the the defaultSize was never motivated by performance. All of the above is simply a _rebuttal_ to Jakob's and Levente's concerns about performance being adversely affected in some code somewhere. You and I are in agreement about the above -- it shows those concerns are inflated.
No, MY motivation for this proposal has always been for a better API design that can capture _something_, instead of, as you said, "trying to bend the general purpose library" to something we think is "reasonable" with arbitrary numbers like 10 for OrderedCollection. I contend there is little to no basis for that number, but there CAN be a basis for choosing a small number; space efficiency and API clarity. You mentioned "typical usage of the library" above, which I think it's impossible to define, but would love to hear your thoughts if there's something more tangibly obtainable than space.
I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection.
Which I ignored because it missed the point of the proposal. See above.
Collections-ul.871, just like the former version, creates dictionaries matching those expectations, but Dictionaries returned by the new version use less memory. So, what's the problem?
It slowed down (Dictionary new: 1) relative to trunk, *by a comparable margin* as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 (see [1])
IMO this is typical biased usage of percentages...
Saving 30% of a short duration or 30% of a long duration is not at all the
same thing! The former case is premature optimization presumably unless used in tight loops.
Right, again this was simply addressing Jakob and Levente's opposition to the proposal. They had concerns about performance, these benchmarks were meant to allay them for the reason you stated.
As I said, all of *my* motivations are more outwardly focused; the API presented by Squeak, and user expectations thereof. I agree with you that these concerns on performance are "premature", and shouldn't stop us from seriously considering this.
Best, Chris
Hi Chris,
I still don't see a compelling reason to change #new.
In one paragraph you write about using new: "proactively" and in the next you write this is not about performance.
My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new. That is 99% of the cases.
I expect that the standard library gives me a default that has proven to work well in most cases. By "work well" I predominantly think of time-efficiency, not space-efficiency.
We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/cla...
I only use new: when I know that it gives me a better result, after I noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.
Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on. That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that. Allocating an Array of 1 or not allocating a Collection at all would be even better. I expect a benefit from new: mostly with large numbers.
If I needed new: with a small size because I have space problems, then I wouldn't use hashed collections in the first place. Don't make new possibly worse because it happens to suit your case, which is possibly special.
We have not seen in this thread any credible, real data on what "most cases" and "works well" really is, and I have none to give. There is probably some study or paper about it out there, I don't know. Not having the numbers seems to imply to you that the choice is arbitrary, or even random as you called it, but I beg to differ. If we make it worse in the average but unrevealed case, it is still a regression. A symptom would be if people after the change felt they have to use new: more often to achieve acceptable results. Chances are you won't even be able to measure such cognitive overhead for the users and that's really bad!
Patching new for a specific application is obviously only an option if the application is deployed alone (Sqeuak only as runtime) as opposed to installing the application into other images. So for your GraphQL engine it is not an option, sure. But one can always add another constructor as an extension method if one has special requirements.
Kind regards, Jakob
Chris Muller ma.chris.m@gmail.com schrieb am Mi., 29. Jan. 2020, 22:59:
Hi Nicolas,
Thanks for the interesting discussion. I'd like you to know, I'm not
gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak. I think the benefit is real, but deceptive.
It seems there are two dimensions to the decision:
- legacy / compatibility
- API design / user-expectations
I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3. It "*sounds"* reasonable, but...
Here are some *certainties*:
- Allocating a 3 element array is quicker than allocating 10 element
one.
Hmm not sure about that. Not for single or few objects. Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.
- 3 element Array's take up less memory than 10 element ones
- consuming more RAM can lead to slowdowns due to paging or GC
overhead
- In the worst possible case (e.g., doing it over and over and
nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]
Typical Smalltalk objects are short lived. That's why we get a generational garbage collection. So typical usage is unlikely to generate paging. Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical. For specific usage, there might be specific optimizations like growing a bit the Eden.
Here are some *uncertainties*:
- there may be some code somewhere creating many thousands of
OrderedCollections (if it were only a few, it wouldn't be noticed) - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed) - it then stores 7-9 elements in most of the OrderedCollections - in spite of all of the above, the author still wrote #new instead of #new:
All your analysis is exclusively focused on a specific un-typical usage of the library... You are then trying to bend the general purpose library tothis specific case. IMO this should be handled with a specific optimization.
My proposal to reduce the the defaultSize was never motivated by performance. All of the above is simply a _rebuttal_ to Jakob's and Levente's concerns about performance being adversely affected in some code somewhere. You and I are in agreement about the above -- it shows those concerns are inflated.
No, MY motivation for this proposal has always been for a better API design that can capture _something_, instead of, as you said, "trying to bend the general purpose library" to something we think is "reasonable" with arbitrary numbers like 10 for OrderedCollection. I contend there is little to no basis for that number, but there CAN be a basis for choosing a small number; space efficiency and API clarity. You mentioned "typical usage of the library" above, which I think it's impossible to define, but would love to hear your thoughts if there's something more tangibly obtainable than space.
I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection.
Which I ignored because it missed the point of the proposal. See above.
Collections-ul.871, just like the former version, creates dictionaries matching those expectations, but Dictionaries returned by the new version use less memory. So, what's the problem?
It slowed down (Dictionary new: 1) relative to trunk, *by a comparable margin* as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 (see [1])
IMO this is typical biased usage of percentages...
Saving 30% of a short duration or 30% of a long duration is not at all the
same thing! The former case is premature optimization presumably unless used in tight loops.
Right, again this was simply addressing Jakob and Levente's opposition to the proposal. They had concerns about performance, these benchmarks were meant to allay them for the reason you stated.
As I said, all of *my* motivations are more outwardly focused; the API presented by Squeak, and user expectations thereof. I agree with you that these concerns on performance are "premature", and shouldn't stop us from seriously considering this.
Best, Chris
Hi Jakob,
I still don't see a compelling reason to change #new.
Levente's proposal changes it. That's what sparked this whole conversation...
In one paragraph you write about using new: "proactively" and in the next you write this is not about performance.
The "proactively" was only in response to Nicolas and you talking about "premature optimization", which was wrong. When you say:
I only use new: when I know that it gives me a better result, after I
noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.
it's like saying, "Writing Dictionary>>#at: optimally before my application suffers performance problems is 'premature optimization'." Totally wrong.
My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new. That is 99% of the cases.
Those cases are irrelevant. The cases we're discussing are the ones when the size is known at runtime.
I expect that the standard library gives me a default that has proven to work well in most cases. By "work well" I predominantly think of time-efficiency, not space-efficiency.
That sounds personal. One thing that may help is what I suggested to Levente, try writing a manpage-quality comment for #new: explaining all the nuances... when to use it, when not to, etc... My hope that exercise would illuminate the issue I have.
We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/cla...
Java based its core library off Smalltalk, Jakob..
Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on.
As I said before, not if you know your Dictionary will never grow. This is not app-specific.
That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that.
This thread is about reducing the defaultSize. The other thread is about #new:1 needing to be same speed as #new. They're two separate discussions. An API design that forces developers to trade one optimization for another is bad design, plain and simple. Levente's other changes in Collections-ul are good, but to introduce a needless API regression is pointless, and not okay. We would need one solution or the other.
- Chris
Chris Muller asqueaker@gmail.com schrieb am Di., 4. Feb. 2020, 04:39:
I still don't see a compelling reason to change #new.
Levente's proposal changes it. That's what sparked this whole conversation...
Your commit at the top of this thread changes the default capacity and that is what we are writing about.
When you say:
I only use new: when I know that it gives me a better result, after I
noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.
it's like saying, "Writing Dictionary>>#at: optimally before my application suffers performance problems is 'premature optimization'." Totally wrong.
I think the analogy is flawed. There is a space/time tradeoff with the size of the preallocated array. I see no such thing with Dictionary>>at:. Also new allows its user to defer a decision/analysis, at: does not.
My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new. That is 99% of the cases.
Those cases are irrelevant. The cases we're discussing are the ones when the size is known at runtime.
No, not at all! They are relevant since we are discussing about new and the default capacity, not new:.
Also you saying 99% of my cases are not worthy of the performance they used to have... does not sound appealing to me.
try writing a manpage-quality comment for #new: explaining all the
nuances... when to use it, when not to, etc... My hope that exercise would illuminate the issue I have.
"Allocate a new instance of me with the given initial capacity. Use new: instead of new if you want to avoid either unused capacity or repeated growing (and thus, copying) of the collection's memory."
Not quite man page length, but seems adequate to cover my expectations.
We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/cla...
Java based its core library off Smalltalk, Jakob..
What is that supposed to mean? Do you think the Java designers renamed almost all of the classes and methods, rewrote all of the documentation, but forgot to think about changing the initial capacity of ArrayList ex OrderedCollection?
In fact they did change something about it in Java 8 as I found out after my last post: initially the array is empty (singleton), but it still allocates 10 slots once the first element is added. This optimizes space for empty collections, but not small ones.
Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on.
As I said before, not if you know your Dictionary will never grow. This is not app-specific.
If I use new: to optimize performance I am fairly certain that it won't grow afterwards. My original statement still stands.
That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that.
This thread is about reducing the defaultSize. The other thread is about #new:1 needing to be same speed as #new. They're two separate discussions. An API design that forces developers to trade one optimization for another is bad design, plain and simple.
Then the only sensible solution would be to deprecate and remove new for preallocating collections altogether. Forcing everyone to always think ahead of their collection sizes. For collections that are supposed to grow automatically. Slowing down developent of even non-performance-critical parts. I don't want that either.
Changing the current behavior can make it worse for existing applications (again: we lack the numbers to be sure of the contrary). Keeping the current behavior is bad with regards to your concerns. I say: let's therefore not change it, since user code has already been written and most of it is unlikely to be improved again.
try writing a manpage-quality comment for #new: explaining all the
nuances... when to use it, when not to, etc... My hope that exercise would illuminate the issue I have.
"Allocate a new instance of me with the given initial capacity. Use new: instead of new if you want to avoid either unused capacity or repeated growing (and thus, copying) of the collection's memory."
The goal of using #new: is efficiency in general, space OR time (or, both). So w.r.t. the case of optimizing for time, you forgot to mention the 40% performance hit they would take if the size happened to be 1 or 2...
Not quite man page length, but seems adequate to cover my expectations.
Make a spreadsheet with the cartesian product of the following, one per row, and then manually fill out an extra column whether you'd write #new or #new:.
(knows size | does't know size) * (knows at runtime | knows at compile time) * (will possibly grow | won't possibly grow) * ( ... | ...)
When designing a class libary, please think in these general terms and consider *all cases*, instead of only your own expectations.
Also I don't expect that initial allocation with new: will always be
faster. I expect that I can save reallocations later on.
As I said before, not if you know your Dictionary will never grow. This is not app-specific.
If I use new: to optimize performance I am fairly certain that it won't grow afterwards. My original statement still stands.
You said "fairly" certain, now please consider the case where it's *certain* never to grow (perhaps row 187 on the spreadsheet, above). I gave you a common, concrete example in an earlier post. That's the case where your incomplete manpage comment (under Collections-ul.871) could easily trick the developer into hurting her apps performance when she thought she was helping it. You argued (with flawed logic) that people would have to "think about the sizes all the time" but that's exactly what this would introduce, except in a much more insidious way! Because it affects negatively depending on the *runtime parameter* value passed to #new:.
That is where the time is supposed to be saved. If new is faster than new:
1, I am fine with that.
You say you're "fine" with that *certain* performance hit, but you're not fine with this uncertain one whose worst-case possibility was measured to be an equal performance cost. *And*, you're even willing to trade away certainty of space-efficiency, too.
This thread is about reducing the defaultSize. The other thread is about
#new:1 needing to be same speed as #new. They're two separate discussions. An API design that forces developers to trade one optimization for another is bad design, plain and simple.
Then the only sensible solution would be to deprecate and remove new for preallocating collections altogether. Forcing everyone to always think ahead of their collection sizes.
Sigh. With statements like the above, I wonder whether you're actually interested in finding a solution or simply arguing... :(
For collections that are supposed to grow automatically. Slowing down developent of even non-performance-critical parts. I don't want that either.
This was your best argument, but it only takes about 10 seconds to understand its logical flaw -- if you're writing performance-critical code, you're going to write #new: anyway. If you're not, an extra grow from writing #new won't affect performance.
Changing the current behavior can make it worse for existing applications
(again: we lack the numbers to be sure of the contrary).
The worst-case scenario was benched and provided for you, and shown to be a similar effect as Collections-ul.871 on #new: 1.
Keeping the current behavior is bad with regards to your concerns.
As I said multiple times, the "current behavior" is *fine* to my concerns, Jakob, it was Levente's proposal which introduced them. This was just one of two different solutions proposed, but you handily rejected them both, even while offering zero alternatives or even acknowledgement that my concerns are valid. I'm going to cut my losses from this discussion now. Your arguments seem like you either aren't understanding mine, or aren't willing to look for consensus.
Levente has committed Collections-ul.872 with only his other changes, e.g., excluding the one which introduced my concern. He's a gentleman. It's sad that our inability to resolve this petty squabble kept back his other improvements, which would've included the ability to create minimally small Dictionary's with a minimum internal array size of 3 instead of 5. :(
- Chris
This conversation might well get a prize for lowest signal to noise ratio in the entire history of the Internet.
It began with a proposed change that would have required toggling two bits in the source code, which is close to a theoritical minumum for information content:
- Current code: '^ self basicNew initialize: 5' - Proposed change" '^ self basicNew initialize: 3'
This requires toggling two bits in a single ascii character: ($5 asciiValue bitXor: $3 asciiValue) bitCount ==> 2
Changing two bits in the source code of Squeak comes impressively close to being the smallest thoeretically possible change, and these two bit-twiddles now form the basis of a vast email thread that has evolved to include an impressive mix of faulty assumptions, untested metrics, passionate opinions, and invalid conclusions.
It might seem easy to suggest moving Collections-cmm.874 to the treated inbox - after all, everyone who reviewed it gave it a solid thumbs down - but that would spoil the fun. Let's keep this thing going as long as we can, and in the end we can submit the thread for inclusion in the Guinness book of world records.
;-)
Dave
On Wed, Feb 05, 2020 at 04:39:09PM -0600, Chris Muller wrote:
try writing a manpage-quality comment for #new: explaining all the
nuances... when to use it, when not to, etc... My hope that exercise would illuminate the issue I have.
"Allocate a new instance of me with the given initial capacity. Use new: instead of new if you want to avoid either unused capacity or repeated growing (and thus, copying) of the collection's memory."
The goal of using #new: is efficiency in general, space OR time (or, both). So w.r.t. the case of optimizing for time, you forgot to mention the 40% performance hit they would take if the size happened to be 1 or 2...
Not quite man page length, but seems adequate to cover my expectations.
Make a spreadsheet with the cartesian product of the following, one per row, and then manually fill out an extra column whether you'd write #new or #new:.
(knows size | does't know size) * (knows at runtime | knows at compile time) * (will possibly grow | won't possibly grow) * ( ... | ...)
When designing a class libary, please think in these general terms and consider *all cases*, instead of only your own expectations.
Also I don't expect that initial allocation with new: will always be
faster. I expect that I can save reallocations later on.
As I said before, not if you know your Dictionary will never grow. This is not app-specific.
If I use new: to optimize performance I am fairly certain that it won't grow afterwards. My original statement still stands.
You said "fairly" certain, now please consider the case where it's *certain* never to grow (perhaps row 187 on the spreadsheet, above). I gave you a common, concrete example in an earlier post. That's the case where your incomplete manpage comment (under Collections-ul.871) could easily trick the developer into hurting her apps performance when she thought she was helping it. You argued (with flawed logic) that people would have to "think about the sizes all the time" but that's exactly what this would introduce, except in a much more insidious way! Because it affects negatively depending on the *runtime parameter* value passed to #new:.
That is where the time is supposed to be saved. If new is faster than new:
1, I am fine with that.
You say you're "fine" with that *certain* performance hit, but you're not fine with this uncertain one whose worst-case possibility was measured to be an equal performance cost. *And*, you're even willing to trade away certainty of space-efficiency, too.
This thread is about reducing the defaultSize. The other thread is about
#new:1 needing to be same speed as #new. They're two separate discussions. An API design that forces developers to trade one optimization for another is bad design, plain and simple.
Then the only sensible solution would be to deprecate and remove new for preallocating collections altogether. Forcing everyone to always think ahead of their collection sizes.
Sigh. With statements like the above, I wonder whether you're actually interested in finding a solution or simply arguing... :(
For collections that are supposed to grow automatically. Slowing down developent of even non-performance-critical parts. I don't want that either.
This was your best argument, but it only takes about 10 seconds to understand its logical flaw -- if you're writing performance-critical code, you're going to write #new: anyway. If you're not, an extra grow from writing #new won't affect performance.
Changing the current behavior can make it worse for existing applications
(again: we lack the numbers to be sure of the contrary).
The worst-case scenario was benched and provided for you, and shown to be a similar effect as Collections-ul.871 on #new: 1.
Keeping the current behavior is bad with regards to your concerns.
As I said multiple times, the "current behavior" is *fine* to my concerns, Jakob, it was Levente's proposal which introduced them. This was just one of two different solutions proposed, but you handily rejected them both, even while offering zero alternatives or even acknowledgement that my concerns are valid. I'm going to cut my losses from this discussion now. Your arguments seem like you either aren't understanding mine, or aren't willing to look for consensus.
Levente has committed Collections-ul.872 with only his other changes, e.g., excluding the one which introduced my concern. He's a gentleman. It's sad that our inability to resolve this petty squabble kept back his other improvements, which would've included the ability to create minimally small Dictionary's with a minimum internal array size of 3 instead of 5. :(
- Chris
This conversation might well get a prize for lowest signal to noise ratio in the entire history of the Internet.
Indeed, but some sort of gentle bliss is to be found in the contemplation of an epic fight over such a small issue - at least this time the world as we know it is not at stake, and we can cosily relax while listening to the mildly terrifying wind sounds coming from the tempest in a teapot.
It might seem easy to suggest moving Collections-cmm.874 to the treated inbox - after all, everyone who reviewed it gave it a solid thumbs down - but that would spoil the fun.
Yes, keep it rolling - it may become its own art form.
Stef
squeak-dev@lists.squeakfoundation.org