[squeak-dev] The Inbox: Collections-cmm.870.mcz

Levente Uzonyi leves at caesar.elte.hu
Sat Jan 4 03:25:41 UTC 2020


Hi Chris,

The problem with that change is that #sizeFor: returns incorrect values 
when its argument is 4 or 5. The change breaks #compact, and the contract 
of #new: and #sizeFor:, because when you want to store 4 or 5 elements, 
you'll get a HashedCollection which can only store 3 elements, so there'll 
either be an extra grow involved or the collection will be overfilled.

Here's an example where the change makes #compact break the contract of 
HashedCollecions:

 	(1 to: 5) asSet compact; includes: 6 "#includes: will raise an error"

The default HashedCollection instances created with #new can hold 3 
elements without growing.
If you need a fast way to create a Dictionary that can hold 5 elements 
without growing, then you can add a new method and use that in your code:

HashedCollection class >> #newFor5

 	^self basicNew initialize: 7

This way the method guarantees the capacity is a good prime just like 
#new does. And it won't affect anything else.

Or we could change #new: to create a larger array by default, but then we 
would use more memory. In my image 68.7% of all the HashedCollections have 
the default capacity: 5. And 69% of all HashedCollections have fewer than 
3 elements, so those could all be shrunk down to capacity 3, saving 232kB 
memory (64-bit image).
If we changed the default capacity to 7, the size of the image would grow 
by 231kB, which is not much nowadays, but still something.

Perhaps we could create a class variable to hold the default capacity:

HashedCollection class >> #new

 	^self basicNew initialize: DefaultCapacity

And then you could change it to 7 or 11 in your images.


Levente

On Fri, 3 Jan 2020, Chris Muller wrote:

> Hi Levente,
> Due to ancestry, the diff of this over Collections-ul.867 is only this:
> 
>    +       nElements < 6 ifTrue: [ ^5 ].
> 
> The issue I'm facing is, I don't know the size of the Dictionary at compile time, but I do at runtime.
> 
>     Dictionary new: runtimeVar
> 
> However, I don't know what the runtimeVar values will be.  It could be run millions of times with a value less than the default size given by "Dictionary new", or greater.  The implementation is forcing me to optimize for one
> or the other at compile time, but I can't.  As developer, I'm "torn" whether to write:
> 
>        Dictionary new
>        Dictionary new: runtimeVar    (44% speed of Dictionary new when runtimeVar < 6)
> or    (runtimeVar < 6) ifTrue: [ Dictionary new ] ifFalse: [ Dictionary new: runtimeVar ]
> 
> That conflict is what this last tweak tries to solve.  It improves the second case, above, to 89% of "Dictionary new".
> 
> Everything about this looks good and would like to include it in 5.3.
> 
>  - Chris
> 
> On Fri, Jan 3, 2020 at 7:59 PM <commits at source.squeak.org> wrote:
>       Chris Muller uploaded a new version of Collections to project The Inbox:
>       http://source.squeak.org/inbox/Collections-cmm.870.mcz
>
>       ==================== Summary ====================
>
>       Name: Collections-cmm.870
>       Author: cmm
>       Time: 3 January 2020, 7:59:19.257837 pm
>       UUID: b3e2efab-cf35-426d-9910-adab44aa4ed4
>       Ancestors: Collections-fn.869, Collections-ul.867
>
>       Building on Collections-ul.867, let "Dictionary new: anySize" _never_ be significantly slower than "Dictionary new".
>
>       =============== Diff against Collections-fn.869 ===============
>
>       Item was changed:
>         ----- Method: HashedCollection class>>goodPrimeAtLeast: (in category 'sizing') -----
>         goodPrimeAtLeast: lowerLimit
>       +       "Answer the smallest good prime >= lowerlimit.
>       +       If lowerLimit is larger than the largest known good prime, just make it odd.
>       +       Use linear search, and exponential search to speed up cases when lowerLimit is small (<2500 and <100000, respectively)."
>       -       "Answer the next good prime >= lowerlimit.
>       -       If lowerLimit is larger than the largest known good prime,
>       -       just make it odd."
>
>       +       | high mid low prime primes |
>       -       | low mid high prime primes |
>               primes := self goodPrimes.
>       +       (primes at: primes size) < lowerLimit ifTrue: [
>       -       high := primes size.
>       -       lowerLimit > (primes at: high) ifTrue: [
>                       ^lowerLimit bitOr: 1 ].
>       +       lowerLimit < 2500 ifTrue: [
>       +               "Use linear search when the limit is small. The boundary is based on measurements."
>       +               high := 1.
>       +               [
>       +                       (prime := primes at: high) >= lowerLimit ifTrue: [ ^prime ].
>       +                       high := high + 1 ] repeat ].
>       +       lowerLimit < 100000
>       +               ifTrue: [
>       +                       "Use exponential search when the limit is not too large. The boundary is based on measurements."
>       +                       high := 1.
>       +                       [ (primes at: high) < lowerLimit ] whileTrue: [
>       +                               high := high * 2 ].
>       +                       low := high // 2 + 1. "high // 2 was smaller than lowerLimit" ]
>       +               ifFalse: [
>       +                       "Regular binary search."
>       +                       low := 1.
>       +                       high := primes size ].
>       -       low := 1.
>               [ high - low <= 1 ] whileFalse: [
>                       mid := high + low // 2.
>                       prime := primes at: mid.
>                       lowerLimit < prime
>                               ifTrue: [ high := mid ]
>                               ifFalse: [
>                                       lowerLimit > prime
>                                               ifTrue: [ low := mid ]
>                                               ifFalse: [ ^prime ] ] ].
>               (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>               ^primes at: high!
>
>       Item was changed:
>         ----- Method: HashedCollection class>>sizeFor: (in category 'sizing') -----
>         sizeFor: nElements
>               "Large enough prime (or odd if too large) size to hold nElements with some slop (see fullCheck)"
>
>       +       nElements < 6 ifTrue: [ ^5 ].
>       -       nElements < 4 ifTrue: [ ^5 ].
>               ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
>
>       Item was changed:
>         ----- Method: HashedCollection>>compact (in category 'private') -----
>         compact
>               "Reduce the size of array so that the load factor will be ~75%."
>
>               | newCapacity |
>       +       newCapacity := self class sizeFor: tally.
>       -       newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
>               self growTo: newCapacity!
> 
> 
> 
>


More information about the Squeak-dev mailing list