[squeak-dev] The Inbox: Collections-ul.873.mcz

Chris Muller asqueaker at gmail.com
Wed Feb 5 23:24:32 UTC 2020


Thanks Levente.  I really appreciate it.

 - Chris

On Wed, Feb 5, 2020 at 9:47 AM <commits at source.squeak.org> wrote:

> Levente Uzonyi uploaded a new version of Collections to project The Inbox:
> http://source.squeak.org/inbox/Collections-ul.873.mcz
>
> ==================== Summary ====================
>
> Name: Collections-ul.873
> Author: ul
> Time: 5 February 2020, 4:29:23.449183 pm
> UUID: fc9428f9-07fc-43fb-90b0-dc7368b64cec
> Ancestors: Collections-ul.872
>
> HashedCollection changes:
> - speed up #goodPrimeAtLeast: when its argument is small
> - let #sizeFor: return the smallest possible capacity (3)
> - use #sizeFor: instead of #goodPrimeAtLeast: in #compact. This fixes the
> case when a 4 element dictionary is compacted to 80% capacity (array size =
> 5), which is higher than the advertised maximum: 75%.
> - fixed the formula in #sizeFor:, which made the shortcut for small values
> unnecessary. Removed the shortcut. The fix often yields smaller
> dictionaries.
> - store good primes in a class variable GoodPrimes for faster access and
> possible customization
> - use blocks instead of symbols in #compactAll and #rehashAll
> - introduced #slowSize in HashedCollection, which makes it possible to not
> override #growSize and #compact in weak subclasses
>
> =============== Diff against Collections-ul.872 ===============
>
> Item was changed:
>   Collection subclass: #HashedCollection
>         instanceVariableNames: 'tally array'
> +       classVariableNames: 'GoodPrimes'
> -       classVariableNames: ''
>         poolDictionaries: ''
>         category: 'Collections-Abstract'!
>
>   !HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
>   I am an abstract collection of objects that implement hash and equality
> in a consitent way. This means that whenever two objects are equal, their
> hashes have to be equal too. If two objects are equal then I can only store
> one of them. Hashes are expected to be integers (preferably SmallIntegers).
> I also expect that the objects contained by me do not change their hashes.
> If that happens, hash invariants have to be re-established, which can be
> done by #rehash.
>
>   Since I'm abstract, no instances of me should exist. My subclasses
> should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.
>
>   Instance Variables
>         array:          <ArrayedCollection> (typically Array or WeakArray)
>         tally:          <Integer> (non-negative)
>
>   array
>         - An array whose size is a prime number, it's non-nil elements are
> the elements of the collection, and whose nil elements are empty slots.
> There is always at least one nil. In fact I try to keep my "load" at 75% or
> less so that hashing will work well.
>
>   tally
>         - The number of elements in the collection. The array size is
> always greater than this.
>
>   Implementation details:
>   I implement a hash table which uses open addressing with linear probing
> as the method of collision resolution. Searching for an element or a free
> slot for an element is done by #scanFor: which should return the index of
> the slot in array corresponding to it's argument. When an element is
> removed #fixCollisionsFrom: should rehash all elements in array between the
> original index of the removed element, wrapping around after the last slot
> until reaching an empty slot. My maximum load factor (75%) is hardcoded in
> #atNewIndex:put:, so it can only be changed by overriding that method. When
> my load factor reaches this limit I replace my array with a larger one (see
> #grow) ensuring that my load factor will be less than or equal to 50%. The
> new array is filled by #noCheckNoGrowFillFrom: which should use
> #scanForEmptySlotFor: instead of #scanFor: for better performance. I do not
> shrink.
>   !
>
> Item was changed:
>   ----- Method: HashedCollection class>>compactAll (in category
> 'initialize-release') -----
>   compactAll
>         "HashedCollection compactAll"
>
> +       self allSubclassesDo: [ :each | each compactAllInstances ]!
> -       self allSubclassesDo: #compactAllInstances!
>
> Item was changed:
>   ----- Method: HashedCollection class>>goodPrimeAtLeast: (in category
> 'sizing') -----
>   goodPrimeAtLeast: lowerLimit
> +       "Answer the smallest good prime >= lowerlimit.
> +       If lowerLimit is larger than the largest known good prime, just
> make it odd.
> +       Use linear search, and exponential search to speed up cases when
> lowerLimit is small (<2500 and <100000, respectively).
> +       Assume that there are goodPrimes greater than 100000."
> -       "Answer the next good prime >= lowerlimit.
> -       If lowerLimit is larger than the largest known good prime,
> -       just make it odd."
>
> +       | highIndex midIndex lowIndex prime |
> +       lowerLimit <= 7 ifTrue: [
> +               lowerLimit <= 3 ifTrue: [ ^3 ].
> +               lowerLimit <= 5 ifTrue: [ ^5 ].
> +               ^7 ].
> +       GoodPrimes ifNil: [ "migration only"
> +               self initializeGoodPrimes ].
> +       lowerLimit < 2500 ifTrue: [
> +               "Use linear search when the limit is small. The boundary
> is based on measurements."
> +               highIndex := 4. "skip 3 5 and 7"
> +               [ (GoodPrimes at: highIndex) < lowerLimit ] whileTrue: [
> +                       highIndex := highIndex + 1 ].
> +               ^GoodPrimes at: highIndex ].
> +       lowerLimit < 100000
> +               ifTrue: [
> +                       "Use exponential search when the limit is not too
> large. The boundary is based on measurements."
> +                       highIndex := 1.
> +                       [ (GoodPrimes at: highIndex) < lowerLimit ]
> whileTrue: [
> +                               highIndex := highIndex * 2 ].
> +                       lowIndex := highIndex // 2 + 1. "highIndex // 2
> was smaller than lowerLimit" ]
> +               ifFalse: [
> +                       "Regular binary search."
> +                       lowIndex := 1.
> +                       highIndex := GoodPrimes size.
> +                       "Check whether the largest prime would fit"
> +                       (GoodPrimes at: highIndex) < lowerLimit ifTrue: [
> +                               ^lowerLimit bitOr: 1 ]. ].
> +       [ highIndex - lowIndex <= 1 ] whileFalse: [
> +               midIndex := highIndex + lowIndex // 2.
> +               prime := GoodPrimes at: midIndex.
> -       | low mid high prime primes |
> -       primes := self goodPrimes.
> -       high := primes size.
> -       lowerLimit > (primes at: high) ifTrue: [
> -               ^lowerLimit bitOr: 1 ].
> -       low := 1.
> -       [ high - low <= 1 ] whileFalse: [
> -               mid := high + low // 2.
> -               prime := primes at: mid.
>                 lowerLimit < prime
> +                       ifTrue: [ highIndex := midIndex ]
> -                       ifTrue: [ high := mid ]
>                         ifFalse: [
>                                 lowerLimit > prime
> +                                       ifTrue: [ lowIndex := midIndex ]
> -                                       ifTrue: [ low := mid ]
>                                         ifFalse: [ ^prime ] ] ].
> +       (GoodPrimes at: lowIndex) >= lowerLimit ifTrue: [ ^GoodPrimes at:
> lowIndex ].
> +       ^GoodPrimes at: highIndex!
> -       (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
> -       ^primes at: high!
>
> Item was changed:
>   ----- Method: HashedCollection class>>goodPrimes (in category 'sizing')
> -----
>   goodPrimes
> +       "Answer a sorted array of prime numbers less than one billion that
> make good hash table sizes. See #initializeGoodPrimes."
> -       "Answer a sorted array of prime numbers less than one billion that
> make good
> -       hash table sizes. Should be expanded as needed.  See comments
> below code"
>
> +       ^GoodPrimes ifNil: [
> +               self initializeGoodPrimes.
> +               GoodPrimes ]!
> -       ^#(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857
> 1151 1549 2069
> -               2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
> -               4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
> -               8783 9601 10243 10867 11549 12239 12919 13679 14293 15013
> 15731
> -               17569 19051 20443 21767 23159 24611 25847 27397 28571
> 30047 31397
> -               35771 38201 40841 43973 46633 48989 51631 54371 57349
> 60139 62969
> -               70589 76091 80347 85843 90697 95791 101051 106261 111143
> 115777 120691 126311
> -               140863 150523 160969 170557 181243 190717 201653 211891
> 221251 232591 242873 251443
> -               282089 300869 321949 341227 362353 383681 401411 422927
> 443231 464951 482033 504011
> -               562621 605779 647659 681607 723623 763307 808261 844709
> 886163 926623 967229 1014617
> -               1121987 1201469 1268789 1345651 1429531 1492177 1577839
> 1651547 1722601 1800377 1878623 1942141 2028401
> -               2242727 2399581 2559173 2686813 2836357 3005579 3144971
> 3283993 3460133 3582923 3757093 3903769 4061261
> -               4455361 4783837 5068529 5418079 5680243 6000023 6292981
> 6611497 6884641 7211599 7514189 7798313 8077189
> -               9031853 9612721 10226107 10745291 11338417 11939203
> 12567671 13212697 13816333 14337529 14938571 15595673 16147291
> -               17851577 18993941 20180239 21228533 22375079 23450491
> 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
> -               35817611 37983761 40234253 42457253 44750177 46957969
> 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
> -               71582779 76039231 80534381 84995153 89500331 93956777
> 98470819 102879613 107400389 111856841 116365721 120819287 125246581
> 129732203
> -               143163379 152076289 161031319 169981667 179000669
> 187913573 196826447 205826729 214748357 223713691 232679021 241591901
> 250504801 259470131
> -               285162679 301939921 318717121 335494331 352271573
> 369148753 385926017 402603193 419480419 436157621 453034849 469712051
> 486589307 503366497 520043707
> -               570475349 603929813 637584271 671138659 704693081
> 738247541 771801929 805356457 838910803 872365267 905919671 939574117
> 973128521 1006682977 1040137411
> -               1073741833)
> -
> - "The above primes past 2069 were chosen carefully so that they do not
> interact badly with 1664525 (used by hashMultiply), and so that gcd(p,
> (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
> -
> - "The above primes also try to map the values of ((0 to: 4095) collect: [
> :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers.
> This helps to avoid the collision of chains in identity-based hashed
> collections. To do that  they were chosen to return a low value when the
> following block is evaluated with them as argument:
> -  [ :prime |
> -       | n slots cost optimalDistance |
> -       n := 1 bitShift: 22.
> -       slots := Array new: n + 1.
> -       0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\
> prime ].
> -       slots at: n + 1 put: prime.
> -       slots sort.
> -       cost := 0.
> -       optimalDistance := prime // n.
> -       2 to: n + 1 do: [ :index |
> -               | newCost |
> -               newCost := optimalDistance - ((slots at: index) - (slots
> at: index - 1)).
> -               newCost > cost ifTrue: [ cost := newCost ] ].
> -       result add: prime -> cost ]
> -
> - The shifts in the block relate to the numer of bits the #identityHash
> consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!
>
> Item was added:
> + ----- Method: HashedCollection class>>initialize (in category 'sizing')
> -----
> + initialize
> +
> +       self initializeGoodPrimes!
>
> Item was added:
> + ----- Method: HashedCollection class>>initializeGoodPrimes (in category
> 'sizing') -----
> + initializeGoodPrimes
> +       "GoodPrimes is a sorted array of prime numbers less than one
> billion that make good hash table sizes. Should be expanded as needed. See
> comments below code."
> +
> +       GoodPrimes := #(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359
> 479 641 857 1151 1549 2069
> +               2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
> +               4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
> +               8783 9601 10243 10867 11549 12239 12919 13679 14293 15013
> 15731
> +               17569 19051 20443 21767 23159 24611 25847 27397 28571
> 30047 31397
> +               35771 38201 40841 43973 46633 48989 51631 54371 57349
> 60139 62969
> +               70589 76091 80347 85843 90697 95791 101051 106261 111143
> 115777 120691 126311
> +               140863 150523 160969 170557 181243 190717 201653 211891
> 221251 232591 242873 251443
> +               282089 300869 321949 341227 362353 383681 401411 422927
> 443231 464951 482033 504011
> +               562621 605779 647659 681607 723623 763307 808261 844709
> 886163 926623 967229 1014617
> +               1121987 1201469 1268789 1345651 1429531 1492177 1577839
> 1651547 1722601 1800377 1878623 1942141 2028401
> +               2242727 2399581 2559173 2686813 2836357 3005579 3144971
> 3283993 3460133 3582923 3757093 3903769 4061261
> +               4455361 4783837 5068529 5418079 5680243 6000023 6292981
> 6611497 6884641 7211599 7514189 7798313 8077189
> +               9031853 9612721 10226107 10745291 11338417 11939203
> 12567671 13212697 13816333 14337529 14938571 15595673 16147291
> +               17851577 18993941 20180239 21228533 22375079 23450491
> 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
> +               35817611 37983761 40234253 42457253 44750177 46957969
> 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
> +               71582779 76039231 80534381 84995153 89500331 93956777
> 98470819 102879613 107400389 111856841 116365721 120819287 125246581
> 129732203
> +               143163379 152076289 161031319 169981667 179000669
> 187913573 196826447 205826729 214748357 223713691 232679021 241591901
> 250504801 259470131
> +               285162679 301939921 318717121 335494331 352271573
> 369148753 385926017 402603193 419480419 436157621 453034849 469712051
> 486589307 503366497 520043707
> +               570475349 603929813 637584271 671138659 704693081
> 738247541 771801929 805356457 838910803 872365267 905919671 939574117
> 973128521 1006682977 1040137411
> +               1073741833)
> +
> + "The above primes past 2069 were chosen carefully so that they do not
> interact badly with 1664525 (used by hashMultiply), and so that gcd(p,
> (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
> +
> + "The above primes also try to map the values of ((0 to: 4095) collect: [
> :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers.
> This helps to avoid the collision of chains in identity-based hashed
> collections. To do that  they were chosen to return a low value when the
> following block is evaluated with them as argument:
> +  [ :prime |
> +       | n slots cost optimalDistance |
> +       n := 1 bitShift: 22.
> +       slots := Array new: n + 1.
> +       0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\
> prime ].
> +       slots at: n + 1 put: prime.
> +       slots sort.
> +       cost := 0.
> +       optimalDistance := prime // n.
> +       2 to: n + 1 do: [ :index |
> +               | newCost |
> +               newCost := optimalDistance - ((slots at: index) - (slots
> at: index - 1)).
> +               newCost > cost ifTrue: [ cost := newCost ] ].
> +       result add: prime -> cost ]
> +
> + The shifts in the block relate to the numer of bits the #identityHash
> consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!
>
> Item was changed:
>   ----- Method: HashedCollection class>>new (in category 'instance
> creation') -----
>   new
> +       "Create a HashedCollection large enough to hold 3 different
> objects without growing."
> +
> +       ^self basicNew initialize: 5 "For performance, inline the value 5
> which would normally be returned by #sizeFor:."!
> -       ^ self basicNew initialize: 5!
>
> Item was changed:
>   ----- Method: HashedCollection class>>new: (in category 'instance
> creation') -----
> + new: numberOfElements
> +       "Create a HashedCollection large enough to hold numberOfElements
> different objects without growing."
> +
> +       ^self basicNew initialize: (numberOfElements <= 3
> +               ifFalse: [ self sizeFor: numberOfElements ]
> +               ifTrue: [ "Inline values returned by #sizeFor: to ensure
> that #new: is not significantly slower than #new for small values."
> +                       numberOfElements < 3
> +                               ifTrue: [ 3 ]
> +                               ifFalse: [ 5 ] ])!
> - new: nElements
> -       "Create a Set large enough to hold nElements without growing"
> -       ^ self basicNew initialize: (self sizeFor: nElements)!
>
> Item was changed:
>   ----- Method: HashedCollection class>>rehashAll (in category
> 'initialize-release') -----
>   rehashAll
>         "HashedCollection rehashAll"
>
> +       self allSubclassesDo: [ :each | each rehashAllInstances ]!
> -       self allSubclassesDo: #rehashAllInstances!
>
> Item was changed:
>   ----- Method: HashedCollection class>>sizeFor: (in category 'sizing')
> -----
> + sizeFor: numberOfElements
> +       "Return a large enough prime (or odd if too large), the size of
> the internal array to hold numberOfElements with at most 75% load factor."
> - sizeFor: nElements
> -       "Large enough prime (or odd if too large) size to hold nElements
> with some slop (see fullCheck)"
>
> +       ^self goodPrimeAtLeast: numberOfElements * 4 + 2 // 3 "Optimized
> version of (numberOfElements * 4 / 3) ceiling."!
> -       nElements < 4 ifTrue: [ ^5 ].
> -       ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
>
> Item was changed:
>   ----- Method: HashedCollection>>compact (in category 'private') -----
>   compact
>         "Reduce the size of array so that the load factor will be ~75%."
>
>         | newCapacity |
> +       newCapacity := self class sizeFor: self slowSize.
> -       newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
>         self growTo: newCapacity!
>
> Item was changed:
>   ----- Method: HashedCollection>>growSize (in category 'private') -----
>   growSize
>         "Answer what my next higher table size should be"
>
> +       ^self class sizeFor: self slowSize * 2!
> -       ^self class goodPrimeAtLeast: array size * 3 // 2 + 2!
>
> Item was added:
> + ----- Method: HashedCollection>>slowSize (in category 'accessing') -----
> + slowSize
> +       "Answer an upper bound of the number of elements in this
> collection. For regular collections, this can simply be the value of tally,
> but for collections that cannot maintain an exact value, like current weak
> collections, this has to be calculated on the fly."
> +
> +       ^tally!
>
> Item was removed:
> - ----- Method: WeakIdentityDictionary>>compact (in category 'private')
> -----
> - compact
> -       "Reduce the size of array so that the load factor will be ~75%."
> -
> -       | newCapacity |
> -       tally := self slowSize.
> -       newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
> -       self growTo: newCapacity!
>
> Item was removed:
> - ----- Method: WeakKeyDictionary>>compact (in category 'private') -----
> - compact
> -       "Reduce the size of array so that the load factor will be ~75%."
> -
> -       | newCapacity |
> -       newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
> -       self growTo: newCapacity!
>
> Item was removed:
> - ----- Method: WeakKeyDictionary>>growSize (in category 'private') -----
> - growSize
> -       "Answer what my next table size should be.
> -       Note that, it can be less than the current."
> -
> -       ^self class goodPrimeAtLeast: self slowSize * 2 + 2!
>
> Item was removed:
> - ----- Method: WeakSet>>compact (in category 'private') -----
> - compact
> -       "Reduce the size of array so that the load factor will be ~75%."
> -
> -       | newCapacity |
> -       newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
> -       self growTo: newCapacity!
>
> Item was removed:
> - ----- Method: WeakSet>>growSize (in category 'private') -----
> - growSize
> -       "Answer what my next table size should be.
> -       Note that, it can be less than the current."
> -
> -       ^self class goodPrimeAtLeast: self slowSize * 2 + 2!
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200205/50de8ec2/attachment.html>


More information about the Squeak-dev mailing list