[squeak-dev] The Inbox: Collections-ul.355.mcz

commits at source.squeak.org commits at source.squeak.org
Mon Apr 12 23:38:56 UTC 2010


A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-ul.355.mcz

==================== Summary ====================

Name: Collections-ul.355
Author: ul
Time: 13 April 2010, 12:12:46.102 am
UUID: 214765b9-8ae4-c045-9dbe-2d224a1aff4a
Ancestors: Collections-ar.354

- started to add documentation for HashedCollection
- fixed HashedCollection >> #add:withOccurrences:
- unified primes for identity-based and non-identitity-based HashedCollection table sizes
- added #compact protocol to HashedCollection hierarchy which is similar to the #rehash protocol
- fixed Symbol class >> #compactSymbolTable


=============== Diff against Collections-ar.354 ===============

Item was changed:
  Collection subclass: #HashedCollection
  	instanceVariableNames: 'tally array'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Collections-Abstract'!
+ 
+ !HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
+ I am an abstract collection of objects that implement hash and equality in a consitent way. This means that whenever two objects are equal, their hashes have to be equal too. If two objects are equal then I can only store one of them. Hashes are expected to be integers (preferably SmallIntegers). I also expect that the objects contained by me do not change their hashes. If that happens, hash invariants have to be re-established, which can be done by #rehash.
+ 
+ Since I'm abstract, no instances of me should exist. My subclasses should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.
+ 
+ Instance Variables
+ 	array:		<ArrayedCollection> (typically Array or WeakArray)
+ 	tally:		<Integer> (non-negative)
+ 
+ array
+ 	- An array whose size is a prime number, it's non-nil elements are the elements of the collection, and whose nil elements are empty slots. There is always at least one nil. In fact I try to keep my "load" at 75% or less so that hashing will work well.
+ 
+ tally
+ 	- The number of elements in the collection. The array size is always greater than this.
+ 
+ Implementation details:
+ I implement a hash table which uses open addressing with linear probing as the method of collision resolution. Searching for an element or a free slot for an element is done by #scanFor: which should return the index of the slot in array corresponding to it's argument. When an element is removed #fixCollisionsFrom: should rehash all elements in array between the original index of the removed element, wrapping around after the last slot until reaching an empty slot. My maximum load factor (75%) is hardcoded in #atNewIndex:put:, so it can only be changed by overriding that method. When my load factor reaches this limit I replace my array with a larger one (see #grow) ensuring that my load factor will be less than or equal to 50%. The new array is filled by #noCheckNoGrowFillFrom: which should use #scanForEmptySlotFor: instead of #scanFor: for better performance. I do not shrink.
+ !

Item was added:
+ ----- Method: HashedCollection class>>compactAll (in category 'initialize-release') -----
+ compactAll
+ 	"HashedCollection compactAll"	
+ 		
+ 	self allSubclassesDo: #compactAllInstances!

Item was added:
+ ----- Method: WeakSet>>compact (in category 'private') -----
+ compact
+ 	"Reduce the size of array so that the load factor will be ~75%."
+ 	
+ 	| newCapacity |
+ 	newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
+ 	self growTo: newCapacity!

Item was added:
+ ----- Method: HashedCollection class>>compactAllInstances (in category 'initialize-release') -----
+ compactAllInstances
+ 	"Do not use #allInstancesDo: because compact may create new instances."
+ 
+ 	self allInstances do: #compact!

Item was changed:
  ----- Method: HashedCollection>>add:withOccurrences: (in category 'adding') -----
  add: newObject withOccurrences: anInteger
+ 	"Add newObject anInteger times to the receiver. Do nothing if anInteger is less than one. Answer newObject."
+ 	
+ 	anInteger < 1 ifTrue: [ ^newObject ].
+ 	^self add: newObject "I can only store an object once."
+ 	!
- 	^ self add: newObject!

Item was added:
+ ----- Method: WeakKeyDictionary>>compact (in category 'private') -----
+ compact
+ 	"Reduce the size of array so that the load factor will be ~75%."
+ 	
+ 	| newCapacity |
+ 	newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
+ 	self growTo: newCapacity!

Item was changed:
  ----- Method: HashedCollection class>>goodPrimes (in category 'sizing') -----
  goodPrimes
  	"Answer a sorted array of prime numbers less than one billion that make good
  	hash table sizes. Should be expanded as needed.  See comments below code"
  	
+ 	^#(
+ 		5 11 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
- 	^#(5 11 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
  		2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
  		4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
+ 		8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
+ 		17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
+ 		35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
+ 		70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
+ 		140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
+ 		282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
+ 		562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
+ 		1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
+ 		2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
+ 		4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
+ 		9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
+ 		17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
+ 		35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
+ 		71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
+ 		143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
+ 		285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707 
+ 		570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411 
+ 		1073741833)
- 		8887 9587 10243 10937 11617 12289 12967 13649 14341 15013 15727
- 		17749 19121 20479 21859 23209 24593 25939 27329 28669 30047 31469
- 		35507 38231 40961 43711 46439 49157 51893 54617 57347 60077 62801
- 		70583 75619 80669 85703 90749 95783 100823 105871 110909 115963 120997 126031
- 		141157 151237 161323 171401 181499 191579 201653 211741 221813 231893 241979 252079
- 		282311 302483 322649 342803 362969 383143 403301 423457 443629 463787 483953 504121
- 		564617 604949 645313 685609 725939 766273 806609 846931 887261 927587 967919 1008239
- 		1123477 1198397 1273289 1348177 1423067 1497983 1572869 1647761 1722667 1797581 1872461 1947359 2022253
- 		2246953 2396759 2546543 2696363 2846161 2995973 3145739 3295541 3445357 3595117 3744941 3894707 4044503
- 		4493921 4793501 5093089 5392679 5692279 5991883 6291469 6591059 6890641 7190243 7489829 7789447 8089033
- 		8987807 9586981 10186177 10785371 11384539 11983729 12582917 13182109 13781291 14380469 14979667 15578861 16178053
- 		17895707 19014187 20132683 21251141 22369661 23488103 24606583 25725083 26843549 27962027 29080529 30198989 31317469 32435981
- 		35791397 38028379 40265327 42502283 44739259 46976221 49213237 51450131 53687099 55924061 58161041 60397993 62634959 64871921
- 		71582857 76056727 80530643 85004567 89478503 93952427 98426347 102900263 107374217 111848111 116322053 120795971 125269877 129743807
- 		143165587 152113427 161061283 170009141 178956983 187904819 196852693 205800547 214748383 223696237 232644089 241591943 250539763 259487603
- 		285212677 301989917 318767107 335544323 352321547 369098771 385876021 402653189 419430419 436207619 452984849 469762049 486539323 503316511 520093703
- 		570425377 603979799 637534277 671088667 704643083 738197549 771751961 805306457 838860817 872415239 905969671 939524129 973078537 1006632983 1040187403
- 		1073741789)
  
+ "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
+ 
+ "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers. This helps to avoid the collision of chains in identity-based hashed collections. To do that  they were chosen to return a low value when the following block is evaluated with them as argument:
+  [ :prime |
+ 	| slots cost optimalDistance previous |
+ 	slots := Array new: 4097.
+ 	0 to: 4095 do: [ :ea | slots at: ea + 1 put: ea *  262144 \\ prime ].
+ 	slots at: 4097 put: prime.
+ 	slots sort.
+ 	cost := 0.
+ 	optimalDistance := prime // 4096.
+ 	2 to: 4097 do: [ :index |
+ 		| newCost |
+ 		newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
+ 		newCost > cost ifTrue: [ cost := newCost ] ].
+ 	cost ]."!
- "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."!

Item was changed:
  ----- Method: Collection>>add:withOccurrences: (in category 'adding') -----
  add: newObject withOccurrences: anInteger
+ 	"Add newObject anInteger times to the receiver. Do nothing if anInteger is less than one. Answer newObject."
- 	"Add newObject anInteger times to the receiver. Answer newObject."
  
  	anInteger timesRepeat: [self add: newObject].
  	^ newObject!

Item was added:
+ ----- Method: HashedCollection>>compact (in category 'private') -----
+ compact
+ 	"Reduce the size of array so that the load factor will be ~75%."
+ 	
+ 	| newCapacity |
+ 	newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
+ 	self growTo: newCapacity!

Item was changed:
  ----- Method: Symbol class>>compactSymbolTable (in category 'class initialization') -----
  compactSymbolTable
+ 	"Reduce the size of the symbol table so that it holds all existing symbols with 25% free space."
- 	"Reduce the size of the symbol table so that it holds all existing symbols + 25% (changed from 1000 since sets like to have 25% free and the extra space would grow back in a hurry)"
  
  	| oldSize |
- 
  	Smalltalk garbageCollect.
+ 	oldSize := SymbolTable capacity.
+ 	SymbolTable compact.
+ 	^(oldSize - SymbolTable capacity) printString, ' slot(s) reclaimed'!
- 	oldSize := SymbolTable array size.
- 	SymbolTable growTo: SymbolTable size * 4 // 3 + 100.
- 	^oldSize printString,'  ',(oldSize - SymbolTable array size) printString, ' slot(s) reclaimed'!

Item was removed:
- ----- Method: WeakIdentityKeyDictionary class>>goodPrimes (in category 'sizing') -----
- goodPrimes
- 
- 	^self goodPrimesForIdentityBasedHashedCollections!

Item was removed:
- ----- Method: IdentitySet class>>goodPrimes (in category 'sizing') -----
- goodPrimes
- 
- 	^self goodPrimesForIdentityBasedHashedCollections!

Item was removed:
- ----- Method: IdentityDictionary class>>goodPrimes (in category 'sizing') -----
- goodPrimes
- 
- 	^self goodPrimesForIdentityBasedHashedCollections!

Item was removed:
- ----- Method: KeyedIdentitySet class>>goodPrimes (in category 'sizing') -----
- goodPrimes
- 
- 	^self goodPrimesForIdentityBasedHashedCollections!




More information about the Squeak-dev mailing list