[squeak-dev] The Trunk: Collections-ul.1000.mcz

commits at source.squeak.org commits at source.squeak.org
Tue Mar 29 10:00:53 UTC 2022


Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.1000.mcz

==================== Summary ====================

Name: Collections-ul.1000
Author: ul
Time: 28 March 2022, 8:25:48.093286 pm
UUID: 37ba296b-06e0-48f5-bdc8-2101110f360e
Ancestors: Collections-ct.998

Symbol changes:
- comment the symbol table implemented in a non-object-oriented fashion in the class comment
- both WeakSets of the symbol table are immutable (#beReadOnlyObject)
- #condenseNewSymbols does nothing if the tables are already empty and compact
- added a comment to all methods accessing the symbol table class variables
- use a loop instead of recursion in #rehash and #condenseNewSymbols

General:
- methods of the HashedCollection hierachy that update the tally and modify the array variable will update the tally first, so that immutable hashed collections raise an error before their non-immutable array variable is updated
- use #lookup: instead of #hasInterned:ifTrue:
- speed up WeakSet >> #postCopy
- added #isCompact to HashedCollection and Heap (thanks Christoph)
- avoid compaction of Heap if it is already compact
- updated  the comment of various methods

=============== Diff against Collections-ct.998 ===============

Item was changed:
  ----- Method: Dictionary>>removeKey:ifAbsent: (in category 'removing') -----
  removeKey: key ifAbsent: aBlock 
  	"Remove key (and its associated value) from the receiver. If key is not in 
  	the receiver, answer the result of evaluating aBlock. Otherwise, answer 
  	the value externally named by key."
  
  	| index association |
  	index := self scanFor: key.
  	association := (array at: index) ifNil: [ ^aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: nil.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^association value!

Item was changed:
  ----- Method: HashedCollection>>atNewIndex:put: (in category 'private') -----
  atNewIndex: index put: anObject
  
+ 	tally := tally + 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: anObject.
- 	tally := tally + 1.
  	"Keep array at least 1/4 free for decent hash behavior"
  	array size * 3 < (tally * 4) ifTrue: [ self grow ]!

Item was changed:
  ----- Method: HashedCollection>>growTo: (in category 'private') -----
  growTo: anInteger
+ 	"Reallocate the elements array with the given size and reinsert the old elements. Do it even if the size of the array is the same as the argument because this method is also used to rehash the collection."
+ 
- 	"Grow the elements array and reinsert the old elements. Do it even if the size of the array is the same as the argument because this methods is also used to rehash the collection."
- 	
  	| oldElements |
  	oldElements := array.
  	array := self arrayType new: anInteger.
  	self noCheckNoGrowFillFrom: oldElements!

Item was added:
+ ----- Method: HashedCollection>>isCompact (in category 'testing') -----
+ isCompact
+ 	"Answer true if I have the smallest possible capacity to store the elements."
+ 	
+ 	^array size = (self class sizeFor: self slowSize)!

Item was changed:
  ----- Method: Heap>>compact (in category 'growing') -----
  compact
  	"Remove any empty slots in the receiver."
  
+ 	self isCompact ifTrue: [ ^self ].
  	self growTo: self size.!

Item was added:
+ ----- Method: Heap>>isCompact (in category 'growing') -----
+ isCompact
+ 	"Answer true if I have the smallest possible capacity to store the elements."
+ 
+ 	^array size = tally!

Item was changed:
  ----- Method: KeyedSet>>remove:ifAbsent: (in category 'removing') -----
  remove: oldObject ifAbsent: aBlock
  
  	| index |
  	index := self scanFor: (keyBlock value: oldObject).
  	(array at: index) ifNil: [ ^ aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: nil.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^ oldObject!

Item was changed:
  ----- Method: KeyedSet>>removeKey:ifAbsent: (in category 'removing') -----
  removeKey: key ifAbsent: aBlock
  
  	| index obj |
  	index := self scanFor: key.
  	obj := (array at: index) ifNil: [ ^ aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: nil.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^ obj enclosedSetElement!

Item was changed:
  ----- Method: Set>>remove:ifAbsent: (in category 'removing') -----
  remove: oldObject ifAbsent: aBlock
  
  	| index |
  	index := self scanFor: oldObject.
  	(array at: index) ifNil: [ ^ aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: nil.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^ oldObject!

Item was changed:
  String subclass: #Symbol
  	instanceVariableNames: ''
  	classVariableNames: 'NewSymbols SymbolTable'
  	poolDictionaries: ''
  	category: 'Collections-Strings'!
  
+ !Symbol commentStamp: 'ul 3/28/2022 19:53' prior: 0!
- !Symbol commentStamp: 'mt 4/13/2021 17:04' prior: 0!
  I represent Strings that are created uniquely. Thus, someString asSymbol == someString asSymbol.
  
+ On my class-side, there is an implementation of a symbol table which provides concurrent access (both read and write) without using locks (Semaphore, Mutex, Monitor).
+ The state of the symbol table is stored in two immutable (see #beReadOnlyObject) WeakSets stored by the class variables SymbolTable and NewSymbols.
+ SymbolTable holds most of the interned symbols, while new symbols are always added to NewSymbols.
+ Once the size of NewSymbols exceeds a limit (1000 currently, see #intern:), its content is merged into SymbolTable (see #condenseNewSymbols).
+ 
+ To ensure a consistent view of the symbol table, all methods accessing it start with atomically creating a "snapshot" of the state, the two variables, by assigning them to two temporaries. Except for #intern:, which only accesses NewSymbols, hence it only creates a snapshot of that.
+ If the symbol table changes, NewSymbols will always be a different object, so it is enough to check whether NewSymbols is the same as before the operation to verify that the symbol table has not been modified.
+ 
+ There are three methods that can update the symbol table: #condenseNewSymbols, #rehash and #intern:. They create a snapshot first as described above, then create copies of the updated parts, and finally check whether NewSymbols is the same as before, and if it is, they apply their changes. That all happens atomically because #==, #ifTrue: and assignments are executed by the VM without suspension points, hence atomically. If NewSymbols is different, the methods are repeated until they succeed.
+ !
- ATTENTION!! To ensure consistency and thread safety without using a mutex, the two WeakSets which make up the symbol table are treated as if they were immutable. Removing from them without creating a copy just breaks that contract.!

Item was changed:
  ----- Method: Symbol class>>allSymbolTablesDo: (in category 'class initialization') -----
  allSymbolTablesDo: aBlock
+ 	"See the class comment for details about the usage of the class variables before changing this method"
+ 
- 	
  	| originalNewSymbols originalSymbolTable |
  	originalNewSymbols := NewSymbols.
  	originalSymbolTable := SymbolTable.
  	originalNewSymbols do: aBlock.
  	originalSymbolTable do: aBlock.!

Item was changed:
  ----- Method: Symbol class>>allSymbolTablesDo:after: (in category 'class initialization') -----
  allSymbolTablesDo: aBlock after: aSymbol
+ 	"See the class comment for details about the usage of the class variables before changing this method"
  
  	| originalNewSymbols originalSymbolTable |
  	originalNewSymbols := NewSymbols.
  	originalSymbolTable := SymbolTable.
  	(originalNewSymbols includes: aSymbol) 
  		ifTrue: [
  			originalNewSymbols do: aBlock after: aSymbol.
  			originalSymbolTable do: aBlock after: aSymbol ]
  		ifFalse: [
  			originalSymbolTable do: aBlock after: aSymbol.
  			originalNewSymbols do: aBlock after: aSymbol ]
  	!

Item was changed:
  ----- Method: Symbol class>>allSymbols (in category 'accessing') -----
  allSymbols
  	"Answer all interned symbols"
+ 	"See the class comment for details about the usage of the class variables before changing this method"
+ 
- 	
  	| originalNewSymbols originalSymbolTable |
  	originalNewSymbols := NewSymbols.
  	originalSymbolTable := SymbolTable.
  	^Array
  		new: originalNewSymbols slowSize + originalSymbolTable slowSize
+ 		streamContents: [ :stream |
- 		streamContents:[ :stream |
  			stream
  				nextPutAll: originalNewSymbols;
  				nextPutAll: originalSymbolTable ]
  !

Item was changed:
  ----- Method: Symbol class>>condenseNewSymbols (in category 'private') -----
  condenseNewSymbols
+ 	"Move all symbols from NewSymbols to SymbolTable, and compact SymbolTable if needed."
+ 	"See the class comment for details about the usage of the class variables before changing this method."
- 	"Move all symbols from NewSymbols to SymbolTable, and compact SymbolTable."
  
  	| originalNewSymbols originalSymbolTable newNewSymbols newSymbolTable |
+ 	[
+ 		originalNewSymbols := NewSymbols.
+ 		originalSymbolTable := SymbolTable.
+ 		(originalNewSymbols isEmpty and: [ originalSymbolTable isCompact ]) ifTrue: [
+ 			"Only recreate the sets if necessary"	
+ 			^self ].
+ 		(newNewSymbols := WeakSet new)
+ 			beReadOnlyObject.
+ 		(newSymbolTable := WeakSet new: originalNewSymbols slowSize + originalSymbolTable slowSize)
+ 			addAll: originalSymbolTable;
+ 			addAll: originalNewSymbols;
+ 			beReadOnlyObject.
+ 		originalNewSymbols == NewSymbols ifTrue: [
+ 			NewSymbols := newNewSymbols.
+ 			SymbolTable := newSymbolTable.
+ 			^self ].
+ 		"Some other process has modified the symbol table. Try again." ] repeat!
- 	originalNewSymbols := NewSymbols.
- 	originalSymbolTable := SymbolTable.
- 	newNewSymbols := WeakSet new.
- 	newSymbolTable := originalSymbolTable copy
- 		addAll: originalNewSymbols;
- 		compact;
- 		yourself.
- 	originalNewSymbols == NewSymbols ifFalse: [
- 		"Some other process has modified the symbols. Try again."
- 		^self condenseNewSymbols ].
- 	NewSymbols := newNewSymbols.
- 	SymbolTable := newSymbolTable!

Item was changed:
  ----- Method: Symbol class>>intern: (in category 'instance creation') -----
  intern: aStringOrSymbol 
  	"Answer the unique Symbol formed with given String.
  	If it does not exist yet, create it and intern it in the NewSymbols.
  	Interning a Symbol should return the Symbol itself, no Symbol should be duplicated"
+ 	"See the class comment for details about the usage of the class variables before changing this method"
  
  	| originalNewSymbols |
  	originalNewSymbols := NewSymbols.
  	^(self lookup: aStringOrSymbol) ifNil:[
  		| aSymbol newNewSymbols |
  		aStringOrSymbol isSymbol ifTrue:[
  			aSymbol := aStringOrSymbol.
  		] ifFalse:[
  			aSymbol := (aStringOrSymbol isOctetString ifTrue:[ByteSymbol] ifFalse:[WideSymbol])
  							new: aStringOrSymbol size.
  			aSymbol
  				copyFrom: aStringOrSymbol;
  				beReadOnlyObject.
  		].
  		newNewSymbols := originalNewSymbols copyWith: aSymbol.
+ 		newNewSymbols beReadOnlyObject.
  		originalNewSymbols == NewSymbols
  			ifTrue: [
  				NewSymbols := newNewSymbols.
  				newNewSymbols size > 1000 ifTrue: [ self condenseNewSymbols ].
  				aSymbol ]
  			ifFalse: [
  				"Some other process has modified the symbols. Try again."
  				self intern: aStringOrSymbol ] ]!

Item was changed:
  ----- Method: Symbol class>>lookup: (in category 'instance creation') -----
  lookup: aStringOrSymbol
  	"Answer the unique Symbol formed with given String, if it exists.
  	Answer nil if no such Symbol does exist yet.
  	Looking up a Symbol should return the Symbol itself
  	- no Symbol should be duplicated
  	- every Symbol should be registered in one of the two Symbol tables"
+ 	"See the class comment for details about the usage of the class variables before changing this method"
  
  	| originalNewSymbols originalSymbolTable |
  	originalNewSymbols := NewSymbols.
  	originalSymbolTable := SymbolTable.
+ 	^(originalSymbolTable like: aStringOrSymbol) ifNil: [ "Most symbols are in originalSymbolTable, so look for existing symbols in there first"
- 	^(originalSymbolTable like: aStringOrSymbol) ifNil: [
  		 originalNewSymbols like: aStringOrSymbol ]!

Item was changed:
  ----- Method: Symbol class>>possibleSelectorsFor: (in category 'private') -----
  possibleSelectorsFor: misspelled 
  	"Answer an ordered collection of possible corrections
  	for the misspelled selector in order of likelyhood"
  
  	| numArgs candidates lookupString best binary short long first |
  	lookupString := misspelled asLowercase. "correct uppercase selectors to lowercase"
  	numArgs := lookupString numArgs.
  	(numArgs < 0 or: [lookupString size < 2]) ifTrue: [^ OrderedCollection new: 0].
  	first := lookupString first.
  	short := lookupString size - (lookupString size // 4 max: 3) max: 2.
  	long := lookupString size + (lookupString size // 4 max: 3).
  
  	"First assemble candidates for detailed scoring"
  	candidates := OrderedCollection new.
  	self allSymbolTablesDo: [:s | | ss |
  		(((ss := s size) >= short	"not too short"
  			and: [ss <= long			"not too long"
  					or: [(s at: 1) = first]])	"well, any length OK if starts w/same letter"
  			and: [s numArgs = numArgs])	"and numArgs is the same"
  			ifTrue: [candidates add: s]].
  
  	"Then further prune these by correctAgainst:"
  	best := lookupString correctAgainst: candidates.
  	((misspelled last ~~ $:) and: [misspelled size > 1]) ifTrue: [
  		binary := misspelled, ':'.		"try for missing colon"
+ 		(self lookup: binary) ifNotNil: [:him | best addFirst: him]].
- 		Symbol hasInterned: binary ifTrue: [:him | best addFirst: him]].
  	^ best!

Item was changed:
  ----- Method: Symbol class>>rehash (in category 'private') -----
  rehash
  	"Rebuild the hash table, reclaiming unreferenced Symbols. This method will intern all symbols. You're probably looking for #condenseNewSymbols instead."
+ 	"See the class comment for details about the usage of the class variables before changing this method"
  
  	| originalNewSymbols originalSymbolTable newNewSymbols newSymbolTable |
+ 	[
+ 		originalNewSymbols := NewSymbols.
+ 		originalSymbolTable := SymbolTable.
+ 		newNewSymbols := WeakSet new.
+ 		newSymbolTable := WeakSet withAll: self allSubInstances.
+ 		newNewSymbols beReadOnlyObject.
+ 		newSymbolTable beReadOnlyObject.
+ 		originalNewSymbols == NewSymbols ifTrue: [
+ 			NewSymbols := newNewSymbols.
+ 			SymbolTable := newSymbolTable.
+ 			^self ].
+ 		"Some other process has modified the symbol table. Try again." ] repeat
+ !
- 	originalNewSymbols := NewSymbols.
- 	originalSymbolTable := SymbolTable.
- 	newNewSymbols := WeakSet new.
- 	newSymbolTable := WeakSet withAll: self allSubInstances.
- 	originalNewSymbols == NewSymbols ifFalse: [
- 		"Some other process has modified the symbols. Try again."
- 		^self rehash ].
- 	NewSymbols := newNewSymbols.
- 	SymbolTable := newSymbolTable!

Item was changed:
  ----- Method: WeakIdentityDictionary>>removeKey:ifAbsent: (in category 'removing') -----
  removeKey: key ifAbsent: aBlock 
  	"Remove key (and its associated value) from the receiver. If key is not in 
  	the receiver, answer the result of evaluating aBlock. Otherwise, answer 
  	the value externally named by key."
  
  	| index association |
  	index := self scanFor: key.
  	(association := (array at: index)) == vacuum ifTrue: [ ^aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: vacuum.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^association value!

Item was changed:
  ----- Method: WeakSet>>postCopy (in category 'copying') -----
  postCopy
+ 
  	| oldFlag |
  	super postCopy.
  	oldFlag := flag.
  	flag := Object new.
+ 	1 to: array size do: [ :index |
+ 		(array at: index) == oldFlag ifTrue: [
+ 			array at: index put: flag ] ]!
- 	array replaceAll: oldFlag with: flag.!

Item was changed:
  ----- Method: WeakSet>>remove:ifAbsent: (in category 'removing') -----
  remove: oldObject ifAbsent: aBlock
  
  	| index |
  	index := self scanFor: oldObject.
  	(array at: index) == flag ifTrue: [ ^ aBlock value ].
+ 	tally := tally - 1. "Update tally first, so that read-only hashed collections raise an error before modifying array."
  	array at: index put: flag.
- 	tally := tally - 1.
  	self fixCollisionsFrom: index.
  	^oldObject!

Item was changed:
+ (PackageInfo named: 'Collections') postscript: '"Make sure the symbol table consists of immutable sets"
+ #(SymbolTable NewSymbols) do: [ :variableName |
+ 	(Symbol classPool at: variableName) beReadOnlyObject ]'!
- (PackageInfo named: 'Collections') postscript: '"Definition of separators has been updated, update CharacterSet caches too"
- CharacterSet cleanUp: false.'!



More information about the Squeak-dev mailing list