[squeak-dev] The Inbox: Collections-nice.869.mcz

commits at source.squeak.org commits at source.squeak.org
Tue Dec 31 22:14:50 UTC 2019


Nicolas Cellier uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-nice.869.mcz

==================== Summary ====================

Name: Collections-nice.869
Author: nice
Time: 31 December 2019, 11:14:43.388306 pm
UUID: e5a9bef5-16d3-4d8a-bf08-040d30e4e907
Ancestors: Collections-nice.868

Remaster Collections-nice.464: opimize RunArray

This should not be noticeable for Text, but as a general library, it's important for any other potential use.

Among those optimizations, notice one important change: some of the enumerating methods like #collect: won't iterate on each element, but only once per run. Be aware that providing blocks with side effects like (i := i + 1) might lead to different behavior than ordinary ArrayedCollection.

Move RunArray off ArrayedCollection which serves nothing to such subclass.

Add the ability to remove: since it already has the hability to addFirst: and addLast:

Fix a few missing lastIndex cache flush, and advertise about the necessity to do it in class comment.

Deprecate mapValues: to the profit of replace:.

=============== Diff against Collections-nice.868 ===============

Item was changed:
+ SequenceableCollection subclass: #RunArray
- ArrayedCollection subclass: #RunArray
  	instanceVariableNames: 'runs values lastIndex lastRun lastOffset'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Collections-Arrayed'!
  
+ !RunArray commentStamp: 'nice 12/30/2019 00:57' prior: 0!
- !RunArray commentStamp: '<historical>' prior: 0!
  My instances provide space-efficient storage of data which tends to be constant over long runs of the possible indices. Essentially repeated values are stored singly and then associated with a "run" length that denotes the number of consecutive occurrences of the value.
  
  My two important variables are
  	runs	An array of how many elements are in each run
  	values	An array of what the value is over those elements
  
  The variables lastIndex, lastRun and lastOffset cache the last access
  so that streaming through RunArrays is not an N-squared process.
+ Beware: methods modifying the RunArray contents should reset the lastIndex cache to nil.
  
  Many complexities of access can be bypassed by using the method
  	RunArray withStartStopAndValueDo:!

Item was changed:
  ----- Method: RunArray class>>newFrom: (in category 'instance creation') -----
  newFrom: aCollection 
  	"Answer an instance of me containing the same elements as aCollection."
  
  	| newCollection |
  	newCollection := self new.
+ 	newCollection fillFrom: aCollection with: [:each | each].
- 	aCollection do: [:x | newCollection addLast: x].
  	^newCollection
  
  "	RunArray newFrom: {1. 2. 2. 3}
  	{1. $a. $a. 3} as: RunArray
  	({1. $a. $a. 3} as: RunArray) values
  "!

Item was added:
+ ----- Method: RunArray>>allSatisfy: (in category 'enumerating') -----
+ allSatisfy: aBlock 
+ 	"Only evaluate once per run"
+ 
+ 	^values allSatisfy: aBlock!

Item was added:
+ ----- Method: RunArray>>anySatisfy: (in category 'enumerating') -----
+ anySatisfy: aBlock 
+ 	"Only evaluate once per run"
+ 
+ 	^values anySatisfy: aBlock!

Item was added:
+ ----- Method: RunArray>>asBag (in category 'converting') -----
+ asBag
+ 	| aBag |
+ 	aBag := Bag new: values size.
+ 	self runsAndValuesDo: [:run :value |
+ 		aBag add: value withOccurrences: run].
+ 	^aBag!

Item was added:
+ ----- Method: RunArray>>asSet (in category 'converting') -----
+ asSet
+ 	^values asSet!

Item was changed:
+ ----- Method: RunArray>>coalesce (in category 'private') -----
- ----- Method: RunArray>>coalesce (in category 'adding') -----
  coalesce
+ 	"Coalesce theRuns and theValues if ever the values have adjacent equal objects"
+ 
+ 	| lastLength lastValue mustCoalesce coalescedRuns coalescedValued runIndex |
+ 	mustCoalesce := false.
+ 	runIndex := 0.
+ 	lastLength := 0.
+ 	lastValue := Object new.		
+ 	runs with: values do: [:run :value | 
+ 		(lastValue = value or: [run = 0])
+ 			ifTrue:
+ 				[mustCoalesce
+ 					ifFalse:
+ 						[coalescedRuns := (Array new: runs size) writeStream.
+ 						coalescedValued := (Array new: values size) writeStream.
+ 						coalescedRuns next: runIndex putAll: runs startingAt: 1.
+ 						coalescedValued next: runIndex putAll: values startingAt: 1.
+ 						mustCoalesce := true].
+ 				lastLength := lastLength + run]
+ 			ifFalse:
+ 				[lastLength > 0
+ 					ifTrue:
+ 						[mustCoalesce
+ 							ifTrue:
+ 								[coalescedRuns nextPut: lastLength.
+ 								coalescedValued nextPut: lastValue].
+ 						runIndex := runIndex + 1].
+ 				lastLength := run.
+ 				lastValue := value]].
+ 	mustCoalesce
+ 		ifTrue:
+ 			[lastLength > 0
+ 				ifTrue:
+ 					[coalescedRuns nextPut: lastLength.
+ 					coalescedValued nextPut: lastValue].
+ 			self setRuns: coalescedRuns contents setValues: coalescedValued contents]!
- 	"Try to combine adjacent runs"
- 	| ind |
- 	ind := 2.
- 	[ind > values size] whileFalse: [
- 		(values at: ind-1) = (values at: ind) 
- 			ifFalse: [ind := ind + 1]
- 			ifTrue: ["two are the same, combine them"
- 				values := values copyReplaceFrom: ind to: ind with: #().
- 				runs at: ind-1 put: (runs at: ind-1) + (runs at: ind).
- 				runs := runs copyReplaceFrom: ind to: ind with: #().
- 				"self error: 'needed to combine runs' "]].
- 			!

Item was added:
+ ----- Method: RunArray>>collect: (in category 'enumerating') -----
+ collect: aBlock
+ 	"Beware, the block will be evaluated only once per group of values."
+ 	^(self class runs: runs copy values: (values collect: aBlock)) coalesce!

Item was added:
+ ----- Method: RunArray>>copyUpThrough: (in category 'copying') -----
+ copyUpThrough: value
+ 	"Optimized"
+ 
+ 	| newSize newValues newRuns |
+ 	newSize := values indexOf: value startingAt: 1.
+ 	newSize = 0 ifTrue: [^self copy].
+ 	newRuns := runs copyFrom: 1 to: newSize.
+ 	newRuns at: newSize put: 1.
+ 	newValues := values copyFrom: 1 to: newSize.
+ 	^ self class
+ 		runs: newRuns
+ 		values: newValues!

Item was added:
+ ----- Method: RunArray>>copyUpTo: (in category 'copying') -----
+ copyUpTo: anElement 
+ 	"Optimized"
+ 
+ 	| newValues |
+ 	newValues := values copyUpTo: anElement.
+ 	^ self class
+ 		runs: (runs copyFrom: 1 to: newValues size)
+ 		values: newValues!

Item was added:
+ ----- Method: RunArray>>copyUpToLast: (in category 'copying') -----
+ copyUpToLast: value
+ 	"Optimized"
+ 
+ 	| newSize run newRuns newValues |
+ 	newSize := values lastIndexOf: value startingAt: values size.
+ 	newSize = 0 ifTrue: [^self copy].
+ 	run := runs at: newSize.
+ 	run > 1
+ 		ifTrue:
+ 			[newRuns := runs copyFrom: 1 to: newSize.
+ 			newRuns at: newSize put: run - 1]
+ 		ifFalse:
+ 			[newSize := newSize - 1.
+ 			newRuns := runs copyFrom: 1 to: newSize].
+ 	newValues := values copyFrom: 1 to: newSize.
+ 	^ self class
+ 		runs: newRuns
+ 		values: newValues!

Item was added:
+ ----- Method: RunArray>>count: (in category 'enumerating') -----
+ count: aBlock
+ 	"Beware, the block will be evaluated only once per group of values."
+ 	| count |
+ 	count := 0.
+ 	self runsAndValuesDo: [:run :value |
+ 		(aBlock value: value)
+ 			ifTrue:
+ 				[count := count + run]].
+ 	^count!

Item was added:
+ ----- Method: RunArray>>detect:ifNone: (in category 'enumerating') -----
+ detect: aBlock ifNone: exceptionBlock 
+ 	"Only evaluate once per run"
+ 
+ 	^values detect: aBlock ifNone: exceptionBlock !

Item was added:
+ ----- Method: RunArray>>detectMax: (in category 'enumerating') -----
+ detectMax: aBlock
+ 	"Only evaluate once per run"
+ 
+ 	^values detectMax: aBlock!

Item was added:
+ ----- Method: RunArray>>detectMin: (in category 'enumerating') -----
+ detectMin: aBlock
+ 	"Only evaluate once per run"
+ 
+ 	^values detectMin: aBlock!

Item was added:
+ ----- Method: RunArray>>detectSum: (in category 'enumerating') -----
+ detectSum: aBlock
+ 	"Only loop on runs"
+ 	| sum |
+ 	sum := 0.
+ 	self runsAndValuesDo: [:run :value | 
+ 		sum := (aBlock value: value) * run + sum].  
+ 	^ sum!

Item was added:
+ ----- Method: RunArray>>do: (in category 'enumerating') -----
+ do: aBlock
+ 	"This is refined for speed"
+ 	
+ 	1 to: runs size do: [:i |
+ 		| r v |
+ 		v := values at: i.
+ 		r := runs at: i.
+ 		[( r := r - 1) >= 0]
+ 			whileTrue: [aBlock value: v]].!

Item was added:
+ ----- Method: RunArray>>fillFrom:with: (in category 'private') -----
+ fillFrom: aCollection with: aBlock
+ 	"Evaluate aBlock with each of aCollections's elements as the argument.  
+ 	Collect the resulting values into self. Answer self."
+ 
+ 	| newRuns newValues lastLength lastValue |
+ 	newRuns := (Array new: aCollection size) writeStream.
+ 	newValues := (Array new: aCollection size) writeStream.
+ 	lastLength := 0.
+ 	lastValue := Object new.
+ 	lastIndex := nil.  "flush access cache"
+ 	aCollection do: [:each | 
+ 		| value |
+ 		value := aBlock value: each.
+ 		lastValue = value
+ 			ifTrue: [lastLength := lastLength + 1]
+ 			ifFalse:
+ 				[lastLength > 0
+ 					ifTrue:
+ 						[newRuns nextPut: lastLength.
+ 						newValues nextPut: lastValue].
+ 				lastLength := 1.
+ 				lastValue := value]].
+ 	lastLength > 0
+ 		ifTrue:
+ 			[newRuns nextPut: lastLength.
+ 			newValues nextPut: lastValue].
+ 	self setRuns: newRuns contents setValues: newValues contents!

Item was added:
+ ----- Method: RunArray>>findFirst: (in category 'enumerating') -----
+ findFirst: aBlock
+ 	| index |
+ 	index := 1.
+ 	self runsAndValuesDo: [ :run :value |
+ 		(aBlock value: value) ifTrue: [^index].
+ 		index := index + run].
+ 	^0!

Item was added:
+ ----- Method: RunArray>>findLast: (in category 'enumerating') -----
+ findLast: aBlock
+ 	| index |
+ 	index := values size + 1.
+ 	[(index := index - 1) >= 1] whileTrue:
+ 		[(aBlock value: (values at: index)) ifTrue: [^(1 to: index) detectSum: [:i | runs at: i]]].
+ 	^0!

Item was added:
+ ----- Method: RunArray>>includes: (in category 'testing') -----
+ includes: anObject
+ 	"Answer whether anObject is one of the receiver's elements."
+ 
+ 	^values includes: anObject!

Item was added:
+ ----- Method: RunArray>>indexOf:startingAt: (in category 'accessing') -----
+ indexOf: anElement startingAt: start
+ 	"Answer the index of the first occurence of anElement after start
+ 	within the receiver. If the receiver does not contain anElement, 
+ 	answer 0."
+ 	
+ 	| index |
+ 	index := 1.
+ 	self runsAndValuesDo: [ :run :value |
+ 		(index >= start and: [value = anElement]) ifTrue: [^index].
+ 		index := index + run].
+ 	^0!

Item was added:
+ ----- Method: RunArray>>indexOfAnyOf:startingAt: (in category 'accessing') -----
+ indexOfAnyOf: aCollection startingAt: start
+ 	"Answer the index of the first occurence of any element included in aCollection after start within the receiver.
+ 	If the receiver does not contain anElement, answer zero, which is an invalid index."
+ 	
+ 	| index |
+ 	index := 1.
+ 	self runsAndValuesDo: [ :run :value |
+ 		(index >= start and: [aCollection includes: value]) ifTrue: [^index].
+ 		index := index + run].
+ 	^0!

Item was added:
+ ----- Method: RunArray>>isSorted (in category 'testing') -----
+ isSorted
+ 	^values isSorted!

Item was added:
+ ----- Method: RunArray>>isSortedBy: (in category 'testing') -----
+ isSortedBy: aBlock
+ 	^values isSortedBy: aBlock!

Item was added:
+ ----- Method: RunArray>>lastIndexOf:startingAt: (in category 'accessing') -----
+ lastIndexOf: anElement startingAt: lastIndex
+ 	"Answer the index of the last occurence of anElement within the  
+ 	receiver. If the receiver does not contain anElement, answer 0."
+ 	
+ 	| lastValueIndex |
+ 	lastValueIndex := values lastIndexOf: anElement startingAt: values size.
+ 	[lastValueIndex > 0] whileTrue:
+ 		[| i index |
+ 		i := index := 0.
+ 		[index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
+ 				whileTrue: [index := index + (runs at: i)].
+ 		index <= lastIndex ifTrue: [^index].
+ 		index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
+ 		lastValueIndex := values lastIndexOf: anElement startingAt: lastValueIndex - 1].
+ 	^0!

Item was added:
+ ----- Method: RunArray>>lastIndexOfAnyOf:startingAt: (in category 'accessing') -----
+ lastIndexOfAnyOf: aCollection startingAt: lastIndex
+ 	"Answer the index of the last occurence of any element of aCollection
+ 	within the receiver. If the receiver does not contain any of those
+ 	elements, answer 0"
+ 	
+ 	| lastValueIndex |
+ 	lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt: values size.
+ 	[lastValueIndex > 0] whileTrue:
+ 		[| i index |
+ 		i := index := 0.
+ 		[index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
+ 				whileTrue: [index := index + (runs at: i)].
+ 		index <= lastIndex ifTrue: [^index].
+ 		index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
+ 		lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt: lastValueIndex - 1].
+ 	^0!

Item was removed:
- ----- Method: RunArray>>mapValues: (in category 'private') -----
- mapValues: mapBlock
- 	"NOTE: only meaningful to an entire set of runs"
- 	
- 	values := values collect: mapBlock!

Item was added:
+ ----- Method: RunArray>>noneSatisfy: (in category 'enumerating') -----
+ noneSatisfy: aBlock 
+ 	"Only evaluate once per run"
+ 
+ 	^values noneSatisfy: aBlock!

Item was changed:
+ ----- Method: RunArray>>rangeOf:startingAt: (in category 'accessing') -----
- ----- Method: RunArray>>rangeOf:startingAt: (in category 'adding') -----
  rangeOf: attr startingAt: startPos
  	"Answer an interval that gives the range of attr at index position  startPos. An empty interval with start value startPos is returned when the attribute attr is not present at position startPos.  self size > 0 is assumed, it is the responsibility of the caller to test for emptiness of self.
  Note that an attribute may span several adjancent runs. "
  
  	self at: startPos 
  		setRunOffsetAndValue: 
              [:run :offset :value | 
                 ^(value includes: attr)
                    ifFalse: [startPos to: startPos - 1]
                    ifTrue:
                      [ | firstRelevantPosition lastRelevantPosition idxOfCandidateRun |
                       lastRelevantPosition := startPos - offset + (runs at: run) - 1.
                       firstRelevantPosition := startPos - offset.
                       idxOfCandidateRun := run + 1.
                       [idxOfCandidateRun <= runs size 
                               and: [(values at: idxOfCandidateRun) includes: attr]]
                          whileTrue:
                            [lastRelevantPosition := lastRelevantPosition + (runs at: idxOfCandidateRun).
                             idxOfCandidateRun := idxOfCandidateRun + 1]. 
                       idxOfCandidateRun := run - 1.
                       [idxOfCandidateRun >= 1 
                               and: [(values at: idxOfCandidateRun) includes: attr]]
                          whileTrue:
                            [firstRelevantPosition := firstRelevantPosition - (runs at: idxOfCandidateRun).
                             idxOfCandidateRun := idxOfCandidateRun - 1]. 
   
                      firstRelevantPosition to: lastRelevantPosition]
  		  ]!

Item was added:
+ ----- Method: RunArray>>remove:ifAbsent: (in category 'removing') -----
+ remove: anObject ifAbsent: exceptionBlock
+ 	| index mustCoalesce run |
+ 	lastIndex := nil.  "flush access cache"
+ 	index := values indexOf: anObject ifAbsent: [^exceptionBlock value].
+ 	(run := runs at: index) > 1
+ 		ifTrue: [runs at: index put: run - 1]
+ 		ifFalse:
+ 			[mustCoalesce := index > 1 and: [index < values size and: [(values at: index - 1) = (values at: index + 1)]].
+ 			runs := runs copyWithoutIndex: index.
+ 			values := values copyWithoutIndex: index.
+ 			mustCoalesce
+ 				ifTrue:
+ 					[runs at: index - 1 put: (runs at: index - 1) + (runs at: index).
+ 					runs := runs copyWithoutIndex: index.
+ 					values := values copyWithoutIndex: index]].
+ 	^anObject!

Item was added:
+ ----- Method: RunArray>>removeAll (in category 'removing') -----
+ removeAll
+ 	lastIndex := nil.  "flush access cache"
+ 	runs := Array new.
+ 	values := Array new!

Item was added:
+ ----- Method: RunArray>>replace: (in category 'enumerating') -----
+ replace: aBlock
+ 	"destructively replace the values in this RunArray with the ones transformed by aBlock."
+ 	lastIndex := nil.  "flush access cache"
+ 	values := values replace: aBlock.
+ 	self coalesce!

Item was added:
+ ----- Method: RunArray>>reverseDo: (in category 'enumerating') -----
+ reverseDo: aBlock
+ 	"This is refined for speed"
+ 	
+ 	| i |
+ 	i := runs size.
+ 	[i > 0]
+ 		whileTrue: 
+ 			[ | r v |
+ 			v := values at: i.
+ 			r := runs at: i.
+ 			i := i - 1.
+ 			[( r := r - 1) >= 0]
+ 				whileTrue: [aBlock value: v]].!

Item was added:
+ ----- Method: RunArray>>select: (in category 'enumerating') -----
+ select: aBlock
+ 	"Beware, the block will be evaluated only once per group of values."
+ 	| newRuns newValues |
+ 	newRuns := (Array new: runs size) writeStream.
+ 	newValues := (Array new: values size) writeStream.
+ 	self runsAndValuesDo: [:run :value |
+ 		(aBlock value: value)
+ 			ifTrue:
+ 				[newRuns nextPut: run.
+ 				newValues nextPut: value]].
+ 	^(self class runs: newRuns contents values: newValues contents) coalesce!

Item was changed:
  ----- Method: SequenceableCollection>>lastIndexOf:startingAt: (in category 'accessing') -----
  lastIndexOf: anElement startingAt: lastIndex
  	"Answer the index of the last occurence of anElement within the  
+ 	receiver. If the receiver does not contain anElement, answer 0."
- 	receiver. If the receiver does not contain anElement, answer the
- 	result of evaluating the argument, exceptionBlock."
  
  	lastIndex to: 1 by: -1 do: [ :index |
  		(self at: index) = anElement ifTrue: [ ^index ] ].
  	^0!

Item was changed:
  ----- Method: Text>>addAttribute:from:to: (in category 'emphasis') -----
  addAttribute: att from: start to: stop 
  	"Set the attribute for characters in the interval start to stop."
  	runs :=  runs copyReplaceFrom: start to: stop
  			with: ((runs copyFrom: start to: stop)
+ 				replace:
- 				mapValues:
  				[:attributes | Text addAttribute: att toArray: attributes])
  !

Item was changed:
  ----- Method: Text>>removeAttribute:from:to: (in category 'emphasis') -----
  removeAttribute: att from: start to: stop 
  	"Remove the attribute over the interval start to stop."
  	runs :=  runs copyReplaceFrom: start to: stop
  			with: ((runs copyFrom: start to: stop)
+ 				replace:
- 				mapValues:
  				[:attributes | attributes copyWithout: att])
  !



More information about the Squeak-dev mailing list