[squeak-dev] The Inbox: Collections-nice.869.mcz

marcel.taeumel Marcel.Taeumel at hpi.de
Thu Dec 17 11:48:32 UTC 2020


Hi all!

I am in favor of improving RunArray. Treating RunArray as "array with
run-length encoding" breaks information hiding by expecting a specific
implementation strategy.

RunArray feels like Bag, but ordered. Maybe "Collections-Abstract" deserves
a new direct subclass? CompressedCollection? And then Bag and OrderedBag are
two examples for such compression?

We have already "Collections-Support", which lists several support classes
such as "Bitset" and "CharacterSet". Such classes do not comply with the
usual collection protocol anyway. For example, there is CharacterSet >>
#canBeEnumerated. Maybe RunArray can go there and also help OrderedBag come
to be? :-) Then, RunArray may also have its special semantics.

Best,
Marcel


commits-2 wrote
> Nicolas Cellier uploaded a new version of Collections to project The
> Inbox:
> http://source.squeak.org/inbox/Collections-nice.869.mcz
> 
> ==================== Summary ====================
> 
> Name: Collections-nice.869
> Author: nice
> Time: 31 December 2019, 11:14:43.388306 pm
> UUID: e5a9bef5-16d3-4d8a-bf08-040d30e4e907
> Ancestors: Collections-nice.868
> 
> Remaster Collections-nice.464: opimize RunArray
> 
> This should not be noticeable for Text, but as a general library, it's
> important for any other potential use.
> 
> Among those optimizations, notice one important change: some of the
> enumerating methods like #collect: won't iterate on each element, but only
> once per run. Be aware that providing blocks with side effects like (i :=
> i + 1) might lead to different behavior than ordinary ArrayedCollection.
> 
> Move RunArray off ArrayedCollection which serves nothing to such subclass.
> 
> Add the ability to remove: since it already has the hability to addFirst:
> and addLast:
> 
> Fix a few missing lastIndex cache flush, and advertise about the necessity
> to do it in class comment.
> 
> Deprecate mapValues: to the profit of replace:.
> 
> =============== Diff against Collections-nice.868 ===============
> 
> Item was changed:
> + SequenceableCollection subclass: #RunArray
> - ArrayedCollection subclass: #RunArray
>   	instanceVariableNames: 'runs values lastIndex lastRun lastOffset'
>   	classVariableNames: ''
>   	poolDictionaries: ''
>   	category: 'Collections-Arrayed'!
>   
> + !RunArray commentStamp: 'nice 12/30/2019 00:57' prior: 0!
> - !RunArray commentStamp: '
> <historical>
> ' prior: 0!
>   My instances provide space-efficient storage of data which tends to be
> constant over long runs of the possible indices. Essentially repeated
> values are stored singly and then associated with a "run" length that
> denotes the number of consecutive occurrences of the value.
>   
>   My two important variables are
>   	runs	An array of how many elements are in each run
>   	values	An array of what the value is over those elements
>   
>   The variables lastIndex, lastRun and lastOffset cache the last access
>   so that streaming through RunArrays is not an N-squared process.
> + Beware: methods modifying the RunArray contents should reset the
> lastIndex cache to nil.
>   
>   Many complexities of access can be bypassed by using the method
>   	RunArray withStartStopAndValueDo:!
> 
> Item was changed:
>   ----- Method: RunArray class>>newFrom: (in category 'instance creation')
> -----
>   newFrom: aCollection 
>   	"Answer an instance of me containing the same elements as aCollection."
>   
>   	| newCollection |
>   	newCollection := self new.
> + 	newCollection fillFrom: aCollection with: [:each | each].
> - 	aCollection do: [:x | newCollection addLast: x].
>   	^newCollection
>   
>   "	RunArray newFrom: {1. 2. 2. 3}
>   	{1. $a. $a. 3} as: RunArray
>   	({1. $a. $a. 3} as: RunArray) values
>   "!
> 
> Item was added:
> + ----- Method: RunArray>>allSatisfy: (in category 'enumerating') -----
> + allSatisfy: aBlock 
> + 	"Only evaluate once per run"
> + 
> + 	^values allSatisfy: aBlock!
> 
> Item was added:
> + ----- Method: RunArray>>anySatisfy: (in category 'enumerating') -----
> + anySatisfy: aBlock 
> + 	"Only evaluate once per run"
> + 
> + 	^values anySatisfy: aBlock!
> 
> Item was added:
> + ----- Method: RunArray>>asBag (in category 'converting') -----
> + asBag
> + 	| aBag |
> + 	aBag := Bag new: values size.
> + 	self runsAndValuesDo: [:run :value |
> + 		aBag add: value withOccurrences: run].
> + 	^aBag!
> 
> Item was added:
> + ----- Method: RunArray>>asSet (in category 'converting') -----
> + asSet
> + 	^values asSet!
> 
> Item was changed:
> + ----- Method: RunArray>>coalesce (in category 'private') -----
> - ----- Method: RunArray>>coalesce (in category 'adding') -----
>   coalesce
> + 	"Coalesce theRuns and theValues if ever the values have adjacent equal
> objects"
> + 
> + 	| lastLength lastValue mustCoalesce coalescedRuns coalescedValued
> runIndex |
> + 	mustCoalesce := false.
> + 	runIndex := 0.
> + 	lastLength := 0.
> + 	lastValue := Object new.		
> + 	runs with: values do: [:run :value | 
> + 		(lastValue = value or: [run = 0])
> + 			ifTrue:
> + 				[mustCoalesce
> + 					ifFalse:
> + 						[coalescedRuns := (Array new: runs size) writeStream.
> + 						coalescedValued := (Array new: values size) writeStream.
> + 						coalescedRuns next: runIndex putAll: runs startingAt: 1.
> + 						coalescedValued next: runIndex putAll: values startingAt: 1.
> + 						mustCoalesce := true].
> + 				lastLength := lastLength + run]
> + 			ifFalse:
> + 				[lastLength > 0
> + 					ifTrue:
> + 						[mustCoalesce
> + 							ifTrue:
> + 								[coalescedRuns nextPut: lastLength.
> + 								coalescedValued nextPut: lastValue].
> + 						runIndex := runIndex + 1].
> + 				lastLength := run.
> + 				lastValue := value]].
> + 	mustCoalesce
> + 		ifTrue:
> + 			[lastLength > 0
> + 				ifTrue:
> + 					[coalescedRuns nextPut: lastLength.
> + 					coalescedValued nextPut: lastValue].
> + 			self setRuns: coalescedRuns contents setValues: coalescedValued
> contents]!
> - 	"Try to combine adjacent runs"
> - 	| ind |
> - 	ind := 2.
> - 	[ind > values size] whileFalse: [
> - 		(values at: ind-1) = (values at: ind) 
> - 			ifFalse: [ind := ind + 1]
> - 			ifTrue: ["two are the same, combine them"
> - 				values := values copyReplaceFrom: ind to: ind with: #().
> - 				runs at: ind-1 put: (runs at: ind-1) + (runs at: ind).
> - 				runs := runs copyReplaceFrom: ind to: ind with: #().
> - 				"self error: 'needed to combine runs' "]].
> - 			!
> 
> Item was added:
> + ----- Method: RunArray>>collect: (in category 'enumerating') -----
> + collect: aBlock
> + 	"Beware, the block will be evaluated only once per group of values."
> + 	^(self class runs: runs copy values: (values collect: aBlock))
> coalesce!
> 
> Item was added:
> + ----- Method: RunArray>>copyUpThrough: (in category 'copying') -----
> + copyUpThrough: value
> + 	"Optimized"
> + 
> + 	| newSize newValues newRuns |
> + 	newSize := values indexOf: value startingAt: 1.
> + 	newSize = 0 ifTrue: [^self copy].
> + 	newRuns := runs copyFrom: 1 to: newSize.
> + 	newRuns at: newSize put: 1.
> + 	newValues := values copyFrom: 1 to: newSize.
> + 	^ self class
> + 		runs: newRuns
> + 		values: newValues!
> 
> Item was added:
> + ----- Method: RunArray>>copyUpTo: (in category 'copying') -----
> + copyUpTo: anElement 
> + 	"Optimized"
> + 
> + 	| newValues |
> + 	newValues := values copyUpTo: anElement.
> + 	^ self class
> + 		runs: (runs copyFrom: 1 to: newValues size)
> + 		values: newValues!
> 
> Item was added:
> + ----- Method: RunArray>>copyUpToLast: (in category 'copying') -----
> + copyUpToLast: value
> + 	"Optimized"
> + 
> + 	| newSize run newRuns newValues |
> + 	newSize := values lastIndexOf: value startingAt: values size.
> + 	newSize = 0 ifTrue: [^self copy].
> + 	run := runs at: newSize.
> + 	run > 1
> + 		ifTrue:
> + 			[newRuns := runs copyFrom: 1 to: newSize.
> + 			newRuns at: newSize put: run - 1]
> + 		ifFalse:
> + 			[newSize := newSize - 1.
> + 			newRuns := runs copyFrom: 1 to: newSize].
> + 	newValues := values copyFrom: 1 to: newSize.
> + 	^ self class
> + 		runs: newRuns
> + 		values: newValues!
> 
> Item was added:
> + ----- Method: RunArray>>count: (in category 'enumerating') -----
> + count: aBlock
> + 	"Beware, the block will be evaluated only once per group of values."
> + 	| count |
> + 	count := 0.
> + 	self runsAndValuesDo: [:run :value |
> + 		(aBlock value: value)
> + 			ifTrue:
> + 				[count := count + run]].
> + 	^count!
> 
> Item was added:
> + ----- Method: RunArray>>detect:ifNone: (in category 'enumerating') -----
> + detect: aBlock ifNone: exceptionBlock 
> + 	"Only evaluate once per run"
> + 
> + 	^values detect: aBlock ifNone: exceptionBlock !
> 
> Item was added:
> + ----- Method: RunArray>>detectMax: (in category 'enumerating') -----
> + detectMax: aBlock
> + 	"Only evaluate once per run"
> + 
> + 	^values detectMax: aBlock!
> 
> Item was added:
> + ----- Method: RunArray>>detectMin: (in category 'enumerating') -----
> + detectMin: aBlock
> + 	"Only evaluate once per run"
> + 
> + 	^values detectMin: aBlock!
> 
> Item was added:
> + ----- Method: RunArray>>detectSum: (in category 'enumerating') -----
> + detectSum: aBlock
> + 	"Only loop on runs"
> + 	| sum |
> + 	sum := 0.
> + 	self runsAndValuesDo: [:run :value | 
> + 		sum := (aBlock value: value) * run + sum].  
> + 	^ sum!
> 
> Item was added:
> + ----- Method: RunArray>>do: (in category 'enumerating') -----
> + do: aBlock
> + 	"This is refined for speed"
> + 	
> + 	1 to: runs size do: [:i |
> + 		| r v |
> + 		v := values at: i.
> + 		r := runs at: i.
> + 		[( r := r - 1) >= 0]
> + 			whileTrue: [aBlock value: v]].!
> 
> Item was added:
> + ----- Method: RunArray>>fillFrom:with: (in category 'private') -----
> + fillFrom: aCollection with: aBlock
> + 	"Evaluate aBlock with each of aCollections's elements as the argument.  
> + 	Collect the resulting values into self. Answer self."
> + 
> + 	| newRuns newValues lastLength lastValue |
> + 	newRuns := (Array new: aCollection size) writeStream.
> + 	newValues := (Array new: aCollection size) writeStream.
> + 	lastLength := 0.
> + 	lastValue := Object new.
> + 	lastIndex := nil.  "flush access cache"
> + 	aCollection do: [:each | 
> + 		| value |
> + 		value := aBlock value: each.
> + 		lastValue = value
> + 			ifTrue: [lastLength := lastLength + 1]
> + 			ifFalse:
> + 				[lastLength > 0
> + 					ifTrue:
> + 						[newRuns nextPut: lastLength.
> + 						newValues nextPut: lastValue].
> + 				lastLength := 1.
> + 				lastValue := value]].
> + 	lastLength > 0
> + 		ifTrue:
> + 			[newRuns nextPut: lastLength.
> + 			newValues nextPut: lastValue].
> + 	self setRuns: newRuns contents setValues: newValues contents!
> 
> Item was added:
> + ----- Method: RunArray>>findFirst: (in category 'enumerating') -----
> + findFirst: aBlock
> + 	| index |
> + 	index := 1.
> + 	self runsAndValuesDo: [ :run :value |
> + 		(aBlock value: value) ifTrue: [^index].
> + 		index := index + run].
> + 	^0!
> 
> Item was added:
> + ----- Method: RunArray>>findLast: (in category 'enumerating') -----
> + findLast: aBlock
> + 	| index |
> + 	index := values size + 1.
> + 	[(index := index - 1) >= 1] whileTrue:
> + 		[(aBlock value: (values at: index)) ifTrue: [^(1 to: index) detectSum:
> [:i | runs at: i]]].
> + 	^0!
> 
> Item was added:
> + ----- Method: RunArray>>includes: (in category 'testing') -----
> + includes: anObject
> + 	"Answer whether anObject is one of the receiver's elements."
> + 
> + 	^values includes: anObject!
> 
> Item was added:
> + ----- Method: RunArray>>indexOf:startingAt: (in category 'accessing')
> -----
> + indexOf: anElement startingAt: start
> + 	"Answer the index of the first occurence of anElement after start
> + 	within the receiver. If the receiver does not contain anElement, 
> + 	answer 0."
> + 	
> + 	| index |
> + 	index := 1.
> + 	self runsAndValuesDo: [ :run :value |
> + 		(index >= start and: [value = anElement]) ifTrue: [^index].
> + 		index := index + run].
> + 	^0!
> 
> Item was added:
> + ----- Method: RunArray>>indexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + indexOfAnyOf: aCollection startingAt: start
> + 	"Answer the index of the first occurence of any element included in
> aCollection after start within the receiver.
> + 	If the receiver does not contain anElement, answer zero, which is an
> invalid index."
> + 	
> + 	| index |
> + 	index := 1.
> + 	self runsAndValuesDo: [ :run :value |
> + 		(index >= start and: [aCollection includes: value]) ifTrue: [^index].
> + 		index := index + run].
> + 	^0!
> 
> Item was added:
> + ----- Method: RunArray>>isSorted (in category 'testing') -----
> + isSorted
> + 	^values isSorted!
> 
> Item was added:
> + ----- Method: RunArray>>isSortedBy: (in category 'testing') -----
> + isSortedBy: aBlock
> + 	^values isSortedBy: aBlock!
> 
> Item was added:
> + ----- Method: RunArray>>lastIndexOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOf: anElement startingAt: lastIndex
> + 	"Answer the index of the last occurence of anElement within the  
> + 	receiver. If the receiver does not contain anElement, answer 0."
> + 	
> + 	| lastValueIndex |
> + 	lastValueIndex := values lastIndexOf: anElement startingAt: values
> size.
> + 	[lastValueIndex > 0] whileTrue:
> + 		[| i index |
> + 		i := index := 0.
> + 		[index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + 				whileTrue: [index := index + (runs at: i)].
> + 		index <= lastIndex ifTrue: [^index].
> + 		index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + 		lastValueIndex := values lastIndexOf: anElement startingAt:
> lastValueIndex - 1].
> + 	^0!
> 
> Item was added:
> + ----- Method: RunArray>>lastIndexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOfAnyOf: aCollection startingAt: lastIndex
> + 	"Answer the index of the last occurence of any element of aCollection
> + 	within the receiver. If the receiver does not contain any of those
> + 	elements, answer 0"
> + 	
> + 	| lastValueIndex |
> + 	lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> values size.
> + 	[lastValueIndex > 0] whileTrue:
> + 		[| i index |
> + 		i := index := 0.
> + 		[index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + 				whileTrue: [index := index + (runs at: i)].
> + 		index <= lastIndex ifTrue: [^index].
> + 		index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + 		lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> lastValueIndex - 1].
> + 	^0!
> 
> Item was removed:
> - ----- Method: RunArray>>mapValues: (in category 'private') -----
> - mapValues: mapBlock
> - 	"NOTE: only meaningful to an entire set of runs"
> - 	
> - 	values := values collect: mapBlock!
> 
> Item was added:
> + ----- Method: RunArray>>noneSatisfy: (in category 'enumerating') -----
> + noneSatisfy: aBlock 
> + 	"Only evaluate once per run"
> + 
> + 	^values noneSatisfy: aBlock!
> 
> Item was changed:
> + ----- Method: RunArray>>rangeOf:startingAt: (in category 'accessing')
> -----
> - ----- Method: RunArray>>rangeOf:startingAt: (in category 'adding') -----
>   rangeOf: attr startingAt: startPos
>   	"Answer an interval that gives the range of attr at index position 
> startPos. An empty interval with start value startPos is returned when the
> attribute attr is not present at position startPos.  self size > 0 is
> assumed, it is the responsibility of the caller to test for emptiness of
> self.
>   Note that an attribute may span several adjancent runs. "
>   
>   	self at: startPos 
>   		setRunOffsetAndValue: 
>               [:run :offset :value | 
>                  ^(value includes: attr)
>                     ifFalse: [startPos to: startPos - 1]
>                     ifTrue:
>                       [ | firstRelevantPosition lastRelevantPosition
> idxOfCandidateRun |
>                        lastRelevantPosition := startPos - offset + (runs
> at: run) - 1.
>                        firstRelevantPosition := startPos - offset.
>                        idxOfCandidateRun := run + 1.
>                        [idxOfCandidateRun <= runs size 
>                                and: [(values at: idxOfCandidateRun)
> includes: attr]]
>                           whileTrue:
>                             [lastRelevantPosition := lastRelevantPosition
> + (runs at: idxOfCandidateRun).
>                              idxOfCandidateRun := idxOfCandidateRun + 1]. 
>                        idxOfCandidateRun := run - 1.
>                        [idxOfCandidateRun >= 1 
>                                and: [(values at: idxOfCandidateRun)
> includes: attr]]
>                           whileTrue:
>                             [firstRelevantPosition :=
> firstRelevantPosition - (runs at: idxOfCandidateRun).
>                              idxOfCandidateRun := idxOfCandidateRun - 1]. 
>    
>                       firstRelevantPosition to: lastRelevantPosition]
>   		  ]!
> 
> Item was added:
> + ----- Method: RunArray>>remove:ifAbsent: (in category 'removing') -----
> + remove: anObject ifAbsent: exceptionBlock
> + 	| index mustCoalesce run |
> + 	lastIndex := nil.  "flush access cache"
> + 	index := values indexOf: anObject ifAbsent: [^exceptionBlock value].
> + 	(run := runs at: index) > 1
> + 		ifTrue: [runs at: index put: run - 1]
> + 		ifFalse:
> + 			[mustCoalesce := index > 1 and: [index < values size and: [(values
> at: index - 1) = (values at: index + 1)]].
> + 			runs := runs copyWithoutIndex: index.
> + 			values := values copyWithoutIndex: index.
> + 			mustCoalesce
> + 				ifTrue:
> + 					[runs at: index - 1 put: (runs at: index - 1) + (runs at: index).
> + 					runs := runs copyWithoutIndex: index.
> + 					values := values copyWithoutIndex: index]].
> + 	^anObject!
> 
> Item was added:
> + ----- Method: RunArray>>removeAll (in category 'removing') -----
> + removeAll
> + 	lastIndex := nil.  "flush access cache"
> + 	runs := Array new.
> + 	values := Array new!
> 
> Item was added:
> + ----- Method: RunArray>>replace: (in category 'enumerating') -----
> + replace: aBlock
> + 	"destructively replace the values in this RunArray with the ones
> transformed by aBlock."
> + 	lastIndex := nil.  "flush access cache"
> + 	values := values replace: aBlock.
> + 	self coalesce!
> 
> Item was added:
> + ----- Method: RunArray>>reverseDo: (in category 'enumerating') -----
> + reverseDo: aBlock
> + 	"This is refined for speed"
> + 	
> + 	| i |
> + 	i := runs size.
> + 	[i > 0]
> + 		whileTrue: 
> + 			[ | r v |
> + 			v := values at: i.
> + 			r := runs at: i.
> + 			i := i - 1.
> + 			[( r := r - 1) >= 0]
> + 				whileTrue: [aBlock value: v]].!
> 
> Item was added:
> + ----- Method: RunArray>>select: (in category 'enumerating') -----
> + select: aBlock
> + 	"Beware, the block will be evaluated only once per group of values."
> + 	| newRuns newValues |
> + 	newRuns := (Array new: runs size) writeStream.
> + 	newValues := (Array new: values size) writeStream.
> + 	self runsAndValuesDo: [:run :value |
> + 		(aBlock value: value)
> + 			ifTrue:
> + 				[newRuns nextPut: run.
> + 				newValues nextPut: value]].
> + 	^(self class runs: newRuns contents values: newValues contents)
> coalesce!
> 
> Item was changed:
>   ----- Method: SequenceableCollection>>lastIndexOf:startingAt: (in
> category 'accessing') -----
>   lastIndexOf: anElement startingAt: lastIndex
>   	"Answer the index of the last occurence of anElement within the  
> + 	receiver. If the receiver does not contain anElement, answer 0."
> - 	receiver. If the receiver does not contain anElement, answer the
> - 	result of evaluating the argument, exceptionBlock."
>   
>   	lastIndex to: 1 by: -1 do: [ :index |
>   		(self at: index) = anElement ifTrue: [ ^index ] ].
>   	^0!
> 
> Item was changed:
>   ----- Method: Text>>addAttribute:from:to: (in category 'emphasis') -----
>   addAttribute: att from: start to: stop 
>   	"Set the attribute for characters in the interval start to stop."
>   	runs :=  runs copyReplaceFrom: start to: stop
>   			with: ((runs copyFrom: start to: stop)
> + 				replace:
> - 				mapValues:
>   				[:attributes | Text addAttribute: att toArray: attributes])
>   !
> 
> Item was changed:
>   ----- Method: Text>>removeAttribute:from:to: (in category 'emphasis')
> -----
>   removeAttribute: att from: start to: stop 
>   	"Remove the attribute over the interval start to stop."
>   	runs :=  runs copyReplaceFrom: start to: stop
>   			with: ((runs copyFrom: start to: stop)
> + 				replace:
> - 				mapValues:
>   				[:attributes | attributes copyWithout: att])
>   !





--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html


More information about the Squeak-dev mailing list