[squeak-dev] The Inbox: Collections-nice.869.mcz
marcel.taeumel
Marcel.Taeumel at hpi.de
Thu Dec 17 11:48:32 UTC 2020
Hi all!
I am in favor of improving RunArray. Treating RunArray as "array with
run-length encoding" breaks information hiding by expecting a specific
implementation strategy.
RunArray feels like Bag, but ordered. Maybe "Collections-Abstract" deserves
a new direct subclass? CompressedCollection? And then Bag and OrderedBag are
two examples for such compression?
We have already "Collections-Support", which lists several support classes
such as "Bitset" and "CharacterSet". Such classes do not comply with the
usual collection protocol anyway. For example, there is CharacterSet >>
#canBeEnumerated. Maybe RunArray can go there and also help OrderedBag come
to be? :-) Then, RunArray may also have its special semantics.
Best,
Marcel
commits-2 wrote
> Nicolas Cellier uploaded a new version of Collections to project The
> Inbox:
> http://source.squeak.org/inbox/Collections-nice.869.mcz
>
> ==================== Summary ====================
>
> Name: Collections-nice.869
> Author: nice
> Time: 31 December 2019, 11:14:43.388306 pm
> UUID: e5a9bef5-16d3-4d8a-bf08-040d30e4e907
> Ancestors: Collections-nice.868
>
> Remaster Collections-nice.464: opimize RunArray
>
> This should not be noticeable for Text, but as a general library, it's
> important for any other potential use.
>
> Among those optimizations, notice one important change: some of the
> enumerating methods like #collect: won't iterate on each element, but only
> once per run. Be aware that providing blocks with side effects like (i :=
> i + 1) might lead to different behavior than ordinary ArrayedCollection.
>
> Move RunArray off ArrayedCollection which serves nothing to such subclass.
>
> Add the ability to remove: since it already has the hability to addFirst:
> and addLast:
>
> Fix a few missing lastIndex cache flush, and advertise about the necessity
> to do it in class comment.
>
> Deprecate mapValues: to the profit of replace:.
>
> =============== Diff against Collections-nice.868 ===============
>
> Item was changed:
> + SequenceableCollection subclass: #RunArray
> - ArrayedCollection subclass: #RunArray
> instanceVariableNames: 'runs values lastIndex lastRun lastOffset'
> classVariableNames: ''
> poolDictionaries: ''
> category: 'Collections-Arrayed'!
>
> + !RunArray commentStamp: 'nice 12/30/2019 00:57' prior: 0!
> - !RunArray commentStamp: '
> <historical>
> ' prior: 0!
> My instances provide space-efficient storage of data which tends to be
> constant over long runs of the possible indices. Essentially repeated
> values are stored singly and then associated with a "run" length that
> denotes the number of consecutive occurrences of the value.
>
> My two important variables are
> runs An array of how many elements are in each run
> values An array of what the value is over those elements
>
> The variables lastIndex, lastRun and lastOffset cache the last access
> so that streaming through RunArrays is not an N-squared process.
> + Beware: methods modifying the RunArray contents should reset the
> lastIndex cache to nil.
>
> Many complexities of access can be bypassed by using the method
> RunArray withStartStopAndValueDo:!
>
> Item was changed:
> ----- Method: RunArray class>>newFrom: (in category 'instance creation')
> -----
> newFrom: aCollection
> "Answer an instance of me containing the same elements as aCollection."
>
> | newCollection |
> newCollection := self new.
> + newCollection fillFrom: aCollection with: [:each | each].
> - aCollection do: [:x | newCollection addLast: x].
> ^newCollection
>
> " RunArray newFrom: {1. 2. 2. 3}
> {1. $a. $a. 3} as: RunArray
> ({1. $a. $a. 3} as: RunArray) values
> "!
>
> Item was added:
> + ----- Method: RunArray>>allSatisfy: (in category 'enumerating') -----
> + allSatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values allSatisfy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>anySatisfy: (in category 'enumerating') -----
> + anySatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values anySatisfy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>asBag (in category 'converting') -----
> + asBag
> + | aBag |
> + aBag := Bag new: values size.
> + self runsAndValuesDo: [:run :value |
> + aBag add: value withOccurrences: run].
> + ^aBag!
>
> Item was added:
> + ----- Method: RunArray>>asSet (in category 'converting') -----
> + asSet
> + ^values asSet!
>
> Item was changed:
> + ----- Method: RunArray>>coalesce (in category 'private') -----
> - ----- Method: RunArray>>coalesce (in category 'adding') -----
> coalesce
> + "Coalesce theRuns and theValues if ever the values have adjacent equal
> objects"
> +
> + | lastLength lastValue mustCoalesce coalescedRuns coalescedValued
> runIndex |
> + mustCoalesce := false.
> + runIndex := 0.
> + lastLength := 0.
> + lastValue := Object new.
> + runs with: values do: [:run :value |
> + (lastValue = value or: [run = 0])
> + ifTrue:
> + [mustCoalesce
> + ifFalse:
> + [coalescedRuns := (Array new: runs size) writeStream.
> + coalescedValued := (Array new: values size) writeStream.
> + coalescedRuns next: runIndex putAll: runs startingAt: 1.
> + coalescedValued next: runIndex putAll: values startingAt: 1.
> + mustCoalesce := true].
> + lastLength := lastLength + run]
> + ifFalse:
> + [lastLength > 0
> + ifTrue:
> + [mustCoalesce
> + ifTrue:
> + [coalescedRuns nextPut: lastLength.
> + coalescedValued nextPut: lastValue].
> + runIndex := runIndex + 1].
> + lastLength := run.
> + lastValue := value]].
> + mustCoalesce
> + ifTrue:
> + [lastLength > 0
> + ifTrue:
> + [coalescedRuns nextPut: lastLength.
> + coalescedValued nextPut: lastValue].
> + self setRuns: coalescedRuns contents setValues: coalescedValued
> contents]!
> - "Try to combine adjacent runs"
> - | ind |
> - ind := 2.
> - [ind > values size] whileFalse: [
> - (values at: ind-1) = (values at: ind)
> - ifFalse: [ind := ind + 1]
> - ifTrue: ["two are the same, combine them"
> - values := values copyReplaceFrom: ind to: ind with: #().
> - runs at: ind-1 put: (runs at: ind-1) + (runs at: ind).
> - runs := runs copyReplaceFrom: ind to: ind with: #().
> - "self error: 'needed to combine runs' "]].
> - !
>
> Item was added:
> + ----- Method: RunArray>>collect: (in category 'enumerating') -----
> + collect: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + ^(self class runs: runs copy values: (values collect: aBlock))
> coalesce!
>
> Item was added:
> + ----- Method: RunArray>>copyUpThrough: (in category 'copying') -----
> + copyUpThrough: value
> + "Optimized"
> +
> + | newSize newValues newRuns |
> + newSize := values indexOf: value startingAt: 1.
> + newSize = 0 ifTrue: [^self copy].
> + newRuns := runs copyFrom: 1 to: newSize.
> + newRuns at: newSize put: 1.
> + newValues := values copyFrom: 1 to: newSize.
> + ^ self class
> + runs: newRuns
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>copyUpTo: (in category 'copying') -----
> + copyUpTo: anElement
> + "Optimized"
> +
> + | newValues |
> + newValues := values copyUpTo: anElement.
> + ^ self class
> + runs: (runs copyFrom: 1 to: newValues size)
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>copyUpToLast: (in category 'copying') -----
> + copyUpToLast: value
> + "Optimized"
> +
> + | newSize run newRuns newValues |
> + newSize := values lastIndexOf: value startingAt: values size.
> + newSize = 0 ifTrue: [^self copy].
> + run := runs at: newSize.
> + run > 1
> + ifTrue:
> + [newRuns := runs copyFrom: 1 to: newSize.
> + newRuns at: newSize put: run - 1]
> + ifFalse:
> + [newSize := newSize - 1.
> + newRuns := runs copyFrom: 1 to: newSize].
> + newValues := values copyFrom: 1 to: newSize.
> + ^ self class
> + runs: newRuns
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>count: (in category 'enumerating') -----
> + count: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + | count |
> + count := 0.
> + self runsAndValuesDo: [:run :value |
> + (aBlock value: value)
> + ifTrue:
> + [count := count + run]].
> + ^count!
>
> Item was added:
> + ----- Method: RunArray>>detect:ifNone: (in category 'enumerating') -----
> + detect: aBlock ifNone: exceptionBlock
> + "Only evaluate once per run"
> +
> + ^values detect: aBlock ifNone: exceptionBlock !
>
> Item was added:
> + ----- Method: RunArray>>detectMax: (in category 'enumerating') -----
> + detectMax: aBlock
> + "Only evaluate once per run"
> +
> + ^values detectMax: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>detectMin: (in category 'enumerating') -----
> + detectMin: aBlock
> + "Only evaluate once per run"
> +
> + ^values detectMin: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>detectSum: (in category 'enumerating') -----
> + detectSum: aBlock
> + "Only loop on runs"
> + | sum |
> + sum := 0.
> + self runsAndValuesDo: [:run :value |
> + sum := (aBlock value: value) * run + sum].
> + ^ sum!
>
> Item was added:
> + ----- Method: RunArray>>do: (in category 'enumerating') -----
> + do: aBlock
> + "This is refined for speed"
> +
> + 1 to: runs size do: [:i |
> + | r v |
> + v := values at: i.
> + r := runs at: i.
> + [( r := r - 1) >= 0]
> + whileTrue: [aBlock value: v]].!
>
> Item was added:
> + ----- Method: RunArray>>fillFrom:with: (in category 'private') -----
> + fillFrom: aCollection with: aBlock
> + "Evaluate aBlock with each of aCollections's elements as the argument.
> + Collect the resulting values into self. Answer self."
> +
> + | newRuns newValues lastLength lastValue |
> + newRuns := (Array new: aCollection size) writeStream.
> + newValues := (Array new: aCollection size) writeStream.
> + lastLength := 0.
> + lastValue := Object new.
> + lastIndex := nil. "flush access cache"
> + aCollection do: [:each |
> + | value |
> + value := aBlock value: each.
> + lastValue = value
> + ifTrue: [lastLength := lastLength + 1]
> + ifFalse:
> + [lastLength > 0
> + ifTrue:
> + [newRuns nextPut: lastLength.
> + newValues nextPut: lastValue].
> + lastLength := 1.
> + lastValue := value]].
> + lastLength > 0
> + ifTrue:
> + [newRuns nextPut: lastLength.
> + newValues nextPut: lastValue].
> + self setRuns: newRuns contents setValues: newValues contents!
>
> Item was added:
> + ----- Method: RunArray>>findFirst: (in category 'enumerating') -----
> + findFirst: aBlock
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (aBlock value: value) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>findLast: (in category 'enumerating') -----
> + findLast: aBlock
> + | index |
> + index := values size + 1.
> + [(index := index - 1) >= 1] whileTrue:
> + [(aBlock value: (values at: index)) ifTrue: [^(1 to: index) detectSum:
> [:i | runs at: i]]].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>includes: (in category 'testing') -----
> + includes: anObject
> + "Answer whether anObject is one of the receiver's elements."
> +
> + ^values includes: anObject!
>
> Item was added:
> + ----- Method: RunArray>>indexOf:startingAt: (in category 'accessing')
> -----
> + indexOf: anElement startingAt: start
> + "Answer the index of the first occurence of anElement after start
> + within the receiver. If the receiver does not contain anElement,
> + answer 0."
> +
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (index >= start and: [value = anElement]) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>indexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + indexOfAnyOf: aCollection startingAt: start
> + "Answer the index of the first occurence of any element included in
> aCollection after start within the receiver.
> + If the receiver does not contain anElement, answer zero, which is an
> invalid index."
> +
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (index >= start and: [aCollection includes: value]) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>isSorted (in category 'testing') -----
> + isSorted
> + ^values isSorted!
>
> Item was added:
> + ----- Method: RunArray>>isSortedBy: (in category 'testing') -----
> + isSortedBy: aBlock
> + ^values isSortedBy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>lastIndexOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOf: anElement startingAt: lastIndex
> + "Answer the index of the last occurence of anElement within the
> + receiver. If the receiver does not contain anElement, answer 0."
> +
> + | lastValueIndex |
> + lastValueIndex := values lastIndexOf: anElement startingAt: values
> size.
> + [lastValueIndex > 0] whileTrue:
> + [| i index |
> + i := index := 0.
> + [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + whileTrue: [index := index + (runs at: i)].
> + index <= lastIndex ifTrue: [^index].
> + index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + lastValueIndex := values lastIndexOf: anElement startingAt:
> lastValueIndex - 1].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>lastIndexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOfAnyOf: aCollection startingAt: lastIndex
> + "Answer the index of the last occurence of any element of aCollection
> + within the receiver. If the receiver does not contain any of those
> + elements, answer 0"
> +
> + | lastValueIndex |
> + lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> values size.
> + [lastValueIndex > 0] whileTrue:
> + [| i index |
> + i := index := 0.
> + [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + whileTrue: [index := index + (runs at: i)].
> + index <= lastIndex ifTrue: [^index].
> + index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> lastValueIndex - 1].
> + ^0!
>
> Item was removed:
> - ----- Method: RunArray>>mapValues: (in category 'private') -----
> - mapValues: mapBlock
> - "NOTE: only meaningful to an entire set of runs"
> -
> - values := values collect: mapBlock!
>
> Item was added:
> + ----- Method: RunArray>>noneSatisfy: (in category 'enumerating') -----
> + noneSatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values noneSatisfy: aBlock!
>
> Item was changed:
> + ----- Method: RunArray>>rangeOf:startingAt: (in category 'accessing')
> -----
> - ----- Method: RunArray>>rangeOf:startingAt: (in category 'adding') -----
> rangeOf: attr startingAt: startPos
> "Answer an interval that gives the range of attr at index position
> startPos. An empty interval with start value startPos is returned when the
> attribute attr is not present at position startPos. self size > 0 is
> assumed, it is the responsibility of the caller to test for emptiness of
> self.
> Note that an attribute may span several adjancent runs. "
>
> self at: startPos
> setRunOffsetAndValue:
> [:run :offset :value |
> ^(value includes: attr)
> ifFalse: [startPos to: startPos - 1]
> ifTrue:
> [ | firstRelevantPosition lastRelevantPosition
> idxOfCandidateRun |
> lastRelevantPosition := startPos - offset + (runs
> at: run) - 1.
> firstRelevantPosition := startPos - offset.
> idxOfCandidateRun := run + 1.
> [idxOfCandidateRun <= runs size
> and: [(values at: idxOfCandidateRun)
> includes: attr]]
> whileTrue:
> [lastRelevantPosition := lastRelevantPosition
> + (runs at: idxOfCandidateRun).
> idxOfCandidateRun := idxOfCandidateRun + 1].
> idxOfCandidateRun := run - 1.
> [idxOfCandidateRun >= 1
> and: [(values at: idxOfCandidateRun)
> includes: attr]]
> whileTrue:
> [firstRelevantPosition :=
> firstRelevantPosition - (runs at: idxOfCandidateRun).
> idxOfCandidateRun := idxOfCandidateRun - 1].
>
> firstRelevantPosition to: lastRelevantPosition]
> ]!
>
> Item was added:
> + ----- Method: RunArray>>remove:ifAbsent: (in category 'removing') -----
> + remove: anObject ifAbsent: exceptionBlock
> + | index mustCoalesce run |
> + lastIndex := nil. "flush access cache"
> + index := values indexOf: anObject ifAbsent: [^exceptionBlock value].
> + (run := runs at: index) > 1
> + ifTrue: [runs at: index put: run - 1]
> + ifFalse:
> + [mustCoalesce := index > 1 and: [index < values size and: [(values
> at: index - 1) = (values at: index + 1)]].
> + runs := runs copyWithoutIndex: index.
> + values := values copyWithoutIndex: index.
> + mustCoalesce
> + ifTrue:
> + [runs at: index - 1 put: (runs at: index - 1) + (runs at: index).
> + runs := runs copyWithoutIndex: index.
> + values := values copyWithoutIndex: index]].
> + ^anObject!
>
> Item was added:
> + ----- Method: RunArray>>removeAll (in category 'removing') -----
> + removeAll
> + lastIndex := nil. "flush access cache"
> + runs := Array new.
> + values := Array new!
>
> Item was added:
> + ----- Method: RunArray>>replace: (in category 'enumerating') -----
> + replace: aBlock
> + "destructively replace the values in this RunArray with the ones
> transformed by aBlock."
> + lastIndex := nil. "flush access cache"
> + values := values replace: aBlock.
> + self coalesce!
>
> Item was added:
> + ----- Method: RunArray>>reverseDo: (in category 'enumerating') -----
> + reverseDo: aBlock
> + "This is refined for speed"
> +
> + | i |
> + i := runs size.
> + [i > 0]
> + whileTrue:
> + [ | r v |
> + v := values at: i.
> + r := runs at: i.
> + i := i - 1.
> + [( r := r - 1) >= 0]
> + whileTrue: [aBlock value: v]].!
>
> Item was added:
> + ----- Method: RunArray>>select: (in category 'enumerating') -----
> + select: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + | newRuns newValues |
> + newRuns := (Array new: runs size) writeStream.
> + newValues := (Array new: values size) writeStream.
> + self runsAndValuesDo: [:run :value |
> + (aBlock value: value)
> + ifTrue:
> + [newRuns nextPut: run.
> + newValues nextPut: value]].
> + ^(self class runs: newRuns contents values: newValues contents)
> coalesce!
>
> Item was changed:
> ----- Method: SequenceableCollection>>lastIndexOf:startingAt: (in
> category 'accessing') -----
> lastIndexOf: anElement startingAt: lastIndex
> "Answer the index of the last occurence of anElement within the
> + receiver. If the receiver does not contain anElement, answer 0."
> - receiver. If the receiver does not contain anElement, answer the
> - result of evaluating the argument, exceptionBlock."
>
> lastIndex to: 1 by: -1 do: [ :index |
> (self at: index) = anElement ifTrue: [ ^index ] ].
> ^0!
>
> Item was changed:
> ----- Method: Text>>addAttribute:from:to: (in category 'emphasis') -----
> addAttribute: att from: start to: stop
> "Set the attribute for characters in the interval start to stop."
> runs := runs copyReplaceFrom: start to: stop
> with: ((runs copyFrom: start to: stop)
> + replace:
> - mapValues:
> [:attributes | Text addAttribute: att toArray: attributes])
> !
>
> Item was changed:
> ----- Method: Text>>removeAttribute:from:to: (in category 'emphasis')
> -----
> removeAttribute: att from: start to: stop
> "Remove the attribute over the interval start to stop."
> runs := runs copyReplaceFrom: start to: stop
> with: ((runs copyFrom: start to: stop)
> + replace:
> - mapValues:
> [:attributes | attributes copyWithout: att])
> !
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html
More information about the Squeak-dev
mailing list
|