[squeak-dev] The Trunk: Collections-nice.771.mcz
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Thu Nov 30 21:01:59 UTC 2017
2017-11-30 21:57 GMT+01:00 <commits at source.squeak.org>:
> Nicolas Cellier uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-nice.771.mcz
>
> ==================== Summary ====================
>
> Name: Collections-nice.771
> Author: nice
> Time: 30 November 2017, 9:57:27.873964 pm
> UUID: d8b64711-6119-429b-b3f5-259b46ef864b
> Ancestors: Collections-nice.770
>
> Fix awfully broken CharacterSetComplement select:/reject:
>
> If we want to select:/reject:, we must not only enumerate the absent
> characters, but rather all the characters in the complement.
>
> That's way two many, thus we prefer to do it with a LazyCharacterSet,
> anything else is unfeasible (we don't even know the upper limit of the set
> of characters...).
>
> Introduce an AbstractCharacterSet superclass of all the CharacterSet
> family in order to begin factoring some behavior.
>
> TODO: we should better rename
> CharacterSet -> ByteCharacterSet
> AbstractCharacterSet -> CharacterSet.
>
> We delay this quite technical operation, because we don't want to break
> existing instances, AND we want to redirect (Byte)CharacterSet references
> to (Abstract)CharacterSet, the Abstract one becoming a factory.
>
> =============== Diff against Collections-nice.770 ===============
>
> Item was added:
> + Collection subclass: #AbstractCharacterSet
> + instanceVariableNames: ''
> + classVariableNames: ''
> + poolDictionaries: ''
> + category: 'Collections-Support'!
>
> Item was added:
> + ----- Method: AbstractCharacterSet>>byteArrayMap (in category
> 'accessing') -----
> + byteArrayMap
> + ^self subclassReponsibility!
>
> Item was added:
> + ----- Method: AbstractCharacterSet>>findFirstInByteString:startingAt:
> (in category 'enumerating') -----
> + findFirstInByteString: aByteString startingAt: startIndex
> + "Double dispatching: since we know this is a ByteString, we can
> use a superfast primitive using a ByteArray map with 0 slots for byte
> characters not included and 1 for byte characters included in the receiver."
> + ^ByteString
> + findFirstInString: aByteString
> + inSet: self byteArrayMap
> + startingAt: startIndex!
>
> Item was added:
> + ----- Method: AbstractCharacterSet>>occurrencesOf: (in category
> 'enumerating') -----
> + occurrencesOf: anObject
> + "Answer how many of the receiver's elements are equal to anObject.
> Optimized version."
> +
> + (self includes: anObject) ifTrue: [ ^1 ].
> + ^0!
>
> Item was added:
> + ----- Method: AbstractCharacterSet>>removeAll (in category 'removing')
> -----
> + removeAll
> + self becomeForward: CharacterSet new!
>
> Item was changed:
> + AbstractCharacterSet subclass: #CharacterSet
> - Collection subclass: #CharacterSet
> instanceVariableNames: 'map tally'
> classVariableNames: 'CrLf NonSeparators Separators'
> poolDictionaries: ''
> category: 'Collections-Support'!
>
> !CharacterSet commentStamp: '<historical>' prior: 0!
> A set of characters. Lookups for inclusion are very fast.!
>
> Item was changed:
> + ----- Method: CharacterSet>>= (in category 'comparing') -----
> - ----- Method: CharacterSet>>= (in category 'comparison') -----
> = anObject
>
> self species == anObject species ifFalse: [ ^false ].
> anObject size = tally ifFalse: [ ^false ].
> ^self byteArrayMap = anObject byteArrayMap!
>
> Item was changed:
> + ----- Method: CharacterSet>>add: (in category 'adding') -----
> - ----- Method: CharacterSet>>add: (in category 'collection ops') -----
> add: aCharacter
> "I automatically become a WideCharacterSet if you add a wide
> character to myself"
>
> | index |
> (index := aCharacter asInteger + 1) <= 256 ifFalse: [
> | wide |
> wide := WideCharacterSet new.
> wide addAll: self.
> wide add: aCharacter.
> self becomeForward: wide.
> ^aCharacter ].
> (map at: index) = 1 ifFalse: [
> map at: index put: 1.
> tally := tally + 1 ].
> ^aCharacter!
>
> Item was changed:
> + ----- Method: CharacterSet>>do: (in category 'enumerating') -----
> - ----- Method: CharacterSet>>do: (in category 'collection ops') -----
> do: aBlock
> "evaluate aBlock with each character in the set"
>
> | index |
> tally >= 128 ifTrue: [ "dense"
> index := 0.
> [ (index := index + 1) <= 256 ] whileTrue: [
> (map at: index) = 1 ifTrue: [
> aBlock value: (Character value: index - 1)
> ] ].
> ^self ].
> "sparse"
> index := 0.
> [ (index := map indexOf: 1 startingAt: index + 1) = 0 ]
> whileFalse: [
> aBlock value: (Character value: index - 1) ].
> !
>
> Item was changed:
> + ----- Method: CharacterSet>>findFirstInByteString:startingAt: (in
> category 'enumerating') -----
> - ----- Method: CharacterSet>>findFirstInByteString:startingAt: (in
> category 'collection ops') -----
> findFirstInByteString: aByteString startingAt: startIndex
> "Double dispatching: since we know this is a ByteString, we can
> use a superfast primitive using a ByteArray map with 0 slots for byte
> characters not included and 1 for byte characters included in the receiver."
> ^ByteString
> findFirstInString: aByteString
> inSet: self byteArrayMap
> startingAt: startIndex!
>
> Item was changed:
> + ----- Method: CharacterSet>>hash (in category 'comparing') -----
> - ----- Method: CharacterSet>>hash (in category 'comparison') -----
> hash
> ^self byteArrayMap hash!
>
> Item was changed:
> + ----- Method: CharacterSet>>includes: (in category 'testing') -----
> - ----- Method: CharacterSet>>includes: (in category 'collection ops')
> -----
> includes: anObject
>
> | index |
> anObject isCharacter ifFalse: [ ^false ].
> (index := anObject asInteger + 1) > 256 ifTrue: [ ^false ].
> ^(map at: index) > 0!
>
> Item was removed:
> - ----- Method: CharacterSet>>occurrencesOf: (in category 'enumerating')
> -----
> - occurrencesOf: anObject
> - "Answer how many of the receiver's elements are equal to anObject.
> Optimized version."
> -
> - (self includes: anObject) ifTrue: [ ^1 ].
> - ^0!
>
> Item was changed:
> + ----- Method: CharacterSet>>remove: (in category 'removing') -----
> - ----- Method: CharacterSet>>remove: (in category 'collection ops') -----
> remove: aCharacter
>
> ^self remove: aCharacter ifAbsent: aCharacter!
>
> Item was changed:
> + ----- Method: CharacterSet>>remove:ifAbsent: (in category 'removing')
> -----
> - ----- Method: CharacterSet>>remove:ifAbsent: (in category 'collection
> ops') -----
> remove: aCharacter ifAbsent: aBlock
>
> | index |
> (index := aCharacter asciiValue + 1) <= 256 ifFalse: [ ^aBlock
> value ].
> (map at: index) = 0 ifTrue: [ ^aBlock value ].
> map at: index put: 0.
> tally := tally - 1.
> ^aCharacter!
>
> Item was changed:
> + ----- Method: CharacterSet>>size (in category 'accessing') -----
> - ----- Method: CharacterSet>>size (in category 'collection ops') -----
> size
>
> ^tally!
>
> Item was changed:
> + ----- Method: CharacterSet>>species (in category 'comparing') -----
> - ----- Method: CharacterSet>>species (in category 'comparison') -----
> species
> ^CharacterSet!
>
> Item was changed:
> + AbstractCharacterSet subclass: #CharacterSetComplement
> - Collection subclass: #CharacterSetComplement
> instanceVariableNames: 'absent byteArrayMapCache'
> classVariableNames: ''
> poolDictionaries: ''
> category: 'Collections-Support'!
>
> !CharacterSetComplement commentStamp: 'nice 8/31/2008 14:53' prior: 0!
> CharacterSetComplement is a space efficient implementation of
> (CharacterSet complement) taking care of WideCharacter (code > 255)
>
> However, it will maintain a byteArrayMap for character <= 255 in a cache
> keeping
>
> instance variables:
> absent <CharacterSet> contains character that are not in the set
> (i.e. my complement)
> byteArrayMapCache <ByteArray | nil> cache this information because
> it has to be used in tight loops where efficiency matters!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>add: (in category 'adding') -----
> - ----- Method: CharacterSetComplement>>add: (in category 'collection
> ops') -----
> add: aCharacter
> "a character is present if not absent, so adding a character is
> removing it from the absent"
>
> (absent includes: aCharacter)
> ifTrue:
> [byteArrayMapCache := nil.
> absent remove: aCharacter].
> ^ aCharacter!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>do: (in category 'enumerating')
> -----
> - ----- Method: CharacterSetComplement>>do: (in category 'collection ops')
> -----
> do: aBlock
> "evaluate aBlock with each character in the set.
> don't do it, there are too many..."
>
> self shouldNotImplement!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>findFirstInByteString:startingAt:
> (in category 'enumerating') -----
> - ----- Method: CharacterSetComplement>>findFirstInByteString:startingAt:
> (in category 'collection ops') -----
> findFirstInByteString: aByteString startingAt: startIndex
> "Double dispatching: since we know this is a ByteString, we can
> use a superfast primitive using a ByteArray map with 0 slots for byte
> characters not included and 1 for byte characters included in the receiver."
> ^ByteString
> findFirstInString: aByteString
> inSet: self byteArrayMap
> startingAt: startIndex!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>includes: (in category 'testing')
> -----
> - ----- Method: CharacterSetComplement>>includes: (in category
> 'collection ops') -----
> includes: anObject
>
> anObject isCharacter ifFalse: [ ^false ].
> (absent includes: anObject) ifTrue: [ ^false ].
> ^true!
>
> Item was removed:
> - ----- Method: CharacterSetComplement>>occurrencesOf: (in category
> 'enumerating') -----
> - occurrencesOf: anObject
> - "Answer how many of the receiver's elements are equal to anObject.
> Optimized version."
> -
> - (self includes: anObject) ifTrue: [ ^1 ].
> - ^0!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>reject: (in category
> 'enumerating') -----
> - ----- Method: CharacterSetComplement>>reject: (in category 'collection
> ops') -----
> reject: aBlock
> + ^LazyCharacterSet including: [:c | (absent includes: c) not and:
> [(aBlock value: c) not]]!
> - "Implementation note: rejecting present is selecting absent"
> -
> - ^(absent select: aBlock) complement!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>remove: (in category 'removing')
> -----
> - ----- Method: CharacterSetComplement>>remove: (in category 'collection
> ops') -----
> remove: aCharacter
> "This means aCharacter is now absent from myself.
> It must be added to my absent."
>
> byteArrayMapCache := nil.
> ^absent add: aCharacter!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>remove:ifAbsent: (in category
> 'removing') -----
> - ----- Method: CharacterSetComplement>>remove:ifAbsent: (in category
> 'collection ops') -----
> remove: aCharacter ifAbsent: aBlock
> (self includes: aCharacter) ifFalse: [^aBlock value].
> ^self remove: aCharacter!
>
> Item was removed:
> - ----- Method: CharacterSetComplement>>removeAll (in category
> 'collection ops') -----
> - removeAll
> -
> - self becomeForward: CharacterSet new!
>
> Item was changed:
> + ----- Method: CharacterSetComplement>>select: (in category
> 'enumerating') -----
> - ----- Method: CharacterSetComplement>>select: (in category 'collection
> ops') -----
> select: aBlock
> + ^LazyCharacterSet including: [:c | (absent includes: c) not and:
> [aBlock value: c]]!
> - "Implementation note: selecting present is rejecting absent"
> -
> - ^(absent reject: aBlock) complement!
>
> Item was removed:
> - ----- Method: CharacterSetComplement>>size (in category 'collection
> ops') -----
> - size
> - "Is this 2**32-absent size ?"
> -
> - ^self shouldNotImplement!
>
> Item was added:
> + ----- Method: Interval>>copyFrom:to: (in category 'copying') -----
> + copyFrom: startIndex to: stopIndex
> + stopIndex < startIndex ifTrue: [^self copyEmpty].
> + ^(self at: startIndex) to: (self at: stopIndex) by: step!
>
>
Arghh, sorry, the Interval>>copyFrom:to: was a totally unrelated experiment
I think it is a good change, but it might break some tests and should have
better gone in its own change set/commit.
By now I let it there, but if any one bark, I will revert.
Item was added:
> + AbstractCharacterSet subclass: #LazyCharacterSet
> + instanceVariableNames: 'block byteArrayMapCache'
> + classVariableNames: ''
> + poolDictionaries: ''
> + category: 'Collections-Support'!
> +
> + !LazyCharacterSet commentStamp: 'nice 11/30/2017 21:40' prior: 0!
> + A LazyCharacterSet is a kind of CharacterSet which does not know in
> advance which Character it contains or not.
> + If will lazily evaluate a block on demand if ever one ask whether it
> includes: a character.
> + It is not feasible to enumerate a LazyCharacterSet, because there are
> way too many characters.
> +
> + Instance Variables
> + block: <BlockContext | Symbol>
> + byteArrayMapCache: <ByteArray | nil>
> +
> + block
> + - a valuable, answering either true or false when sent the message
> value: - true means that this set includes the character passed as value:
> argument.
> +
> + byteArrayMapCache
> + - a cache holding 0 or 1 for the first 256 character codes - 0
> meaning not included, 1 included. This is used in some priitives
> + !
>
> Item was added:
> + ----- Method: LazyCharacterSet class>>including: (in category 'instance
> creation') -----
> + including: aBlock
> + "Create the set of Character for which aBlock evaluates to true"
> + ^self class new block: aBlock!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>add: (in category 'adding') -----
> + add: aCharacter
> + self block: [:c | c = aCharacter or: [block value: c]].
> + ^aCharacter!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>addAll: (in category 'adding') -----
> + addAll: aCollection
> + self block: [:c | (aCollection includes: c) or: [block value: c]].
> + ^aCollection!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>block (in category 'accessing') -----
> + block
> + ^block!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>block: (in category 'accessing') -----
> + block: aValuable
> + "Set the block used to determine if I include a Character or not.
> + aValuable is an object that shoud answer true or false when sent
> value:"
> +
> + byteArrayMapCache := nil.
> + ^block := aValuable!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>byteArrayMap (in category 'accessing')
> -----
> + byteArrayMap
> + "return a ByteArray mapping each ascii value to a 1 if that ascii
> value is in the set, and a 0 if it isn't. Intended for use by primitives
> only"
> +
> + ^byteArrayMapCache ifNil: [byteArrayMapCache := (0 to: 255)
> collect: [:i | self includes: (Character value: i)]]!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>complement (in category 'converting')
> -----
> + complement
> + ^self class including: [:char | (block value: char) not]!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>do: (in category 'enumerating') -----
> + do: aBlock
> + "evaluate aBlock with each character in the set.
> + don't do it, there are too many loop..."
> +
> + self shouldNotImplement!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>includes: (in category 'testing') -----
> + includes: aCharacter
> + ^block value: aCharacter!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>reject: (in category 'enumerating') -----
> + reject: aBlock
> + ^self class including: [:char | (aBlock value: char) not and:
> [block value: char]]!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>remove: (in category 'removing') -----
> + remove: aCharacter
> + self block: [:c | (c = aCharacter) not and: [block value: c]].
> + ^aCharacter!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>remove:ifAbsent: (in category
> 'removing') -----
> + remove: aCharacter ifAbsent: aBlock
> + (self includes: aCharacter) ifFalse: [^aBlock value].
> + ^self remove: aCharacter!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>removeAll: (in category 'removing') -----
> + removeAll: aCollection
> + self block: [:c | (aCollection include: c) not and: [block value:
> c]].
> + ^aCollection!
>
> Item was added:
> + ----- Method: LazyCharacterSet>>select: (in category 'enumerating') -----
> + select: aBlock
> + ^self class including: [:char | (block value: char) and: [aBlock
> value: char]]!
>
> Item was changed:
> + AbstractCharacterSet subclass: #WideCharacterSet
> - Collection subclass: #WideCharacterSet
> instanceVariableNames: 'map byteArrayMap bitsetCapacity
> highBitsShift lowBitsMask'
> classVariableNames: ''
> poolDictionaries: ''
> category: 'Collections-Support'!
>
> !WideCharacterSet commentStamp: 'nice 12/10/2009 19:17' prior: 0!
> WideCharacterSet is used to store a Set of WideCharacter with fast
> access and inclusion test.
>
> Implementation should be efficient in memory if sets are sufficently
> sparse.
>
> Wide Characters are at most 32bits.
> We split them into 16 highBits and 16 lowBits.
>
> map is a dictionary key: 16 highBits value: map of 16 lowBits.
>
> Maps of lowBits are stored as arrays of bits in a ByteArray.
> If a bit is set to 1, this indicate that corresponding character is
> present.
> 8192 bytes are necessary in each lowmap.
> Empty lowmap are removed from the map Dictionary.
>
> A byteArrayMap is maintained in parallel with map for fast handling of
> ByteString.
> (byteArrayMap at: i+1) = 0 means that character of asciiValue i is
> absent, = 1 means present.!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>add: (in category 'adding') -----
> - ----- Method: WideCharacterSet>>add: (in category 'collection ops') -----
> add: aCharacter
>
> | value highBits lowBits |
> (value := aCharacter asInteger) < 256 ifTrue: [
> byteArrayMap at: value + 1 put: 1 ].
> highBits := value bitShift: highBitsShift.
> lowBits := value bitAnd: lowBitsMask.
> (map at: highBits ifAbsentPut: [ Bitset new: bitsetCapacity ])
> setBitAt: lowBits.
> ^aCharacter!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>do: (in category 'enumerating') -----
> - ----- Method: WideCharacterSet>>do: (in category 'collection ops') -----
> do: aBlock
>
> map keysAndValuesDo: [ :index :bitset |
> | highBits |
> highBits := index * bitsetCapacity.
> bitset do: [ :lowBits |
> aBlock value: (Character value: highBits +
> lowBits) ] ]!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>findFirstInByteString:startingAt: (in
> category 'enumerating') -----
> - ----- Method: WideCharacterSet>>findFirstInByteString:startingAt: (in
> category 'collection ops') -----
> findFirstInByteString: aByteString startingAt: startIndex
> "Double dispatching: since we know this is a ByteString, we can
> use a superfast primitive using a ByteArray map with 0 slots for byte
> characters not included and 1 for byte characters included in the receiver."
>
> ^ByteString
> findFirstInString: aByteString
> inSet: byteArrayMap
> startingAt: startIndex!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>includes: (in category 'testing') -----
> - ----- Method: WideCharacterSet>>includes: (in category 'collection ops')
> -----
> includes: anObject
>
> | value |
> anObject isCharacter ifFalse: [ ^false ].
> (value := anObject asInteger) < 256 ifTrue: [
> ^(byteArrayMap at: value + 1) ~= 0 ].
> ^((map at: (value bitShift: highBitsShift) ifAbsent: nil) ifNil: [
> ^false ])
> includes: (value bitAnd: lowBitsMask)!
>
> Item was removed:
> - ----- Method: WideCharacterSet>>occurrencesOf: (in category
> 'enumerating') -----
> - occurrencesOf: anObject
> - "Answer how many of the receiver's elements are equal to anObject.
> Optimized version."
> -
> - (self includes: anObject) ifTrue: [ ^1 ].
> - ^0!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>remove: (in category 'removing') -----
> - ----- Method: WideCharacterSet>>remove: (in category 'collection ops')
> -----
> remove: aCharacter
> "Don't signal an error when aCharacter is not present."
>
> ^self remove: aCharacter ifAbsent: aCharacter!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>remove:ifAbsent: (in category
> 'removing') -----
> - ----- Method: WideCharacterSet>>remove:ifAbsent: (in category
> 'collection ops') -----
> remove: aCharacter ifAbsent: aBlock
>
> | value highBits lowBits bitset |
> (value := aCharacter asInteger) < 256 ifTrue: [
> (byteArrayMap at: value + 1) = 0 ifTrue: [ ^aBlock value ].
> byteArrayMap at: value + 1 put: 0 ].
> highBits := value bitShift: highBitsShift.
> lowBits := value bitAnd: lowBitsMask.
> bitset := (map at: highBits ifAbsent: nil) ifNil: [ ^aBlock value
> ].
> ((bitset clearBitAt: lowBits) and: [ bitset size = 0 ]) ifTrue: [
> map removeKey: highBits ].
> ^aCharacter!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>removeAll (in category 'removing') -----
> - ----- Method: WideCharacterSet>>removeAll (in category 'collection ops')
> -----
> removeAll
>
> map isEmpty ifTrue: [ ^self ].
> map removeAll.
> byteArrayMap atAllPut: 0!
>
> Item was changed:
> + ----- Method: WideCharacterSet>>size (in category 'accessing') -----
> - ----- Method: WideCharacterSet>>size (in category 'collection ops') -----
> size
>
> ^map detectSum: [ :each | each size ]!
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20171130/e326641d/attachment-0001.html>
More information about the Squeak-dev
mailing list
|