[squeak-dev] The Trunk: Regex-Core-ul.48.mcz

commits at source.squeak.org commits at source.squeak.org
Sat Mar 26 23:39:23 UTC 2016


Levente Uzonyi uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ul.48.mcz

==================== Summary ====================

Name: Regex-Core-ul.48
Author: ul
Time: 27 March 2016, 12:29:39.025303 am
UUID: 911704bb-a07f-4ac8-bed3-de8f349ac9e5
Ancestors: Regex-Core-ul.47

- use double dispatch in #matchAgainst: of the subclasses of RxmLink. This allowed us to get rid of #currentState, #markerPositionAt:add: and #restoreState: from RxMatcher
- do not add both lowercase and uppercase characters to the set in RxsCharSet >> #enumerableSetIgnoringCase:, because that stops a few possible optimizations to work. The missing characters are added in #enumerablePartPredicateIgnoringCase: and RxMatchOptimizer's corresponding method. This change also let us have simpler #enumerateTo: methods

=============== Diff against Regex-Core-ul.47 ===============

Item was removed:
- ----- Method: RxMatcher>>currentState (in category 'privileged') -----
- currentState
- 	"Answer an opaque object that can later be used to restore the matcher's state (for backtracking)."
- 
- 	^stream position!

Item was removed:
- ----- Method: RxMatcher>>markerPositionAt:add: (in category 'privileged') -----
- markerPositionAt: index add: position
- 	"Remember position of another instance of the given marker."
- 
- 	index <= 2 ifTrue: [
- 		markerPositions at: index put: position.
- 		^self ].
- 	(markerPositions at: index) addLast: position!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstLookahead:nextLink: (in category 'matching') -----
+ matchAgainstLookahead: lookahead nextLink: anRmxLink
+ 
+ 	| position result |
+ 	position := stream position.
+ 	result := lookahead matchAgainst: self.
+ 	stream position: position.
+ 	result ifTrue: [ ^false ].
+ 	^anRmxLink matchAgainst: self!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstMarkerAt:nextLink: (in category 'matching') -----
+ matchAgainstMarkerAt: index nextLink: anRmxLink
+ 
+ 	| position |
+ 	position := stream position.
+ 	(anRmxLink matchAgainst: self) ifFalse: [ ^false ].
+ 	index <= 2 
+ 		ifTrue: [ markerPositions at: index put: position ]
+ 		ifFalse: [ (markerPositions at: index) addLast: position ].
+ 	^true!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstPredicate:nextLink: (in category 'matching') -----
+ matchAgainstPredicate: aBlock nextLink: anRmxLink
+ 
+ 	| next position |
+ 	next := stream next ifNil: [ ^false ].
+ 	position := stream position - 1.
+ 	(aBlock value: next) ifTrue: [
+ 		(anRmxLink matchAgainst: self) ifTrue: [ ^true ] ].
+ 	stream position: position.
+ 	^false!

Item was added:
+ ----- Method: RxMatcher>>matchCaseInsensitiveAgainstStream:nextLink: (in category 'matching') -----
+ matchCaseInsensitiveAgainstStream: aStream nextLink: anRmxLink
+ 
+ 	| next nextSample position |
+ 	position := stream position.
+ 	"The order of the stream reads is important. When aStream runs out of characters, we must not consume any character from stream, or else the next position will be skipped in stream."
+ 	[ (nextSample := aStream next) == nil or: [ (next := stream next) == nil ] ] whileFalse: [
+ 		(nextSample sameAs: next) ifFalse: [
+ 			stream position: position.
+ 			^false ] ].
+ 	(nextSample == nil and: [ anRmxLink matchAgainst: self ]) ifTrue: [ ^true ].
+ 	stream position: position.
+ 	^false!

Item was added:
+ ----- Method: RxMatcher>>matchCaseSensitiveAgainstStream:nextLink: (in category 'matching') -----
+ matchCaseSensitiveAgainstStream: aStream nextLink: anRmxLink
+ 
+ 	| next nextSample position |
+ 	position := stream position.
+ 	"The order of the stream reads is important. When aStream runs out of characters, we must not consume any character from stream, or else the next position will be skipped in stream."
+ 	[ (nextSample := aStream next) == nil or: [ (next := stream next) == nil ] ] whileFalse: [
+ 		nextSample == next ifFalse: [
+ 			stream position: position.
+ 			^false ] ].
+ 	(nextSample == nil and: [ anRmxLink matchAgainst: self ]) ifTrue: [ ^true ].
+ 	stream position: position.
+ 	^false!

Item was removed:
- ----- Method: RxMatcher>>restoreState: (in category 'privileged') -----
- restoreState: streamPosition
- 
- 	stream position: streamPosition!

Item was changed:
  ----- Method: RxmLookahead>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"Match if the predicate block evaluates to true when given the
  	current stream character as the argument."
  
+ 	^aMatcher matchAgainstLookahead: lookahead nextLink: next!
- 	| original result |
- 	original := aMatcher currentState.
- 	result := lookahead matchAgainst: aMatcher.
- 	aMatcher restoreState: original.
- 	^ result not 
- 		and: [ next matchAgainst: aMatcher ]!

Item was changed:
  ----- Method: RxmMarker>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"If the rest of the link chain matches successfully, report the
  	position of the stream *before* the match started to the matcher."
  
+ 	^aMatcher matchAgainstMarkerAt: index nextLink: next!
- 	| startPosition |
- 	startPosition := aMatcher position.
- 	(next matchAgainst: aMatcher) ifFalse: [ ^false ].
- 	aMatcher markerPositionAt: index add: startPosition.
- 	^true!

Item was changed:
  ----- Method: RxmPredicate>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"Match if the predicate block evaluates to true when given the
  	current stream character as the argument."
  
+ 	^aMatcher matchAgainstPredicate: predicate nextLink: next!
- 	| nextCharacter originalState |
- 	originalState := aMatcher currentState.
- 	nextCharacter := aMatcher next ifNil: [
- 		aMatcher restoreState: originalState.
- 		^false ].
- 	(predicate value: nextCharacter) ifTrue: [
- 		(next matchAgainst: aMatcher) ifTrue: [ ^true ] ].
- 	aMatcher restoreState: originalState.
- 	^false
- !

Item was changed:
  ----- Method: RxmSpecial>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"Match without consuming any input, if the matcher is
  	in appropriate state."
  
+ 	(aMatcher perform: matchSelector) ifFalse: [ ^false ].
+ 	^next matchAgainst: aMatcher!
- 	^(aMatcher perform: matchSelector)
- 		and: [next matchAgainst: aMatcher]!

Item was changed:
  ----- Method: RxmSubstring>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"Match if my sample stream is exactly the current prefix
  	of the matcher stream's contents."
  
- 	| nextSample nextFromMatcher originalState |
- 	originalState := aMatcher currentState.
  	sampleStream reset.
+ 	ignoreCase ifFalse: [ ^aMatcher matchCaseSensitiveAgainstStream: sampleStream nextLink: next ].
+ 	^aMatcher matchCaseInsensitiveAgainstStream: sampleStream nextLink: next!
- 	ignoreCase 
- 		ifFalse: [
- 			[ (nextSample := sampleStream next) == nil or: [ (nextFromMatcher := aMatcher next) == nil ] ] whileFalse: [
- 				nextSample == nextFromMatcher ifFalse: [
- 					aMatcher restoreState: originalState.
- 					^false ] ] ]
- 		ifTrue: [ 
- 			[ (nextSample := sampleStream next) == nil or: [ (nextFromMatcher := aMatcher next) == nil ] ] whileFalse: [
- 				(nextSample sameAs: nextFromMatcher) ifFalse: [
- 					aMatcher restoreState: originalState.
- 					^false ] ] ].
- 	(nextSample == nil and: [ next matchAgainst: aMatcher ]) ifTrue: [ ^true ].
- 	aMatcher restoreState: originalState.
- 	^false!

Item was changed:
  ----- Method: RxsCharSet>>enumerablePartPredicateIgnoringCase: (in category 'privileged') -----
+ enumerablePartPredicateIgnoringCase: ignoreCase
- enumerablePartPredicateIgnoringCase: aBoolean
  
+ 	| set p |
+ 	set := (self enumerableSetIgnoringCase: ignoreCase) ifNil: [ ^nil ].
- 	| set |
- 	set := (self enumerableSetIgnoringCase: aBoolean) ifNil: [ ^nil ].
  	set size = 1 ifTrue: [
+ 		| char |
+ 		char := set anyOne.
+ 		ignoreCase ifTrue: [
+ 			| lowercaseChar |
+ 			lowercaseChar := char asLowercase.
+ 			char := char asUppercase.
+ 			char == lowercaseChar ifFalse: [ 
+ 				negated ifTrue: [ 
+ 					^[ :character | (character == char or: [ character == lowercaseChar ]) not ] ].
+ 				^[ :character | character == char or: [ character == lowercaseChar ] ] ] ].
+ 		negated ifTrue: [ ^[ :character | character ~~ char ] ].
+ 		^[ :character | character == char ] ].
+ 	ignoreCase ifTrue: [
+ 		set copy do: [ :each |
+ 			| char |
+ 			(char := each asUppercase) == each
+ 				ifFalse: [ set add: char ]
+ 				ifTrue: [ 
+ 					(char := each asLowercase) == each ifFalse: [
+ 						set add: char ] ] ] ].
+ 	set size < 10 ifTrue: [ "10 is an empirical value"
+ 		p := set asArray.
+ 		negated ifTrue: [ ^[ :character | (p instVarsInclude: character) not ] ].
+ 		^[ :character | p instVarsInclude: character ] ].
+ 	negated ifTrue: [ ^[ :character | (set includes: character) not ] ].
+ 	^[ :character | set includes: character ]!
- 		| p |
- 		p := set anyOne.
- 		negated ifTrue: [ ^[ :character | character ~~ p ] ].
- 		^[ :character | character == p ] ].
- 	negated ifTrue: [ ^[ :char | (set includes: char) not ] ].
- 	^[ :char | set includes: char ]!

Item was changed:
  ----- Method: RxsCharSet>>enumerableSetIgnoringCase: (in category 'privileged') -----
  enumerableSetIgnoringCase: aBoolean
+ 	"Answer a collection of characters that make up the portion of me that can be enumerated, or nil if there are no such characters. The case check is only used to determine the type of set to be used. The returned set won't contain characters of both cases, because this way the senders of this method can create more efficient checks."
- 	"Answer a collection of characters that make up the portion of me that can be enumerated, or nil if there are no such characters."
  
  	| highestCharacterCode set |
  	highestCharacterCode := elements inject: -1 into: [ :max :each |
  		max := (each maximumCharacterCodeIgnoringCase: aBoolean) max: max ].
  	highestCharacterCode = -1 ifTrue: [ ^nil ].
  	set := highestCharacterCode <= 255
  		ifTrue: [ CharacterSet new ]
  		ifFalse: [ WideCharacterSet new ].
+ 	elements do: [ :each | each enumerateTo: set ].
- 	elements do: [ :each |
- 		each enumerateTo: set ignoringCase: aBoolean ].
  	^set!

Item was added:
+ ----- Method: RxsCharacter>>enumerateTo: (in category 'accessing') -----
+ enumerateTo: aSet
+ 
+ 	^aSet add: character!

Item was removed:
- ----- Method: RxsCharacter>>enumerateTo:ignoringCase: (in category 'accessing') -----
- enumerateTo: aSet ignoringCase: aBoolean
- 
- 	aBoolean ifFalse: [ ^aSet add: character ].
- 	aSet 
- 		add: character asUppercase;
- 		add: character asLowercase!

Item was added:
+ ----- Method: RxsPredicate>>enumerateTo: (in category 'accessing') -----
+ enumerateTo: aSet
+ 
+ 	^self "Not enumerable"!

Item was removed:
- ----- Method: RxsPredicate>>enumerateTo:ignoringCase: (in category 'accessing') -----
- enumerateTo: aSet ignoringCase: aBoolean
- 
- 	^self "Not enumerable"!

Item was added:
+ ----- Method: RxsRange>>enumerateTo: (in category 'accessing') -----
+ enumerateTo: aSet
+ 	"Add all of the elements I represent to the collection."
+ 
+ 	first asInteger to: last asInteger do: [ :charCode |
+ 		aSet add: charCode asCharacter ].
+ 	^self!

Item was removed:
- ----- Method: RxsRange>>enumerateTo:ignoringCase: (in category 'accessing') -----
- enumerateTo: aSet ignoringCase: aBoolean
- 	"Add all of the elements I represent to the collection."
- 
- 	aBoolean ifFalse: [
- 		first asInteger to: last asInteger do: [ :charCode |
- 			aSet add: charCode asCharacter ].
- 		^self ].
- 	first asInteger to: last asInteger do: [ :charCode |
- 		| character |
- 		character := charCode asCharacter.
- 		aSet
- 			add: character asLowercase;
- 			add: character asUppercase ]!



More information about the Squeak-dev mailing list