[squeak-dev] The Trunk: Regex-Core-ul.50.mcz

commits at source.squeak.org commits at source.squeak.org
Sun Apr 3 02:32:00 UTC 2016


Levente Uzonyi uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ul.50.mcz

==================== Summary ====================

Name: Regex-Core-ul.50
Author: ul
Time: 3 April 2016, 3:34:24.097121 am
UUID: dd38bd8a-38ae-47d0-aef4-8b02926d9d9c
Ancestors: Regex-Core-ul.49

- a few more Spur-related character comparison optimizations
- RxMatcher >> #atBeginningOfLine now treats lf, not only cr, as an end-of-line character

=============== Diff against Regex-Core-ul.49 ===============

Item was changed:
  ----- Method: RxMatchOptimizer>>nonPrefixTester (in category 'private') -----
  nonPrefixTester
  
  	nonPrefixes ifNil: [ ^nil ].
  	nonPrefixes size = 1 ifTrue: [
  		| nonPrefixChar |
  		nonPrefixChar := nonPrefixes anyOne.
+ 		^[ :char :matcher | (char == nonPrefixChar) not ] ].
- 		^[ :char :matcher | char ~~ nonPrefixChar ] ].
  	^[ :char : matcher | (nonPrefixes includes: char) not ]!
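Rewrites of `~~` into `(x == y) not` like this one depend on character identity being reliable. In a Spur image `Character` is an immediate class, so two characters with the same code point are always `==`, even outside the 0-255 range. The workspace check below is a minimal sketch that assumes a Spur-based Squeak (5.0 or later). The code point 955 is an arbitrary choice. On such an image both lines should print true.

```smalltalk
| a b |
"Assumes a Spur image, where Character instances are immediate objects."
a := Character value: 955.	"U+03BB, Greek small letter lambda"
b := Character value: 955.
"Identity comparison holds for any code point under Spur."
Transcript show: (a == b) printString; cr.
"The old and new spellings of the negated test agree."
Transcript show: ((a == b) not = (a ~~ b)) printString; cr.
```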

Item was changed:
  Object subclass: #RxMatcher
  	instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions previousMarkerPositions markerCount lastResult firstTryMatch'
+ 	classVariableNames: 'Cr Lf NullCharacter'
- 	classVariableNames: 'Cr Lf'
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
  !RxMatcher commentStamp: 'ul 8/28/2015 14:18' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  This is a recursive regex matcher. Not strikingly efficient, but simple. Also, keeps track of matched subexpressions.  The life cycle goes as follows:
  
  1. Initialization. Accepts a syntax tree (presumably produced by RxParser) and compiles it into a matcher built of other classes in this category.
  
  2. Matching. Accepts a stream or a string and returns a boolean indicating whether the whole stream or its prefix -- depending on the message sent -- matches the regex.
  
  3. Subexpression query. After a successful match, and before any other match, the matcher may be queried about the range of specific stream (string) positions that matched to certain parenthesized subexpressions of the original expression.
  
  Any number of queries may follow a successful match, and any number or matches may follow a successful initialization.
  
  Note that `matcher' is actually a sort of a misnomer. The actual matcher is a web of Rxm* instances built by RxMatcher during initialization. RxMatcher is just the interface facade of this network.  It is also a builder of it, and also provides a stream-like protocol to easily access the stream being matched.
  
  Instance variables:
  	matcher					<RxmLink> The entry point into the actual matcher.
  	igoreCase					<Boolean> Whether the matching algorithm should be case sensitive or not.
  	startOptimizer				<RxMatchOptimizer> An object which can quickly decide whether the next character can be the prefix of a match or not.
  	stream						<Stream> The stream currently being matched against.
  	markerPositions			<Array of: nil | Integer | OrderedCollection> Positions of markers' matches.
  	previousMarkerPositions	<Array of: nil |  Integer | OrderedCollection> Positions of markers from the previous #tryMatch send.
  	markerCount				<Integer> Number of markers.
  	lastResult 					<Boolean> Whether the latest match attempt succeeded or not.
  	firtTryMatch				<Boolean> True if there hasn't been any send of #tryMatch during the current matching.!

Item was changed:
  ----- Method: RxMatcher class>>initialize (in category 'class initialization') -----
  initialize
+ 
  	"RxMatcher initialize"
  	Cr := Character cr.
+ 	Lf := Character lf.
+ 	NullCharacter := Character value: 0!
- 	Lf := Character lf.!

Item was changed:
  ----- Method: RxMatcher>>atBeginningOfLine (in category 'testing') -----
  atBeginningOfLine
  
+ 	| lastCharacter |
+ 	stream position = 0 ifTrue: [ ^true ].
+ 	(lastCharacter := stream last) == Cr ifTrue: [ ^true ].
+ 	^lastCharacter == Lf!
- 	^self position = 0 or: [self lastChar = Cr]!
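The workspace sketch below shows the behavior this hunk changes. It assumes an image that already contains Regex-Core-ul.50 or later, and that `^` (beginning of line) is tested through #atBeginningOfLine. The two-line input is made up. The previous implementation compared the preceding character only with Cr, so `^bar` would not have matched after the Lf.

```smalltalk
| text ranges |
"Hypothetical input: 'foo', a line feed, then 'bar'."
text := 'foo', (String with: Character lf), 'bar'.
"'^bar' may only match where a line starts."
ranges := (RxMatcher forString: '^bar') matchingRangesIn: text.
"With Lf treated as a line end, one match is expected, covering indices 5 to 7."
Transcript show: ranges printString; cr.
```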

Item was changed:
  ----- Method: RxMatcher>>lastChar (in category 'accessing') -----
  lastChar
+ 
+ 	^stream position = 0 ifFalse: [ stream last ]!
- 	^ stream position = 0
- 		ifFalse: [ stream skip: -1; next ]!

Item was changed:
  ----- Method: RxMatcher>>matchingRangesIn: (in category 'match enumeration') -----
  matchingRangesIn: aString
  	"Search aString repeatedly for the matches of the receiver.  Answer an OrderedCollection of ranges of each match (index of first character to: index of last character)."
  
  	| result |
  	result := OrderedCollection new.
  	self
  		matchesIn: aString 
+ 		do: [ :match | 
+ 			| streamPosition |
+ 			result add: ((streamPosition := stream position) - match size + 1 to: streamPosition)].
- 		do: [:match | result add: (self position - match size + 1 to: self position)].
  	^result!
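The rewritten method reads the position directly from `stream` rather than through the `#position` accessor, and its answer has the same shape as before. It returns one Interval per match, running from the index of the match's first character to the index of its last. The pattern and input below are made up for illustration.

```smalltalk
| ranges |
"'a+' matches the runs 'aa' and 'aaa' in the hypothetical input."
ranges := (RxMatcher forString: 'a+') matchingRangesIn: 'xaayaaa'.
"Expected: 2 to: 3, then 5 to: 7."
ranges do: [ :each |
	Transcript show: each first printString, ' to: ', each last printString; cr ].
```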

Item was removed:
- ----- Method: RxMatcher>>position (in category 'streaming') -----
- position
- 
- 	^stream position!

Item was changed:
  ----- Method: RxMatcher>>syntaxAny (in category 'double dispatch') -----
  syntaxAny
  	"Double dispatch from the syntax tree. 
  	Create a matcher for any non-null character."
  
  	^RxmPredicate new
+ 		predicate: [:char | (char == NullCharacter) not ]!
- 		predicate: [:char | char asInteger ~= 0]!
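As its comment says, `syntaxAny` builds the matcher for any non-null character, which is the `.` construct. This change only replaces the integer comparison with an identity check against the new `NullCharacter` class variable, so the observable behavior should not change. The quick check below uses arbitrary inputs. It should print true and then false.

```smalltalk
| nul |
nul := String with: (Character value: 0).
"Any ordinary character matches '.'."
Transcript show: ('x' matchesRegex: '.') printString; cr.
"The null character is excluded, both before and after this change."
Transcript show: (nul matchesRegex: '.') printString; cr.
```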

Item was changed:
  ----- Method: RxmTerminator>>terminateWith: (in category 'building') -----
  terminateWith: aTerminator
  	"Branch terminators are never supposed to change.
  	Make sure this is the case."
  
+ 	aTerminator == self ifFalse: [
+ 		RxParser signalCompilationException: 'internal matcher build error - wrong terminator' ]!
- 	aTerminator ~~ self
- 		ifTrue: [RxParser signalCompilationException:
- 				'internal matcher build error - wrong terminator']!

Item was changed:
  ----- Method: RxsCharSet>>enumerablePartPredicateIgnoringCase: (in category 'privileged') -----
  enumerablePartPredicateIgnoringCase: ignoreCase
  
  	| set p |
  	set := (self enumerableSetIgnoringCase: ignoreCase) ifNil: [ ^nil ].
  	set size = 1 ifTrue: [
  		| char |
  		char := set anyOne.
  		ignoreCase ifTrue: [
  			| lowercaseChar |
  			lowercaseChar := char asLowercase.
  			char := char asUppercase.
  			char == lowercaseChar ifFalse: [ 
  				negated ifTrue: [ 
  					^[ :character | (character == char or: [ character == lowercaseChar ]) not ] ].
  				^[ :character | character == char or: [ character == lowercaseChar ] ] ] ].
+ 		negated ifTrue: [ ^[ :character | (character == char) not ] ].
- 		negated ifTrue: [ ^[ :character | character ~~ char ] ].
  		^[ :character | character == char ] ].
  	ignoreCase ifTrue: [
  		set copy do: [ :each |
  			| char |
  			(char := each asUppercase) == each
  				ifFalse: [ set add: char ]
  				ifTrue: [ 
  					(char := each asLowercase) == each ifFalse: [
  						set add: char ] ] ] ].
  	set size < 10 ifTrue: [ "10 is an empirical value"
  		p := set asArray.
  		negated ifTrue: [ ^[ :character | (p instVarsInclude: character) not ] ].
  		^[ :character | p instVarsInclude: character ] ].
  	negated ifTrue: [ ^[ :character | (set includes: character) not ] ].
  	^[ :character | set includes: character ]!
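When a set has a single element and ignoreCase is on, the predicate compares each input character by identity against both the uppercase and the lowercase form, and negates the result for a negated set. The expressions below are an illustrative sketch of that behavior. Whether a particular pattern actually reaches this branch depends on how the parser builds the set node, and that is an assumption here. The expected results are true, true, false.

```smalltalk
"Positive single-character set: either case matches."
Transcript show: ('A' matchesRegexIgnoringCase: '[a]') printString; cr.
"Negated single-character set: other letters match..."
Transcript show: ('B' matchesRegexIgnoringCase: '[^a]') printString; cr.
"...but neither case of the excluded letter does."
Transcript show: ('A' matchesRegexIgnoringCase: '[^a]') printString; cr.
```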

Item was changed:
  ----- Method: RxsContextCondition>>isNullable (in category 'testing') -----
  isNullable
  
+ 	^(#syntaxAny == kind) not!
- 	^#syntaxAny ~~ kind!

Item was changed:
  ----- Method: RxsPredicate>>beCharacter: (in category 'initialize-release') -----
  beCharacter: aCharacter
  
  	predicate := [ :char | char == aCharacter ].
+ 	negation := [ :char | (char == aCharacter) not ]!
- 	negation := [ :char | char ~~ aCharacter  ]!

Item was changed:
  ----- Method: RxsPredicate>>beWordConstituent (in category 'initialize-release') -----
  beWordConstituent
  
+ 	predicate := [ :char | char == $_ or: [ char isAlphaNumeric ] ].
+ 	negation := [ :char | (char == $_ or: [ char isAlphaNumeric ]) not ]!
- 	predicate := [:char | char isAlphaNumeric or: [char == $_]].
- 	negation := [:char | char isAlphaNumeric not and: [char ~~ $_]]!
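`beWordConstituent` defines a word character as `$_` or anything that answers true to `isAlphaNumeric`. The new version presumably puts the identity test against `$_` first because it is cheaper, and it writes the negation as `(...) not` instead of the De Morgan form. The sketch below checks that the old and new negation blocks agree on code points 0 to 255. The range is an arbitrary sample. It also exercises `\w` on two made-up strings, which should print true and then false.

```smalltalk
| oldNegation newNegation |
"The old and new negation blocks, copied from the diff above."
oldNegation := [ :char | char isAlphaNumeric not and: [ char ~~ $_ ] ].
newNegation := [ :char | (char == $_ or: [ char isAlphaNumeric ]) not ].
"Compare both blocks on every code point in the sample range."
Transcript show: ((0 to: 255) allSatisfy: [ :i |
	(oldNegation value: (Character value: i)) = (newNegation value: (Character value: i)) ]) printString; cr.
"Letters, digits and underscores are word constituents; a hyphen is not."
Transcript show: ('snake_case_42' matchesRegex: '\w+') printString; cr.
Transcript show: ('kebab-case' matchesRegex: '\w+') printString; cr.
```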


