[squeak-dev] The Trunk: Regex-Core-ct.78.mcz

commits at source.squeak.org commits at source.squeak.org
Wed Oct 5 18:05:02 UTC 2022


Christoph Thiede uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ct.78.mcz

==================== Summary ====================

Name: Regex-Core-ct.78
Author: ct
Time: 5 October 2022, 8:05:01.403966 pm
UUID: c732cc45-d624-2742-9574-f9803202d018
Ancestors: Regex-Core-ct.76, Regex-Core-ct.69, Regex-Core-ct.62, Regex-Core-ct.64, Regex-Core-ct.65, Regex-Core-ct.66, Regex-Core-ct.67, Regex-Core-ct.70, Regex-Core-ct.72, Regex-Core-ct.73, Regex-Core-ct.74

Merge commit. Please refer to the individual versions for their full description and diffs.

Regex-Core-ct.62:
	Fixes copying of cyclic RxmLink structures. This fixes the #testNestedQuantifiers failure from Regex-Tests-Core-ct.17.

Regex-Core-ct.63:
	Adds support for non-capturing groups. Also fixes a bug while parsing lookaround-like regexes (see #testLookaroundParser, Regex-Tests-Core-ct.18).

Regex-Core-ct.64:
	Fixes #testNoCapturingOfLookarounds (Regex-Tests-Core-ct.20).

Regex-Core-ct.65:
	Fixes a parser bug when encountering a brace quantifier after another quantifier. Regression tests are in #testQuantifierSequence, Regex-Tests-Core-ct.21.

Regex-Core-ct.66:
	Adds convenience selectors for accessing subexpressions (#allSubexpressions and #subexpressionRanges:). The other changes contain some minor refactorings only.
	
	Not a revision, but a supplemental note: This also fixes the incorrect order of capture groups within branches! See revision of Regex-Tests-Core-ct.22 in Regex-Tests-Core-ct.30.

Regex-Core-ct.67:
	Adds support for named capturing groups.
	
	Revision: Improved documentation. Renames #keyedSubexpressionsRanges: -> #keyedSubexpressionRanges:. Reverts side effect from #keyedSubexpressions:. BREAKING CHANGE: Both markerPositions and keyedMarkerPositions are no longer recorded in reverse order. #subBeginning:, #subEnd:, #subexpressionRanges:, and #keyedSubexpressionRanges: now answer the match position in ascending instead of descending order.

Regex-Core-ct.69:
	Adds String >> #escapeRegex to escape special characters in a string before composing it into another regex.
	
	Revision: Rename to #escapeForRegex.

Regex-Core-ct.70:
	Adds support for nullable closures and eliminates the eponymous error message.
	
	Revision: Improved documentation. s/(?<=initialize|reset)Marker(?=Positions)/Matcher/g. Fixes termination of lookarounds which need to have their own terminator (see RxMatcher>>#syntaxLookaround:forward:positive:). Fixes and tests match optimization for nullable matches (see RxMatchOptimizer>>#initialize:ignoreCase:). Consequentially, finally fixes #testOptionalLookbehind2, resolves the expected failure, and merges it back into #testOptionalLookbehind.

Regex-Core-ct.72:
	Makes RxMatcher polymorphic with itself

Regex-Core-ct.73:
	Removes longly obsolete constant on RxsNode.

Regex-Core-ct.74:
	Adds support for unicode backslash syntax in pieces and character sets.
	
	Revision: Miscellaneous refactoring, improved documentation, minor bug fixes, updates to new Unicode protocol and to new EHS signaling semantics. Makes dependency on Multilingual (Unicode) optional.
	
	Treats Regex-Core-tobe.62.

=============== Diff against Regex-Core-ct.76 ===============

Item was added:
+ Object subclass: #RxAbstractParser
+ 	instanceVariableNames: 'source lookahead'
+ 	classVariableNames: 'BackslashConstants BackslashPredicates'
+ 	poolDictionaries: ''
+ 	category: 'Regex-Core'!
+ 
+ !RxAbstractParser commentStamp: 'ct 10/28/2021 03:00' prior: 0!
+ I provide general parsing facilities for all kinds of regex parsers.
+ 
+ Instance variables:
+ 	input		<Stream> A stream with the expression being parsed.
+ 	lookahead	<Character>	The current lookahead character.!

Item was added:
+ ----- Method: RxAbstractParser class>>doShiftingSyntaxExceptionPositions:from: (in category 'exception signaling') -----
+ doShiftingSyntaxExceptionPositions: aBlock from: start
+ 	"When invoking a nested parser, make sure to update the positions of any syntax exception raised by this nested parser."
+ 	^ aBlock
+ 		on: RegexSyntaxError
+ 		do: [:ex | ex copy
+ 			position: ex position + start - 1;
+ 			signal]!

Item was added:
+ ----- Method: RxAbstractParser class>>initialize (in category 'class initialization') -----
+ initialize
+ 	"self initialize"
+ 	self
+ 		initializeBackslashConstants;
+ 		initializeBackslashPredicates!

Item was added:
+ ----- Method: RxAbstractParser class>>initializeBackslashConstants (in category 'class initialization') -----
+ initializeBackslashConstants
+ 	"self initializeBackslashConstants"
+ 
+ 	(BackslashConstants := Dictionary new)
+ 		at: $e put: Character escape;
+ 		at: $n put: Character lf;
+ 		at: $r put: Character cr;
+ 		at: $f put: Character newPage;
+ 		at: $t put: Character tab!

Item was added:
+ ----- Method: RxAbstractParser class>>initializeBackslashPredicates (in category 'class initialization') -----
+ initializeBackslashPredicates
+ 	"The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
+ 	"self initializeBackslashPredicates"
+ 
+ 	(BackslashPredicates := Dictionary new)
+ 		at: $d put: RxsPredicate -> #beDigit;
+ 		at: $p put: #unicodeCategory;
+ 		at: $s put: RxsPredicate -> #beSpace;
+ 		at: $u put: #unicodePoint;
+ 		at: $w put: RxsPredicate -> #beWordConstituent;
+ 		at: $x put: #codePoint.!

Item was added:
+ ----- Method: RxAbstractParser class>>signalSyntaxException: (in category 'exception signaling') -----
+ signalSyntaxException: errorString
+ 	RegexSyntaxError new signal: errorString!

Item was added:
+ ----- Method: RxAbstractParser class>>signalSyntaxException:at: (in category 'exception signaling') -----
+ signalSyntaxException: errorString at: errorPosition
+ 	RegexSyntaxError signal: errorString at: errorPosition!

Item was added:
+ ----- Method: RxAbstractParser>>backslashConstant (in category 'recursive descent') -----
+ backslashConstant
+ 
+ 	| character |
+ 	character := BackslashConstants at: lookahead ifAbsent: [^ nil].
+ 	self next.
+ 	^ RxsCharacter with: character!

Item was added:
+ ----- Method: RxAbstractParser>>backslashNode (in category 'recursive descent') -----
+ backslashNode
+ 
+ 	| char |
+ 	lookahead ifNil: [ self signalParseError: 'bad quotation' ].
+ 	
+ 	self basicBackslashNode ifNotNil: [:node | ^node].
+ 	
+ 	char := lookahead.
+ 	self next.
+ 	^ RxsCharacter with: char!

Item was added:
+ ----- Method: RxAbstractParser>>backslashPredicate (in category 'recursive descent') -----
+ backslashPredicate
+ 
+ 	^ self backslashSpecial: BackslashPredicates!

Item was added:
+ ----- Method: RxAbstractParser>>backslashSpecial: (in category 'private') -----
+ backslashSpecial: specials
+ 
+ 	| negate specialSelector node |
+ 	negate := false.
+ 	specialSelector := specials at: lookahead ifAbsent: [
+ 		(lookahead isLetter and: [lookahead isUppercase]) ifTrue: [
+ 			negate := true.
+ 			specialSelector := specials at: lookahead asLowercase ifAbsent: []].
+ 		specialSelector ifNil: [^ nil]].
+ 	self next.
+ 	
+ 	node := specialSelector isVariableBinding
+ 		ifTrue: [specialSelector key new perform: specialSelector value]
+ 		ifFalse: [specialSelector value: self].
+ 	negate ifTrue: [node := node negated].
+ 	^ node!

Item was added:
+ ----- Method: RxAbstractParser>>basicBackslashNode (in category 'recursive descent') -----
+ basicBackslashNode
+ 	
+ 	self backslashConstant ifNotNil: [:node | ^ node].
+ 	self backslashPredicate ifNotNil: [:node | ^ node].
+ 	^ nil!

Item was added:
+ ----- Method: RxAbstractParser>>codePoint (in category 'recursive descent') -----
+ codePoint
+ 
+ 	^ self codePoint: 2!

Item was added:
+ ----- Method: RxAbstractParser>>codePoint: (in category 'recursive descent') -----
+ codePoint: size
+ 	"Matches a character that has the given code codepoint with the specified size of hex digits, unless braced.
+ 	<codePoint> ::= \x ({<hex>} | <hex>[size])"
+ 
+ 	| braced codeString codePoint codeStream |
+ 	braced := self tryMatch: ${.
+ 	codeString := braced
+ 		ifFalse: [self
+ 			input: size
+ 			errorMessage: 'invalid codepoint']
+ 		ifTrue: [self
+ 			inputUpTo: $}
+ 			errorMessage: 'no terminating "}"'].
+ 	
+ 	codeStream := codeString readStream.
+ 	codePoint := ((ExtendedNumberParser on: codeStream)
+ 		defaultBase: 16;
+ 		nextInteger "allow the stream to change the base -- beware: any new base is specified in base 16!!") ifNil: [
+ 			self signalParseError: 'invalid codepoint'].
+ 	codeStream atEnd ifFalse: [
+ 		self signalParseError: 'invalid codepoint'].
+ 	
+ 	braced ifTrue: [
+ 		self match: $}].
+ 	
+ 	^ RxsCharacter with: (Character codePoint: codePoint)!

Item was added:
+ ----- Method: RxAbstractParser>>initialize: (in category 'initialize-release') -----
+ initialize: aStream
+ 
+ 	source := aStream.
+ 	self next.!

Item was added:
+ ----- Method: RxAbstractParser>>input:errorMessage: (in category 'private') -----
+ input: anInteger errorMessage: aString
+ 	"Accumulate input stream with anInteger characters. Raise an error with the message aString if there are not enough characters available."
+ 
+ 	| accumulator |
+ 	accumulator := WriteStream on: (String new: 20).
+ 	anInteger timesRepeat: [
+ 		lookahead ifNil: [self signalParseError: aString].
+ 		accumulator nextPut: lookahead.
+ 		self next].
+ 	^ accumulator contents!

Item was added:
+ ----- Method: RxAbstractParser>>inputUpTo:errorMessage: (in category 'private') -----
+ inputUpTo: aCharacter errorMessage: aString
+ 	"Accumulate input stream until <aCharacter> is encountered
+ 	and answer the accumulated chars as String, not including
+ 	<aCharacter>. Signal error if end of stream is encountered,
+ 	passing <aString> as the error description."
+ 
+ 	| accumulator |
+ 	accumulator := WriteStream on: (String new: 20).
+ 	[ lookahead == aCharacter or: [lookahead == nil ] ]
+ 		whileFalse: [
+ 			accumulator nextPut: lookahead.
+ 			self next].
+ 	lookahead ifNil: [ self signalParseError: aString ].
+ 	^accumulator contents!

Item was added:
+ ----- Method: RxAbstractParser>>inputUpTo:nestedOn:errorMessage: (in category 'private') -----
+ inputUpTo: closingCharacter nestedOn: openingCharacter errorMessage: aString 
+ 	"Accumulate input stream until <closingCharacter> is encountered without escaping and answer the accumulated chars as String, not including <closingCharacter>. Inputs may contain nested data inside pairs of <openingCharacter> and <closingCharacter>. Signal error if end of stream is encountered, passing <aString> as the error description."
+ 
+ 	| accumulator nestLevel |
+ 	accumulator := WriteStream on: (String new: 20).
+ 	nestLevel := 0.
+ 	[ lookahead == closingCharacter and: [ nestLevel = 0 ] ] whileFalse: [
+ 		lookahead ifNil: [ self signalParseError: aString ].
+ 		lookahead == $\
+ 			ifTrue: [ 
+ 				self next ifNil: [ self signalParseError: aString ].
+ 				accumulator
+ 					nextPut: $\;
+ 					nextPut: lookahead ]
+ 			ifFalse: [
+ 				accumulator nextPut: lookahead.
+ 				lookahead == openingCharacter ifTrue: [ nestLevel := nestLevel + 1 ].
+ 				lookahead == closingCharacter ifTrue: [ nestLevel := nestLevel - 1 ] ].
+ 		self next ].
+ 	^accumulator contents!

Item was added:
+ ----- Method: RxAbstractParser>>inputUpToAny:errorMessage: (in category 'private') -----
+ inputUpToAny: aDelimiterString errorMessage: aString
+ 	"Accumulate input stream until any character from <aDelimiterString> is encountered
+ 	and answer the accumulated chars as String, not including the matched characters from the
+ 	<aDelimiterString>. Signal error if end of stream is encountered,
+ 	passing <aString> as the error description."
+ 
+ 	| accumulator |
+ 	accumulator := WriteStream on: (String new: 20).
+ 	[ lookahead == nil or: [ aDelimiterString includes: lookahead ] ]
+ 		whileFalse: [
+ 			accumulator nextPut: lookahead.
+ 			self next ].
+ 	lookahead ifNil: [ self signalParseError: aString ].
+ 	^accumulator contents!

Item was added:
+ ----- Method: RxAbstractParser>>inputWhile:errorMessage: (in category 'private') -----
+ inputWhile: aBlock errorMessage: aString
+ 	"Accumulate input stream until <aBlock> returns false on the next character and answer the accumulated chars as String, not including the last character. Signal error if end of stream is encountered, passing <aString> as the error description."
+ 
+ 	| accumulator |
+ 	accumulator := WriteStream on: (String new: 20).
+ 	lookahead ifNil: [self signalParseError: aString].
+ 	[lookahead notNil and: [aBlock cull: lookahead]] whileTrue: [
+ 		accumulator nextPut: lookahead.
+ 		self next].
+ 	^ accumulator contents!

Item was added:
+ ----- Method: RxAbstractParser>>match: (in category 'recursive descent') -----
+ match: aCharacter
+ 	"<aCharacter> MUST match the current lookeahead. If this is the case, advance the input. Otherwise, blow up."
+ 
+ 	aCharacter = lookahead ifTrue: [ ^self next ].
+ 	self signalParseError: (lookahead
+ 		ifNil: ['unexpected end']
+ 		ifNotNil: ['unexpected character: ', lookahead asString])!

Item was added:
+ ----- Method: RxAbstractParser>>next (in category 'private') -----
+ next
+ 
+ 	^ lookahead := source next!

Item was added:
+ ----- Method: RxAbstractParser>>signalParseError (in category 'private') -----
+ signalParseError
+ 
+ 	^ self signalParseError: 'Regex syntax error'!

Item was added:
+ ----- Method: RxAbstractParser>>signalParseError: (in category 'private') -----
+ signalParseError: aString
+ 
+ 	^self class
+ 		signalSyntaxException: aString
+ 		at: source position!

Item was added:
+ ----- Method: RxAbstractParser>>tryMatch: (in category 'private') -----
+ tryMatch: aCharacter
+ 
+ 	^ lookahead == aCharacter
+ 		ifTrue: [self next];
+ 		yourself!

Item was added:
+ ----- Method: RxAbstractParser>>unicodeCategory (in category 'recursive descent') -----
+ unicodeCategory
+ 	"Matches a character that belongs to the given unicode category.
+ 	<unicodeCategory> ::= \p '{' <categoryName> '}'"
+ 
+ 	| category |
+ 	self match: ${.
+ 	category := self inputUpTo: $} errorMessage: 'no terminating "}"'.
+ 	self match: $}.
+ 	
+ 	RxsPredicate supportsUnicode ifFalse:
+ 		[self signalParseError: 'unicode support is not available'].
+ 	(RxsPredicate isValidUnicodeCategory: category) ifFalse:
+ 		[self signalParseError: 'unknown unicode category: ' , category].
+ 	^ RxsPredicate new beUnicodeCategory: category!

Item was added:
+ ----- Method: RxAbstractParser>>unicodePoint (in category 'recursive descent') -----
+ unicodePoint
+ 
+ 	^ self codePoint: 4!

Item was changed:
+ RxAbstractParser subclass: #RxCharSetParser
+ 	instanceVariableNames: 'elements'
- Object subclass: #RxCharSetParser
- 	instanceVariableNames: 'source lookahead elements'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
+ !RxCharSetParser commentStamp: 'ct 10/28/2021 02:59' prior: 0!
- !RxCharSetParser commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  I am a parser created to parse the insides of a character set ([...]) construct. I create and answer a collection of "elements", each being an instance of one of: RxsCharacter, RxsRange, or RxsPredicate.
  
  Instance Variables:
  
- 	source	<Stream>	open on whatever is inside the square brackets we have to parse.
- 	lookahead	<Character>	The current lookahead character
  	elements	<Collection of: <RxsCharacter|RxsRange|RxsPredicate>> Parsing result!

Item was added:
+ ----- Method: RxCharSetParser>>add: (in category 'accessing') -----
+ add: nodeOrNodes
+ 
+ 	nodeOrNodes isCollection
+ 		ifFalse: [elements add: nodeOrNodes]
+ 		ifTrue: [elements addAll: nodeOrNodes]!

Item was removed:
- ----- Method: RxCharSetParser>>addChar: (in category 'parsing') -----
- addChar: aChar
- 
- 	elements add: (RxsCharacter with: aChar)!

Item was removed:
- ----- Method: RxCharSetParser>>addRangeFrom:to: (in category 'parsing') -----
- addRangeFrom: firstChar to: lastChar
- 
- 	firstChar asInteger > lastChar asInteger ifTrue:
- 		[RxParser signalSyntaxException: ' bad character range' at: source position].
- 	elements add: (RxsRange from: firstChar to: lastChar)!

Item was added:
+ ----- Method: RxCharSetParser>>char (in category 'recursive descent') -----
+ char
+ 
+ 	| char |
+ 	(self tryMatch: $\) ifTrue:
+ 		[^self backslashNode
+ 			ifNil: [RxsCharacter with: lookahead]].
+ 	
+ 	char := RxsCharacter with: lookahead.
+ 	self next.
+ 	^char!

Item was added:
+ ----- Method: RxCharSetParser>>char: (in category 'parsing') -----
+ char: aCharacter
+ 
+ 	^ RxsCharacter with: aCharacter!

Item was added:
+ ----- Method: RxCharSetParser>>charOrRange (in category 'recursive descent') -----
+ charOrRange
+ 
+ 	| firstChar lastChar |
+ 	firstChar := self char.
+ 	lookahead == $- ifFalse:
+ 		[^firstChar].
+ 	
+ 	self next ifNil:
+ 		[^{firstChar. self char: $-}].
+ 	
+ 	lastChar := self char.
+ 	firstChar isRegexCharacter ifFalse:
+ 		[self signalParseError: 'range must start with a single character'].
+ 	lastChar isRegexCharacter ifFalse: 
+ 		[self signalParseError: 'range must end with a single character'].
+ 	^self rangeFrom: firstChar character to: lastChar character!

Item was added:
+ ----- Method: RxCharSetParser>>element (in category 'recursive descent') -----
+ element
+ 
+ 	(lookahead == $[ and: [source peek == $:]) ifTrue:
+ 		[^self namedSet].
+ 	^self charOrRange!

Item was changed:
  ----- Method: RxCharSetParser>>initialize: (in category 'initialize-release') -----
  initialize: aStream
  
+ 	super initialize: aStream.
- 	source := aStream.
- 	lookahead := aStream next.
  	elements := OrderedCollection new!

Item was removed:
- ----- Method: RxCharSetParser>>match: (in category 'parsing') -----
- match: aCharacter
- 
- 	aCharacter = lookahead ifTrue: [ ^self next ].
- 	RxParser 
- 		signalSyntaxException: 'unexpected character: ', (String with: lookahead)
- 		at: source position!

Item was added:
+ ----- Method: RxCharSetParser>>namedSet (in category 'recursive descent') -----
+ namedSet
+ 
+ 	| name |
+ 	self match: $[; match: $:.
+ 	name := (String with: lookahead), (source upTo: $:).
+ 	self next.
+ 	self match: $].
+ 	^ RxsPredicate forNamedClass: name!

Item was removed:
- ----- Method: RxCharSetParser>>next (in category 'parsing') -----
- next
- 
- 	^lookahead := source next!

Item was changed:
  ----- Method: RxCharSetParser>>parse (in category 'accessing') -----
  parse
  
+ 	[ lookahead == nil ] whileFalse: [ self add: self element ].
- 	lookahead == $- ifTrue: [
- 		self addChar: $-.
- 		self next ].
- 	[ lookahead == nil ] whileFalse: [ self parseStep ].
  	^elements!

Item was removed:
- ----- Method: RxCharSetParser>>parseCharOrRange (in category 'parsing') -----
- parseCharOrRange
- 
- 	| firstChar |
- 	firstChar := lookahead.
- 	self next == $- ifFalse: [ ^self addChar: firstChar ].
- 	self next ifNil: [ ^self addChar: firstChar; addChar: $- ].
- 	self addRangeFrom: firstChar to: lookahead.
- 	self next!

Item was removed:
- ----- Method: RxCharSetParser>>parseEscapeChar (in category 'parsing') -----
- parseEscapeChar
- 
- 	| first |
- 	self match: $\.
- 	first := (RxsPredicate forEscapedLetter: lookahead)
- 		ifNil: [ RxsCharacter with: lookahead ].
- 	self next == $- ifFalse: [^ elements add: first].
- 	self next ifNil: [
- 		elements add: first.
- 		^ self addChar: $-].
- 	self addRangeFrom: first character to: lookahead.
- 	self next!

Item was removed:
- ----- Method: RxCharSetParser>>parseNamedSet (in category 'parsing') -----
- parseNamedSet
- 
- 	| name |
- 	self match: $[; match: $:.
- 	name := (String with: lookahead), (source upTo: $:).
- 	self next.
- 	self match: $].
- 	elements add: (RxsPredicate forNamedClass: name)!

Item was removed:
- ----- Method: RxCharSetParser>>parseStep (in category 'parsing') -----
- parseStep
- 
- 	lookahead == $[ ifTrue:
- 		[source peek == $:
- 			ifTrue: [^self parseNamedSet]
- 			ifFalse: [^self parseCharOrRange]].
- 	lookahead == $\ ifTrue:
- 		[^self parseEscapeChar].
- 	lookahead == $- ifTrue:
- 		[RxParser signalSyntaxException: 'invalid range' at: source position].
- 	self parseCharOrRange!

Item was added:
+ ----- Method: RxCharSetParser>>rangeFrom:to: (in category 'parsing') -----
+ rangeFrom: firstChar to: lastChar
+ 
+ 	firstChar asInteger > lastChar asInteger ifTrue:
+ 		[self signalParseError: 'bad character range'].
+ 	^ RxsRange from: firstChar to: lastChar!

Item was changed:
  ----- Method: RxMatchOptimizer>>initialize:ignoreCase: (in category 'initialize-release') -----
  initialize: aRegex ignoreCase: aBoolean 
  	"Set `testMethod' variable to a can-match predicate block:
  	two-argument block which accepts a lookahead character
  	and a matcher (presumably built from aRegex) and answers 
  	a boolean indicating whether a match could start at the given
  	lookahead. "
  
  	ignoreCase := aBoolean.
  	aRegex dispatchTo: self.
+ 	"If the whole expression is nullable, we can match anything!!"
+ 	aRegex isNullable ifTrue: [ ^self ].
- 	"If the whole expression is nullable, end-of-line is an implicit can-match condition!!"
- 	aRegex isNullable ifTrue: [ self addCondition: #atEndOfLine ].
  	testBlock := self determineTestMethod!

Item was changed:
  ----- Method: RxMatchOptimizer>>syntaxCharSet: (in category 'double dispatch') -----
  syntaxCharSet: charSetNode 
  	"All these (or none of these) characters is the prefix."
  
  	(charSetNode enumerableSetIgnoringCase: ignoreCase) ifNotNil: [ :enumerableSet |
  		charSetNode isNegated
  			ifTrue: [ self addNonPrefixes: enumerableSet ]
  			ifFalse: [ self addPrefixes: enumerableSet ] ].
  
+ 	(charSetNode predicatesIgnoringCase: ignoreCase) ifNotNil: [ :charsetPredicates |
- 	charSetNode predicates ifNotNil: [ :charsetPredicates |
  		charSetNode isNegated
  			ifTrue: [ 
  				charsetPredicates do: [ :each | self addNonPredicate: each ] ]
  			ifFalse: [ 
  				charsetPredicates do: [ :each | self addPredicate: each ] ] ]!

Item was changed:
  ----- Method: RxMatchOptimizer>>syntaxPredicate: (in category 'double dispatch') -----
  syntaxPredicate: predicateNode 
  
+ 	self addPredicate: (predicateNode predicateIgnoringCase: ignoreCase)!
- 	self addPredicate: predicateNode predicate!

Item was changed:
  Object subclass: #RxMatcher
+ 	instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions keyedMarkerPositions previousMarkerPositions markerCount branchPositions lastResult firstTryMatch'
- 	instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions previousMarkerPositions markerCount lastResult firstTryMatch'
  	classVariableNames: 'Cr Lf NullCharacter'
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
+ !RxMatcher commentStamp: 'ct 10/5/2022 13:20' prior: 0!
- !RxMatcher commentStamp: 'ul 8/28/2015 14:18' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  This is a recursive regex matcher. Not strikingly efficient, but simple. Also, keeps track of matched subexpressions.  The life cycle goes as follows:
  
  1. Initialization. Accepts a syntax tree (presumably produced by RxParser) and compiles it into a matcher built of other classes in this category.
  
  2. Matching. Accepts a stream or a string and returns a boolean indicating whether the whole stream or its prefix -- depending on the message sent -- matches the regex.
  
  3. Subexpression query. After a successful match, and before any other match, the matcher may be queried about the range of specific stream (string) positions that matched to certain parenthesized subexpressions of the original expression.
  
  Any number of queries may follow a successful match, and any number or matches may follow a successful initialization.
  
  Note that `matcher' is actually a sort of a misnomer. The actual matcher is a web of Rxm* instances built by RxMatcher during initialization. RxMatcher is just the interface facade of this network.  It is also a builder of it, and also provides a stream-like protocol to easily access the stream being matched.
  
  Instance variables:
  	matcher					<RxmLink> The entry point into the actual matcher.
  	igoreCase					<Boolean> Whether the matching algorithm should be case sensitive or not.
  	startOptimizer				<RxMatchOptimizer> An object which can quickly decide whether the next character can be the prefix of a match or not.
  	stream						<Stream> The stream currently being matched against.
+ 	markerPositions				<Array of: nil | Integer | OrderedCollection> Positions of markers' matches.
+ 	keyedMarkerPositions		<Dictionary of: Object -> (OrderedCollection of: {beginIndex. endIndex)}> Indexes of markerPositions for keyed markers' matches.
- 	markerPositions			<Array of: nil | Integer | OrderedCollection> Positions of markers' matches.
  	previousMarkerPositions	<Array of: nil |  Integer | OrderedCollection> Positions of markers from the previous #tryMatch send.
  	markerCount				<Integer> Number of markers.
+ 	branchPositions				<Array of: (OrderedCollection of: Integer)> Temporary positions of looped branches.
  	lastResult 					<Boolean> Whether the latest match attempt succeeded or not.
  	firtTryMatch				<Boolean> True if there hasn't been any send of #tryMatch during the current matching.!

Item was added:
+ ----- Method: RxMatcher>>allKeyedSubexpressions (in category 'accessing') -----
+ allKeyedSubexpressions
+ 
+ 	^ self keyedMarkers
+ 		collect: [:key | key -> (self keyedSubexpressions: key)]
+ 		as: Dictionary!

Item was added:
+ ----- Method: RxMatcher>>allSubexpressions (in category 'accessing') -----
+ allSubexpressions
+ 
+ 	^ (1 to: self subexpressionCount) collect: [:index |
+ 		self subexpressions: index]!

Item was added:
+ ----- Method: RxMatcher>>asRegex (in category 'converting') -----
+ asRegex
+ 
+ 	^ self!

Item was changed:
  ----- Method: RxMatcher>>buildFrom: (in category 'accessing') -----
  buildFrom: aSyntaxTreeRoot
  	"Private - Entry point of matcher build process."
  
  	markerCount := 0.  "must go before #dispatchTo: !!"
+ 	keyedMarkerPositions := Dictionary new.
  	matcher := aSyntaxTreeRoot dispatchTo: self.
  	matcher terminateWith: RxmTerminator new!

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:replacingMatchesWith: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream replacingMatchesWith: aString
  	"Copy the contents of <aStream> on the <writeStream>, except for the matches. Replace each match with <aString>."
  
  	| searchStart matchStart matchEnd |
  	stream := aStream.
  	firstTryMatch := true.
  	[searchStart := aStream position.
  	self proceedSearchingStream: aStream] whileTrue:
+ 		[matchStart := (self subBeginning: 1) first.
+ 		matchEnd := (self subEnd: 1) first.
- 		[matchStart := (self subBeginning: 1) last.
- 		matchEnd := (self subEnd: 1) last.
  		aStream position: searchStart.
  		searchStart to: matchStart - 1 do:
  			[:ignoredPos | writeStream nextPut: aStream next].
  		writeStream nextPutAll: aString.
  		aStream position: matchEnd.
  		"Be extra careful about successful matches which consume no input.
  		After those, make sure to advance or finish if already at end."
  		matchEnd = searchStart ifTrue: 
  			[aStream atEnd
  				ifTrue:	[^self "rest after end of whileTrue: block is a no-op if atEnd"]
  				ifFalse:	[writeStream nextPut: aStream next]]].
  	aStream position: searchStart.
  	[aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:translatingMatchesUsing: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream translatingMatchesUsing: aBlock
  	"Copy the contents of <aStream> on the <writeStream>, except for the matches. For each match, evaluate <aBlock> passing the matched substring as the argument.  Expect the block to answer a String, and write the answer to <writeStream> in place of the match."
  
  	| searchStart matchStart matchEnd match |
  	stream := aStream.	
  	firstTryMatch := true.
  	[searchStart := aStream position.
  	self proceedSearchingStream: aStream] whileTrue:
+ 		[matchStart := (self subBeginning: 1) first.
+ 		matchEnd := (self subEnd: 1) first.
- 		[matchStart := (self subBeginning: 1) last.
- 		matchEnd := (self subEnd: 1) last.
  		aStream position: searchStart.
  		searchStart to: matchStart - 1 do:
  			[:ignoredPos | writeStream nextPut: aStream next].
  		match := (String new: matchEnd - matchStart + 1) writeStream.
  		matchStart to: matchEnd - 1 do:
  			[:ignoredPos | match nextPut: aStream next].
  		writeStream nextPutAll: (aBlock value: match contents).
  		"Be extra careful about successful matches which consume no input.
  		After those, make sure to advance or finish if already at end."
  		matchEnd = searchStart ifTrue: 
  			[aStream atEnd
  				ifTrue:	[^self "rest after end of whileTrue: block is a no-op if atEnd"]
  				ifFalse:	[writeStream nextPut: aStream next]]].
  	aStream position: searchStart.
  	[aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!

Item was changed:
  ----- Method: RxMatcher>>hookBranchOf:onto: (in category 'private') -----
  hookBranchOf: regexNode onto: endMarker
+ 	"Private - Recurse down the chain of regexes starting at regexNode, compiling their branches and hooking their tails to the endMarker node."
- 	"Private - Recurse down the chain of regexes starting at
- 	regexNode, compiling their branches and hooking their tails 
- 	to the endMarker node."
  
+ 	| next rest |
+ 	next := (regexNode branch dispatchTo: self)
+ 		pointTailTo: endMarker; 
+ 		yourself.
+ 	regexNode regex ifNil: [
+ 		"Avoid creating a branch without an alternative."
+ 		^ next].
+ 	
+ 	rest := self hookBranchOf: regexNode regex onto: endMarker.
+ 	^ RxmBranch new
+ 		next: next;
+ 		alternative: rest;
+ 		yourself!
- 	^regexNode regex 
- 		ifNil: [ "Avoid creating a branch without an alternative."
- 			^(regexNode branch dispatchTo: self)
- 				pointTailTo: endMarker; 
- 				yourself ]
- 		ifNotNil: [ :regex |
- 			| rest |
- 			rest := self hookBranchOf: regex onto: endMarker.
- 			^RxmBranch new
- 				next: ((regexNode branch dispatchTo: self)
- 					pointTailTo: endMarker; 
- 					yourself);
- 				alternative: rest;
- 				yourself ]
- !

Item was changed:
  ----- Method: RxMatcher>>initialize:ignoreCase: (in category 'initialize-release') -----
  initialize: syntaxTreeRoot ignoreCase: aBoolean
  	"Compile thyself for the regex with the specified syntax tree.
  	See comment and `building' protocol in this class and 
  	#dispatchTo: methods in syntax tree components for details 
  	on double-dispatch building. 
  	The argument is supposedly a RxsRegex."
  
  	ignoreCase := aBoolean.
  	self buildFrom: syntaxTreeRoot.
+ 	self initializeMatcherPositions.
- 	self initializeMarkerPositions.
  	startOptimizer := RxMatchOptimizer new initialize: syntaxTreeRoot ignoreCase: aBoolean.
  	startOptimizer hasTestBlock ifFalse: [
  		startOptimizer := nil ]!

Item was removed:
- ----- Method: RxMatcher>>initializeMarkerPositions (in category 'initialize-release') -----
- initializeMarkerPositions
- 
- 	markerPositions := Array new: markerCount.
- 	previousMarkerPositions := Array new: markerCount..
- 	3 to: markerCount do: [ :index |
- 		markerPositions at: index put: (OrderedCollection new: 1).
- 		previousMarkerPositions at: index put: (OrderedCollection new: 1) ].!

Item was added:
+ ----- Method: RxMatcher>>initializeMatcherPositions (in category 'initialize-release') -----
+ initializeMatcherPositions
+ 
+ 	markerPositions := Array new: markerCount.
+ 	previousMarkerPositions := Array new: markerCount..
+ 	3 to: markerCount do: [ :index |
+ 		markerPositions at: index put: (OrderedCollection new: 1).
+ 		previousMarkerPositions at: index put: (OrderedCollection new: 1) ].
+ 	
+ 	branchPositions := Dictionary new.!

Item was added:
+ ----- Method: RxMatcher>>keyedMarkers (in category 'accessing') -----
+ keyedMarkers
+ 
+ 	^ keyedMarkerPositions keys!

Item was added:
+ ----- Method: RxMatcher>>keyedSubexpression: (in category 'accessing') -----
+ keyedSubexpression: key
+ 
+ 	^ (self keyedSubexpressions: key)
+ 		ifNotEmpty: [:matches | matches last]
+ 		ifEmpty: [nil]!

Item was added:
+ ----- Method: RxMatcher>>keyedSubexpressionRanges: (in category 'accessing') -----
+ keyedSubexpressionRanges: key
+ 	"Answer an array of all match ranges (inclusiveStart -> inclusiveStop) of the subexpression at the given key."
+ 
+ 	^ (keyedMarkerPositions at: key) gather: [:pair |
+ 		(markerPositions at: pair first)
+ 			with: (markerPositions at: pair second)
+ 			collect: [:start :stop | start + 1 to: stop]]!

Item was added:
+ ----- Method: RxMatcher>>keyedSubexpressions: (in category 'accessing') -----
+ keyedSubexpressions: key
+ 
+ 	| originalPosition reply |
+ 	originalPosition := stream position.
+ 	reply := (self keyedSubexpressionRanges: key) collect: [:range |
+ 		stream
+ 			position: range start - 1;
+ 			next: range size].
+ 	stream position: originalPosition.
+ 	^ reply!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstBranch:nextLink:alternativeLink: (in category 'matching') -----
+ matchAgainstBranch: anRxmBranch nextLink: nextLink alternativeLink: alternativeLink
+ 	"Greedy matching."
+ 
+ 	| position result |
+ 	"Detect infinite empty backloops from nested links."
+ 	position := stream position.
+ 	branchPositions
+ 		at: anRxmBranch
+ 		ifPresent: [:positions | (positions includes: position) ifTrue: [
+ 			"We already had this situation earlier in the current stack, don't retry again"
+ 			self flag: #ct. "I was struggling to produce formal evidence that there can be no matching chain of links where the first occurence of a link would not advance the cursor but the second occurence would actually constitute a match. The only example of which I could think would be some side effect like ^a(\1?)*$ to match 'aa', but apparently no other popular regex engine is able to handle this, too. Also, support for backreferences is yet on my to-do list. :-) Can there be any other violation?"
+ 			^ false]].
+ 	
+ 	"Remember the current position for this branch to enable detection of infinite empty backloops."
+ 	(branchPositions at: anRxmBranch ifAbsentPut: [OrderedCollection new]) add: position.
+ 	
+ 	result := (nextLink matchAgainst: self)
+ 		or: [alternativeLink notNil and: [alternativeLink matchAgainst: self]].
+ 	
+ 	(branchPositions at: anRxmBranch) remove: position.
+ 	
+ 	^ result!

Item was changed:
  ----- Method: RxMatcher>>matchAgainstMarkerAt:nextLink: (in category 'matching') -----
  matchAgainstMarkerAt: index nextLink: anRmxLink
  
  	| position |
  	position := stream position.
  	(anRmxLink matchAgainst: self) ifFalse: [ ^false ].
  	index <= 2 
  		ifTrue: [ markerPositions at: index put: position ]
+ 		ifFalse: [ (markerPositions at: index) addFirst: position ].
- 		ifFalse: [ (markerPositions at: index) addLast: position ].
  	^true!

Item was added:
+ ----- Method: RxMatcher>>registerMarkerKey:from:to: (in category 'private') -----
+ registerMarkerKey: key from: beginIndex to: endIndex
+ 
+ 	(keyedMarkerPositions at: key ifAbsentPut: [OrderedCollection new])
+ 		addFirst: {beginIndex. endIndex}.!

Item was removed:
- ----- Method: RxMatcher>>resetMarkerPositions (in category 'private') -----
- resetMarkerPositions
- 	"Reset the marker positions. This method should only be sent from #tryMatch. When this is after the first #tryMatch send, then the marker positions must be swapped."
- 
- 	firstTryMatch
- 		ifTrue: [ firstTryMatch := false ]
- 		ifFalse: [
- 			| temp |
- 			temp := previousMarkerPositions.
- 			previousMarkerPositions := markerPositions.
- 			markerPositions := temp ].
- 	markerPositions
- 		at: 1 put: nil;
- 		at: 2 put: nil.
- 	3 to: markerCount do: [ :index | 
- 		(markerPositions at: index) resetTo: 1 ]!

Item was added:
+ ----- Method: RxMatcher>>resetMatcherPositions (in category 'private') -----
+ resetMatcherPositions
+ 	"Reset the marker positions. This method should only be sent from #tryMatch. When this is after the first #tryMatch send, then the marker positions must be swapped."
+ 
+ 	firstTryMatch
+ 		ifTrue: [ firstTryMatch := false ]
+ 		ifFalse: [
+ 			| temp |
+ 			temp := previousMarkerPositions.
+ 			previousMarkerPositions := markerPositions.
+ 			markerPositions := temp ].
+ 	markerPositions
+ 		at: 1 put: nil;
+ 		at: 2 put: nil.
+ 	3 to: markerCount do: [:index | 
+ 		(markerPositions at: index) reset].
+ 	
+ 	branchPositions removeAll.!

Item was changed:
  ----- Method: RxMatcher>>subBeginning: (in category 'accessing') -----
  subBeginning: subIndex
+ 	"Answer an array of all begin positions (exclusive) for the subexpression at the given index."
  
  	subIndex = 1 ifTrue: [
+ 		"beginning of root node"
+ 		^ (markerPositions at: 1)
+ 			ifNil: [#()]
+ 			ifNotNil: [:mp | {mp}] ].
+ 	
+ 	^ markerPositions at: subIndex * 2 - 1!
- 		(markerPositions at: 1)
- 			ifNil: [ ^#()]
- 			ifNotNil: [ :mp | ^{ mp } ] ].
- 	^markerPositions at: subIndex * 2 - 1!

Item was changed:
  ----- Method: RxMatcher>>subEnd: (in category 'accessing') -----
  subEnd: subIndex
+ 	"Answer an array of all end positions (incclusive) for the subexpression at the given index."
  
  	subIndex = 1 ifTrue: [
+ 		"end of root node"
+ 		^ (markerPositions at: 2)
+ 			ifNil: [#()]
+ 			ifNotNil: [:mp | {mp}] ].
+ 	
+ 	^ markerPositions at: subIndex * 2!
- 		(markerPositions at: 2)
- 			ifNil: [ ^#()]
- 			ifNotNil: [ :mp | ^{ mp } ] ].
- 	^markerPositions at: subIndex * 2!

Item was changed:
  ----- Method: RxMatcher>>subexpression: (in category 'accessing') -----
  subexpression: subIndex
  	"Answer a string that matched the subexpression at the given index.
  	If there are multiple matches, answer the last one.
  	If there are no matches, answer nil. 
  	(NB: it used to answer an empty string but I think nil makes more sense)."
  
+ 	^ (self subexpressions: subIndex)
+ 		ifNotEmpty: [:expressions | expressions last]
+ 		ifEmpty: [nil]!
- 	| matches |
- 	matches := self subexpressions: subIndex.
- 	^matches isEmpty ifTrue: [nil] ifFalse: [matches last]!

Item was added:
+ ----- Method: RxMatcher>>subexpressionRanges: (in category 'accessing') -----
+ subexpressionRanges: subIndex
+ 	"Answer an array of all match ranges (inclusiveStart -> inclusiveStop) of the subexpression at the given index."
+ 
+ 	^ (self subBeginning: subIndex)
+ 		with: (self subEnd: subIndex)
+ 		collect: [:start :stop | start + 1 to: stop]!

Item was changed:
  ----- Method: RxMatcher>>subexpressions: (in category 'accessing') -----
  subexpressions: subIndex
  	"Answer an array of all matches of the subexpression at the given index.
  	The answer is always an array; it is empty if there are no matches."
  
  	| originalPosition startPositions stopPositions reply |
  	originalPosition := stream position.
  	startPositions := self subBeginning: subIndex.
  	stopPositions := self subEnd: subIndex.
  	(startPositions isEmpty or: [stopPositions isEmpty]) ifTrue: [^Array new].
  	reply := Array new: startPositions size.
  	1 to: reply size do: [ :index |
  		| start stop |
  		start := startPositions at: index.
  		stop := stopPositions at: index.
  		stream position: start.
+ 		reply at: index put: (stream next: stop - start) ].
- 		reply at: reply size - index + 1 put: (stream next: stop - start) ].
  	stream position: originalPosition.
  	^reply!

Item was changed:
  ----- Method: RxMatcher>>syntaxLookaround:forward:positive: (in category 'double dispatch') -----
  syntaxLookaround: lookaroundNode forward: forwardBoolean positive: positiveBoolean
  	"Double dispatch from the syntax tree.
  	Special link can handle lookarounds (look ahead and look behind, positive and negative)."
  	| piece |
  	piece := lookaroundNode piece dispatchTo: self.
+ 	piece terminateWith: RxmTerminator new. "see commment in RxmLookaround>>#terminateWith:"
  	^ RxmLookaround with: piece forward: forwardBoolean positive: positiveBoolean!

Item was changed:
  ----- Method: RxMatcher>>syntaxPredicate: (in category 'double dispatch') -----
  syntaxPredicate: predicateNode
  	"Double dispatch from the syntax tree. 
  	A character set is a few characters, and we either match any of them,
  	or match any that is not one of them."
  
+ 	^RxmPredicate with: (predicateNode predicateIgnoringCase: ignoreCase)!
- 	^RxmPredicate with: predicateNode predicate!

Item was changed:
  ----- Method: RxMatcher>>syntaxRegex: (in category 'double dispatch') -----
  syntaxRegex: regexNode
+ 	"Double dispatch from the syntax tree.
+ 	Regex node is a chain of branches to be tried. Should compile this into a bundle of parallel branches, between two marker nodes."
+ 
+ 	| startNode endNode alternatives |
+ 	regexNode isCapturing ifFalse: [
+ 		^ regexNode branch dispatchTo: self].
- 	"Double dispatch from the syntax tree. 
- 	Regex node is a chain of branches to be tried. Should compile this 
- 	into a bundle of parallel branches, between two marker nodes." 
  	
+ 	startNode := RxmMarker new index: self allocateMarker.
+ 	endNode := RxmMarker new index: self allocateMarker.
+ 	
+ 	regexNode key ifNotNil: [:key |
+ 		self registerMarkerKey: key from: startNode index to: endNode index].
+ 	
- 	| startIndex endIndex endNode alternatives |
- 	startIndex := self allocateMarker.
- 	endIndex := self allocateMarker.
- 	endNode := RxmMarker new index: endIndex.
  	alternatives := self hookBranchOf: regexNode onto: endNode.
+ 	^ startNode
- 	^(RxmMarker new index: startIndex)
  		pointTailTo: alternatives;
  		yourself!

Item was changed:
  ----- Method: RxMatcher>>tryMatch (in category 'private') -----
  tryMatch
  	"Match thyself against the current stream."
  
  	| wasFirstTryMatch |
  	wasFirstTryMatch := firstTryMatch.
+ 	self resetMatcherPositions.
- 	self resetMarkerPositions.
  	lastResult := startOptimizer
  		ifNil: [ matcher matchAgainst: self ]
  		ifNotNil: [ (startOptimizer canStartMatch: stream peek in: self) and: [ matcher matchAgainst: self ] ].
  	"check for duplicates"
  	lastResult ifFalse: [ ^false ].
  	wasFirstTryMatch ifTrue: [ ^true ].
+ 	(previousMarkerPositions hasEqualElements: markerPositions)
+ 		flag: #suspicious; "ct: do we need to check uncaptured groups here, too?"
+ 		ifFalse: [ ^true ].
- 	(previousMarkerPositions hasEqualElements: markerPositions) ifFalse: [ ^true ].
  	"this is a duplicate match"
  	^ lastResult := false!

Item was changed:
+ RxAbstractParser subclass: #RxParser
+ 	instanceVariableNames: ''
+ 	classVariableNames: 'BackslashConditions'
- Object subclass: #RxParser
- 	instanceVariableNames: 'input lookahead'
- 	classVariableNames: 'BackslashConstants BackslashSpecials'
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
+ !RxParser commentStamp: 'ct 10/28/2021 02:59' prior: 0!
- !RxParser commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
+ The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method.  All other classes in this category implement the tree. Refer to their comments for any details.!
- The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method.  All other classes in this category implement the tree. Refer to their comments for any details.
- 
- Instance variables:
- 	input		<Stream> A stream with the regular expression being parsed.
- 	lookahead	<Character>!

Item was added:
+ ----- Method: RxParser class>>escapeString: (in category 'utilities') -----
+ escapeString: aString
+ 	"Answer a copy of aString which does not contain any unescaped characters. This is the inverse function of String >> #matchesRegex:.
+ 	NB: Basically, we could simply escape every single character in the string, but this would not produce human-readable outputs."
+ 
+ 	| special lastIndex nextIndex |
+ 	special := self specialCharacters.
+ 	nextIndex := aString indexOfAnyOf: special startingAt: (lastIndex := 1) ifAbsent: [^ aString].
+ 	^ String new: aString size * 11 // 10 "+10%" streamContents: [:stream |
+ 		[stream
+ 			next: nextIndex - lastIndex putAll: aString startingAt: lastIndex;
+ 			nextPut: $\;
+ 			nextPut: (aString at: nextIndex)]
+ 				doWhileTrue: [(nextIndex := aString indexOfAnyOf: special startingAt: (lastIndex := nextIndex + 1)) > 0].
+ 		stream next: aString size - lastIndex + 1 putAll: aString startingAt: lastIndex]!

Item was changed:
  ----- Method: RxParser class>>initialize (in category 'class initialization') -----
  initialize
  	"self initialize"
+ 	self initializeBackslashConditions!
- 	self
- 		initializeBackslashConstants;
- 		initializeBackslashSpecials!

Item was added:
+ ----- Method: RxParser class>>initializeBackslashConditions (in category 'class initialization') -----
+ initializeBackslashConditions
+ 	"The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
+ 	"self initializeBackslashConditions"
+ 
+ 	(BackslashConditions := Dictionary new)
+ 		at: $b put: RxsContextCondition -> #beWordBoundary;
+ 		at: $B put: RxsContextCondition -> #beNonWordBoundary;
+ 		at: $< put: RxsContextCondition -> #beBeginningOfWord;
+ 		at: $> put: RxsContextCondition -> #beEndOfWord.!

Item was removed:
- ----- Method: RxParser class>>initializeBackslashConstants (in category 'class initialization') -----
- initializeBackslashConstants
- 	"self initializeBackslashConstants"
- 
- 	(BackslashConstants := Dictionary new)
- 		at: $e put: Character escape;
- 		at: $n put: Character lf;
- 		at: $r put: Character cr;
- 		at: $f put: Character newPage;
- 		at: $t put: Character tab!

Item was removed:
- ----- Method: RxParser class>>initializeBackslashSpecials (in category 'class initialization') -----
- initializeBackslashSpecials
- 	"Keys are characters that normally follow a \, the values are
- 	associations of classes and initialization selectors on the instance side
- 	of the classes."
- 	"self initializeBackslashSpecials"
- 
- 	(BackslashSpecials := Dictionary new)
- 		at: $w put: (Association key: RxsPredicate value: #beWordConstituent);
- 		at: $W put: (Association key: RxsPredicate value: #beNotWordConstituent);
- 		at: $s put: (Association key: RxsPredicate value: #beSpace);
- 		at: $S put: (Association key: RxsPredicate value: #beNotSpace);
- 		at: $d put: (Association key: RxsPredicate value: #beDigit);
- 		at: $D put: (Association key: RxsPredicate value: #beNotDigit);
- 		at: $b put: (Association key: RxsContextCondition value: #beWordBoundary);
- 		at: $B put: (Association key: RxsContextCondition value: #beNonWordBoundary);
- 		at: $< put: (Association key: RxsContextCondition value: #beBeginningOfWord);
- 		at: $> put: (Association key: RxsContextCondition value: #beEndOfWord)!

Item was removed:
- ----- Method: RxParser class>>signalSyntaxException: (in category 'exception signaling') -----
- signalSyntaxException: errorString
- 	RegexSyntaxError new signal: errorString!

Item was removed:
- ----- Method: RxParser class>>signalSyntaxException:at: (in category 'exception signaling') -----
- signalSyntaxException: errorString at: errorPosition
- 	RegexSyntaxError signal: errorString at: errorPosition!

Item was added:
+ ----- Method: RxParser class>>specialCharacters (in category 'utilities') -----
+ specialCharacters
+ 
+ 	^ '()[]*+?{}.^$:\'!

Item was changed:
  ----- Method: RxParser>>atom (in category 'recursive descent') -----
  atom
  	"An atom is one of a lot of possibilities, see below."
  
  	| atom |
  	(lookahead == nil 
+ 	or: [ lookahead == $| ] 
+ 	or: [ lookahead == $) ])
- 	or: [ lookahead == $| 
- 	or: [ lookahead == $)
- 	or: [ lookahead == $*
- 	or: [ lookahead == $+ 
- 	or: [ lookahead == $? ]]]]])
  		ifTrue: [ ^RxsEpsilon new ].
+ 	
+ 	(lookahead == $*
+ 	or: [ lookahead == $+ ]
+ 	or: [ lookahead == $? ]
+ 	or: [ lookahead == ${ ])
+ 		ifTrue: [ ^self signalParseError: 'nothing to repeat' ].
+ 	
- 		
  	lookahead == $( 
  		ifTrue: [
+ 			^ self group ].
- 			"<atom> ::= '(' <regex> ')' "
- 			self match: $(.
- 			atom := self regex.
- 			self match: $).
- 			^atom ].
  	
  	lookahead == $[
  		ifTrue: [
  			"<atom> ::= '[' <characterSet> ']' "
  			self match: $[.
  			atom := self characterSet.
  			self match: $].
  			^atom ].
  	
  	lookahead == $: 
  		ifTrue: [
  			"<atom> ::= ':' <messagePredicate> ':' "
  			self match: $:.
  			atom := self messagePredicate.
  			self match: $:.
  			^atom ].
  	
  	lookahead == $. 
  		ifTrue: [
  			"any non-whitespace character"
  			self next.
  			^RxsContextCondition new beAny].
  	
  	lookahead == $^ 
  		ifTrue: [
  			"beginning of line condition"
  			self next.
  			^RxsContextCondition new beBeginningOfLine].
  	
  	lookahead == $$ 
  		ifTrue: [
  			"end of line condition"
  			self next.
  			^RxsContextCondition new beEndOfLine].
  		
  	lookahead == $\ 
  		ifTrue: [
+ 			"<atom> ::= '\' <node>"
+ 			self match: $\.
+ 			^self backslashNode].
- 			"<atom> ::= '\' <character>"
- 			self next ifNil: [ self signalParseError: 'bad quotation' ].
- 			(BackslashConstants includesKey: lookahead) ifTrue: [
- 				atom := RxsCharacter with: (BackslashConstants at: lookahead).
- 				self next.
- 				^atom].
- 			self ifSpecial: lookahead
- 				then: [:node | self next. ^node]].
  		
  	"If passed through the above, the following is a regular character."
  	atom := RxsCharacter with: lookahead.
  	self next.
  	^atom!

Item was added:
+ ----- Method: RxParser>>backslashCondition (in category 'recursive descent') -----
+ backslashCondition
+ 
+ 	^ self backslashSpecial: BackslashConditions!

Item was added:
+ ----- Method: RxParser>>basicBackslashNode (in category 'recursive descent') -----
+ basicBackslashNode
+ 
+ 	^ super basicBackslashNode ifNil: [self backslashCondition]!

Item was changed:
  ----- Method: RxParser>>characterSet (in category 'recursive descent') -----
  characterSet
  	"Match a range of characters: something between `[' and `]'.
  	Opening bracked has already been seen, and closing should
  	not be consumed as well. Set spec is as usual for
  	sets in regexes."
  
+ 	| start spec errorMessage |
+ 	errorMessage := 'no terminating "]"'.
+ 	start := source position.
- 	| spec errorMessage |
- 	errorMessage := ' no terminating "]"'.
  	spec := self inputUpTo: $] nestedOn: $[ errorMessage: errorMessage.
  	(spec isEmpty 
  	or: [spec = '^']) 
  		ifTrue: [
  			"This ']' was literal." 
  			self next.
  			spec := spec, ']', (self inputUpTo: $] nestedOn: $[ errorMessage: errorMessage)].
+ 	^self class
+ 		doShiftingSyntaxExceptionPositions: [self characterSetFrom: spec]
+ 		from: start!
- 	^self characterSetFrom: spec!

Item was added:
+ ----- Method: RxParser>>group (in category 'recursive descent') -----
+ group
+ 
+ 	self match: $(.
+ 	lookahead == $?
+ 		ifFalse: [
+ 			| group |
+ 			"<group> ::= '(' <regex> ')' "
+ 			group := self regex.
+ 			self match: $).
+ 			^ group].
+ 	
+ 	self next.
+ 	
+ 	lookahead == $:
+ 		ifTrue: [
+ 			"non-capturing group"
+ 			"<group> ::= '(?:' <regex> ')' "
+ 			| group |
+ 			self next.
+ 			group := self regex.
+ 			group beNonCapturing.
+ 			self match: $).
+ 			^ group].
+ 	
+ 	(lookahead == $' or: [lookahead == $<
+ 		and: [source atEnd not] and: [source peek isAlphaNumeric]])
+ 		ifTrue: [
+ 			"named capture group"
+ 			"<group ::= '(?<' <key> '>' <regex> ') | '(?''' <key> '''' <regex> ')' "
+ 			| closing key group |
+ 			closing := #($'$>) at: (#($'$<) indexOf: lookahead).
+ 			self next.
+ 			key := self
+ 				inputWhile: #isAlphaNumeric
+ 				errorMessage: 'invalid capture group name' translated.
+ 			self match: closing.
+ 			group := self regex.
+ 			group key: key.
+ 			self match: $).
+ 			^ group].
+ 	
+ 	('<=!!' includes: lookahead)
+ 		ifTrue: [
+ 			| lookaround |
+ 			lookaround := self lookAround.
+ 			self match: $).
+ 			^ lookaround ].
+ 	
+ 	^ self signalParseError!

Item was removed:
- ----- Method: RxParser>>ifSpecial:then: (in category 'private') -----
- ifSpecial: aCharacter then: aBlock
- 	"If the character is such that it defines a special node when follows a $\,
- 	then create that node and evaluate aBlock with the node as the parameter.
- 	Otherwise just return."
- 
- 	| classAndSelector |
- 	classAndSelector := BackslashSpecials at: aCharacter ifAbsent: [^self].
- 	^aBlock value: (classAndSelector key new perform: classAndSelector value)!

Item was removed:
- ----- Method: RxParser>>inputUpTo:errorMessage: (in category 'private') -----
- inputUpTo: aCharacter errorMessage: aString
- 	"Accumulate input stream until <aCharacter> is encountered
- 	and answer the accumulated chars as String, not including
- 	<aCharacter>. Signal error if end of stream is encountered,
- 	passing <aString> as the error description."
- 
- 	| accumulator |
- 	accumulator := WriteStream on: (String new: 20).
- 	[ lookahead == aCharacter or: [lookahead == nil ] ]
- 		whileFalse: [
- 			accumulator nextPut: lookahead.
- 			self next].
- 	lookahead ifNil: [ self signalParseError: aString ].
- 	^accumulator contents!

Item was removed:
- ----- Method: RxParser>>inputUpTo:nestedOn:errorMessage: (in category 'private') -----
- inputUpTo: aCharacter nestedOn: anotherCharacter errorMessage: aString 
- 	"Accumulate input stream until <aCharacter> is encountered
- 	and answer the accumulated chars as String, not including
- 	<aCharacter>. Signal error if end of stream is encountered,
- 	passing <aString> as the error description."
- 
- 	| accumulator nestLevel |
- 	accumulator := WriteStream on: (String new: 20).
- 	nestLevel := 0.
- 	[ lookahead == aCharacter and: [ nestLevel = 0 ] ] whileFalse: [
- 		lookahead ifNil: [ self signalParseError: aString ].
- 		lookahead == $\
- 			ifTrue: [ 
- 				self next ifNil: [ self signalParseError: aString ].
- 				BackslashConstants
- 					at: lookahead
- 					ifPresent: [ :unescapedCharacter | accumulator nextPut: unescapedCharacter ]
- 					ifAbsent: [
- 						accumulator
- 							nextPut: $\;
- 							nextPut: lookahead ] ]
- 			ifFalse: [
- 				accumulator nextPut: lookahead.
- 				lookahead == anotherCharacter ifTrue: [ nestLevel := nestLevel + 1 ].
- 				lookahead == aCharacter ifTrue: [ nestLevel := nestLevel - 1 ] ].
- 		self next ].
- 	^accumulator contents!

Item was removed:
- ----- Method: RxParser>>inputUpToAny:errorMessage: (in category 'private') -----
- inputUpToAny: aDelimiterString errorMessage: aString
- 	"Accumulate input stream until any character from <aDelimiterString> is encountered
- 	and answer the accumulated chars as String, not including the matched characters from the
- 	<aDelimiterString>. Signal error if end of stream is encountered,
- 	passing <aString> as the error description."
- 
- 	| accumulator |
- 	accumulator := WriteStream on: (String new: 20).
- 	[ lookahead == nil or: [ aDelimiterString includes: lookahead ] ]
- 		whileFalse: [
- 			accumulator nextPut: lookahead.
- 			self next ].
- 	lookahead ifNil: [ self signalParseError: aString ].
- 	^accumulator contents!

Item was changed:
  ----- Method: RxParser>>lookAround (in category 'recursive descent') -----
  lookAround
  	"Parse a lookaround expression after: (?<lookaround>) 
+ 	<lookaround> ::= !!<regex> | =<regex>
+ 	Positive lookahead: ?=
+ 	Negative lookahead: ?!!
+ 	Positive lookbehind: ?<=
+ 	Negative lookbehind: ?<!!"
+ 
- 	<lookaround> ::= !!<regex> | =<regex>"
  	| lookbehind positive |
+ 	('<!!=' includes: lookahead) ifFalse: [
+ 		^ self signalParseError: 'invalid lookaround expression ?', lookahead asString].
+ 	
- 	('!!=<' includes: lookahead) ifFalse: [
- 		^ self signalParseError: 'Invalid lookaround expression ?', lookahead asString].
  	lookbehind := lookahead == $<
  		ifTrue: [self next];
  		yourself.
+ 	
+ 	('!!=' includes: lookahead) ifFalse: [
+ 		^ self signalParseError: 'invalid lookaround expression'].
+ 	
  	positive := lookahead == $=.
  	self next.
  	^ RxsLookaround
  		with: self regex
  		forward: lookbehind not
  		positive: positive!

Item was removed:
- ----- Method: RxParser>>match: (in category 'private') -----
- match: aCharacter
- 	"<aCharacter> MUST match the current lookeahead.
- 	If this is the case, advance the input. Otherwise, blow up."
- 
- 	aCharacter == lookahead ifFalse: [ ^self signalParseError ]. "does not return"
- 	self next!

Item was changed:
  ----- Method: RxParser>>messagePredicate (in category 'recursive descent') -----
  messagePredicate
  	"Match a message predicate specification: a selector (presumably
  	understood by a Character) enclosed in :'s ."
  
  	| spec negated |
+ 	spec := self inputUpTo: $: errorMessage: 'no terminating ":"'.
+ 	spec ifEmpty: [self signalParseError ].
- 	spec := self inputUpTo: $: errorMessage: ' no terminating ":"'.
  	negated := false.
  	spec first = $^ 
  		ifTrue: [
  			negated := true.
  			spec := spec copyFrom: 2 to: spec size].
  	^RxsMessagePredicate new 
  		initializeSelector: spec asSymbol
  		negated: negated!

Item was removed:
- ----- Method: RxParser>>next (in category 'private') -----
- next
- 	"Advance the input storing the just read character
- 	as the lookahead."
- 
- 	^lookahead := input next!

Item was changed:
  ----- Method: RxParser>>parseStream: (in category 'accessing') -----
  parseStream: aStream
  	"Parse an input from a character stream <aStream>.
  	On success, answers an RxsRegex -- parse tree root.
  	On error, raises `RxParser syntaxErrorSignal' with the current
  	input stream position as the parameter."
  
  	| tree |
+ 	self initialize: aStream.
- 	input := aStream.
- 	self next.
  	tree := self regex.
  	self match: nil.
  	^tree!

Item was changed:
  ----- Method: RxParser>>piece (in category 'recursive descent') -----
  piece
  	"<piece> ::= <atom> | <atom>* | <atom>+ | <atom>? | <atom>{<number>,<number>}"
  
  	| atom |
  	atom := self atom.
  	
  	lookahead == $*
  		ifTrue: [ 
  			self next.
- 			atom isNullable
- 				ifTrue: [ self signalNullableClosureParserError ].
  			^ RxsPiece new initializeStarAtom: atom ].
  
  	lookahead == $+
  		ifTrue: [ 
  			self next.
- 			atom isNullable
- 				ifTrue: [ self signalNullableClosureParserError ].
  			^ RxsPiece new initializePlusAtom: atom ].
  
  	lookahead == $?
  		ifTrue: [ 
  			self next.
- 			atom isNullable
- 				ifTrue: [ 
- 					^ self lookAround ].
  			^ RxsPiece new initializeOptionalAtom: atom ].
  	
  	lookahead == ${
  		ifTrue: [
  			^ self quantifiedAtom: atom ].
  		
  	^ RxsPiece new initializeAtom: atom!

Item was changed:
  ----- Method: RxParser>>quantifiedAtom: (in category 'recursive descent') -----
  quantifiedAtom: atom
  	"Parse a quanitifer expression which can have one of the following forms
  		{<min>,<max>}    match <min> to <max> occurences
  		{<minmax>}       which is the same as with repeated limits: {<number>,<number>}
  		{<min>,}         match at least <min> occurences
  		{,<max>}         match maximally <max> occurences, which is the same as {0,<max>}"
  	| min max |
  	self next.
  	lookahead == $,
  		ifTrue: [ min := 0 ]
  		ifFalse: [
+ 			max := min := (self inputUpToAny: ',}' errorMessage: 'no terminating "}"') asUnsignedInteger ].
- 			max := min := (self inputUpToAny: ',}' errorMessage: ' no terminating "}"') asUnsignedInteger ].
  	lookahead == $,
  		ifTrue: [
  			self next.
+ 			max := (self inputUpToAny: ',}' errorMessage: 'no terminating "}"') asUnsignedInteger ].	
- 			max := (self inputUpToAny: ',}' errorMessage: ' no terminating "}"') asUnsignedInteger ].	
  	self match: $}.
- 	atom isNullable
- 		ifTrue: [ self signalNullableClosureParserError ].
  	(max notNil and: [ max < min ])
  		ifTrue: [ self signalParseError: ('wrong quantifier, expected ', min asString, ' <= ', max asString) ].
  	^ RxsPiece new 
  		initializeAtom: atom
  		min: min
  		max: max!

Item was removed:
- ----- Method: RxParser>>signalNullableClosureParserError (in category 'private') -----
- signalNullableClosureParserError
- 	self signalParseError: ' nullable closure'.!

Item was removed:
- ----- Method: RxParser>>signalParseError (in category 'private') -----
- signalParseError
- 
- 	self class 
- 		signalSyntaxException: 'Regex syntax error' at: input position!

Item was removed:
- ----- Method: RxParser>>signalParseError: (in category 'private') -----
- signalParseError: aString
- 
- 	self class signalSyntaxException: aString at: input position!

Item was changed:
  ----- Method: RxmBranch>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  	"Match either `next' or `alternative'. Fail if the alternative is nil."
  
+ 	^ aMatcher matchAgainstBranch: self nextLink: next alternativeLink: alternative!
- 	(next matchAgainst: aMatcher) ifTrue: [ ^true ].
- 	^(alternative ifNil: [ ^false ]) matchAgainst: aMatcher!

Item was changed:
  ----- Method: RxmLink>>copyUsing: (in category 'copying') -----
  copyUsing: anIdentityDictionary
  	"Copy the receiver if it's not present in the argument dictionary, or just return the previously made copy. The rest of the object graph will be copied by #postCopyUsing:."
  
  	^anIdentityDictionary 
  		at: self
  		ifAbsent: [
  			"It may be tempting to use #at:ifAbsentPut: instead, but the argument block must not modify the receiver, so that wouldn't work."
+ 			(anIdentityDictionary
- 			anIdentityDictionary
  				at: self
+ 				put: self shallowCopy)
- 				put: (self shallowCopy
  					postCopyUsing: anIdentityDictionary;
+ 					yourself ]!
- 					yourself) ]!

Item was changed:
  ----- Method: RxmLookaround>>terminateWith: (in category 'building') -----
  terminateWith: aNode
+ 	"Documentation only. The inner lookaround piece is terminated separately during the construction of the receiver. Lookaround links may not receive #terminateWith: if they are located in a loopback branch, but the inner piece does *not* contain a path to the next link of the receiver."
+ 
+ 	super terminateWith: aNode!
- 	lookaround terminateWith: aNode.
- 	super terminateWith: aNode.!

Item was added:
+ ----- Method: RxmMarker>>index (in category 'accessing') -----
+ index
+ 
+ 	^ index!

Item was added:
+ ----- Method: RxsCharSet>>basicMaximumCharacterCodeIgnoringCase: (in category 'accessing') -----
+ basicMaximumCharacterCodeIgnoringCase: aBoolean
+ 
+ 	^ elements inject: -1 into: [ :max :each |
+ 		(each maximumCharacterCodeIgnoringCase: aBoolean) max: max ]!

Item was changed:
  ----- Method: RxsCharSet>>enumerableSetIgnoringCase: (in category 'privileged') -----
  enumerableSetIgnoringCase: aBoolean
  	"Answer a collection of characters that make up the portion of me that can be enumerated, or nil if there are no such characters. The case check is only used to determine the type of set to be used. The returned set won't contain characters of both cases, because this way the senders of this method can create more efficient checks."
  
  	| highestCharacterCode set |
+ 	highestCharacterCode := self basicMaximumCharacterCodeIgnoringCase: aBoolean.
- 	highestCharacterCode := elements inject: -1 into: [ :max :each |
- 		(each maximumCharacterCodeIgnoringCase: aBoolean) max: max ].
  	highestCharacterCode = -1 ifTrue: [ ^nil ].
  	set := highestCharacterCode <= 255
  		ifTrue: [ CharacterSet new ]
  		ifFalse: [ WideCharacterSet new ].
  	elements do: [ :each | each enumerateTo: set ].
  	^set!

Item was added:
+ ----- Method: RxsCharSet>>enumerateTo: (in category 'accessing') -----
+ enumerateTo: aSet
+ 
+ 	negated ifTrue: [^ self "Not enumerable"].
+ 	^ elements do: [:each | each enumerateTo: aSet]!

Item was changed:
  ----- Method: RxsCharSet>>isEnumerable (in category 'testing') -----
  isEnumerable
  
+ 	negated ifTrue: [^ false].
  	^elements anySatisfy: [:some | some isEnumerable ]!

Item was added:
+ ----- Method: RxsCharSet>>maximumCharacterCodeIgnoringCase: (in category 'accessing') -----
+ maximumCharacterCodeIgnoringCase: aBoolean
+ 	"Answer the largest character code among the characters I represent."
+ 
+ 	negated ifTrue: [^ -1 "not enumerable"].
+ 	^ self basicMaximumCharacterCodeIgnoringCase: aBoolean!

Item was added:
+ ----- Method: RxsCharSet>>negated (in category 'converting') -----
+ negated
+ 
+ 	^ self class new
+ 		initializeElements: elements
+ 		negated: negated not!

Item was changed:
  ----- Method: RxsCharSet>>predicateIgnoringCase: (in category 'accessing') -----
  predicateIgnoringCase: aBoolean
  
  	| enumerable predicate |
  	enumerable := self enumerablePartPredicateIgnoringCase: aBoolean.
+ 	predicate := (self predicatePartPredicateIgnoringCase: aBoolean) ifNil: [ 
- 	predicate := self predicatePartPredicate ifNil: [ 
  		"There are no predicates in this set."
  		^enumerable ifNil: [ 
  			"This set is empty."
  			[ :char | negated ] ] ].
  	enumerable ifNil: [ ^predicate ].
  	negated ifTrue: [
  		"enumerable and predicate already negate the result, that's why #not is not needed here."
  		^[ :char | (enumerable value: char) and: [ predicate value: char ] ] ].
  	^[ :char | (enumerable value: char) or: [ predicate value: char ] ]!

Item was removed:
- ----- Method: RxsCharSet>>predicatePartPredicate (in category 'privileged') -----
- predicatePartPredicate
- 	"Answer a predicate that tests all of my elements that cannot be enumerated, or nil if such elements don't exist."
- 
- 	| predicates size |
- 	predicates := elements reject: [ :some | some isEnumerable ].
- 	(size := predicates size) = 0 ifTrue: [ 
- 		"We could return a real predicate block - like [ :char | negated ] - here, but it wouldn't be used anyway. This way we signal that this character set has no predicates."
- 		^nil ].
- 	size = 1 ifTrue: [
- 		negated ifTrue: [ ^predicates first predicateNegation ].
- 		^predicates first predicate ].
- 	predicates replace: [ :each | each predicate ].
- 	negated ifTrue: [ ^[ [: char | predicates noneSatisfy: [ :some | some value: char ] ] ] ].
- 	^[ :char | predicates anySatisfy: [ :some | some value: char ] ]
- 	!

Item was added:
+ ----- Method: RxsCharSet>>predicatePartPredicateIgnoringCase: (in category 'privileged') -----
+ predicatePartPredicateIgnoringCase: aBoolean
+ 	"Answer a predicate that tests all of my elements that cannot be enumerated, or nil if such elements don't exist."
+ 
+ 	| predicates size |
+ 	predicates := elements reject: [ :some | some isEnumerable ].
+ 	(size := predicates size) = 0 ifTrue: [ 
+ 		"We could return a real predicate block - like [ :char | negated ] - here, but it wouldn't be used anyway. This way we signal that this character set has no predicates."
+ 		^nil ].
+ 	size = 1 ifTrue: [
+ 		negated ifTrue: [ ^predicates first predicateNegationIgnoringCase: aBoolean ].
+ 		^predicates first predicateIgnoringCase: aBoolean ].
+ 	predicates replace: [ :each | each predicateIgnoringCase: aBoolean ].
+ 	negated ifTrue: [ ^[: char | predicates noneSatisfy: [ :some | some value: char ] ] ].
+ 	^[ :char | predicates anySatisfy: [ :some | some value: char ] ]!

Item was removed:
- ----- Method: RxsCharSet>>predicates (in category 'accessing') -----
- predicates
- 
- 	| predicates |
- 	predicates := elements reject: [ :some | some isEnumerable ].
- 	predicates isEmpty ifTrue: [ ^nil ].
- 	^predicates replace: [ :each | each predicate ]!

Item was added:
+ ----- Method: RxsCharSet>>predicatesIgnoringCase: (in category 'accessing') -----
+ predicatesIgnoringCase: aBoolean
+ 
+ 	| predicates |
+ 	predicates := elements reject: [ :some | some isEnumerable ].
+ 	predicates isEmpty ifTrue: [ ^nil ].
+ 	^predicates replace: [ :each | each predicateIgnoringCase: aBoolean ]!

Item was added:
+ ----- Method: RxsCharacter>>isRegexCharacter (in category 'testing') -----
+ isRegexCharacter
+ 
+ 	^ true!

Item was changed:
  ----- Method: RxsCharacter>>maximumCharacterCodeIgnoringCase: (in category 'accessing') -----
  maximumCharacterCodeIgnoringCase: aBoolean
+ 	"Answer the largest character code among the characters I represent."
- 	"Return the largest character code among the characters I represent."
  
  	aBoolean ifFalse: [ ^character asInteger ].
  	^character asUppercase asInteger max: character asLowercase asInteger!

Item was added:
+ ----- Method: RxsCharacter>>negated (in category 'converting') -----
+ negated
+ 
+ 	^ RxsCharSet new
+ 		initializeElements: {self}
+ 		negated: true!

Item was changed:
  ----- Method: RxsLookaround>>initializePiece:forward:positive: (in category 'initialize-release') -----
  initializePiece: anRsxPiece forward: forwardBoolean positive: positiveBoolean
  
  	piece := anRsxPiece.
+ 	piece beNonCapturing.
  	forward := forwardBoolean.
  	positive := positiveBoolean.!

Item was removed:
- ----- Method: RxsNode>>indentCharacter (in category 'constants') -----
- indentCharacter
- 	"Normally, #printOn:withIndent: method in subclasses
- 	print several characters returned by this method to indicate
- 	the tree structure."
- 
- 	^$+!

Item was added:
+ ----- Method: RxsNode>>isRegexCharacter (in category 'testing') -----
+ isRegexCharacter
+ 
+ 	^ false!

Item was changed:
  RxsNode subclass: #RxsPredicate
  	instanceVariableNames: 'predicate negation'
+ 	classVariableNames: 'NamedClassSelectors'
- 	classVariableNames: 'EscapedLetterSelectors NamedClassSelectors'
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
  !RxsPredicate commentStamp: 'Tbn 11/12/2010 23:15' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  This represents a character that satisfies a certain predicate.
  
  Instance Variables:
  
  	predicate	<BlockClosure>	A one-argument block. If it evaluates to the value defined by <negated> when it is passed a character, the predicate is considered to match.
  	negation	<BlockClosure>	A one-argument block that is a negation of <predicate>.!

Item was removed:
- ----- Method: RxsPredicate class>>forEscapedLetter: (in category 'instance creation') -----
- forEscapedLetter: aCharacter
- 	"Return a predicate instance for the given character, or nil if there's no such predicate."
- 
- 	^EscapedLetterSelectors
- 		at: aCharacter
- 		ifPresent: [ :selector | self new perform: selector ]!

Item was changed:
  ----- Method: RxsPredicate class>>initialize (in category 'class initialization') -----
  initialize
  	"self initialize"
  
+ 	self initializeNamedClassSelectors!
- 	self
- 		initializeNamedClassSelectors;
- 		initializeEscapedLetterSelectors!

Item was removed:
- ----- Method: RxsPredicate class>>initializeEscapedLetterSelectors (in category 'class initialization') -----
- initializeEscapedLetterSelectors
- 	"self initializeEscapedLetterSelectors"
- 
- 	EscapedLetterSelectors := Dictionary new
- 		at: $w put: #beWordConstituent;
- 		at: $W put: #beNotWordConstituent;
- 		at: $d put: #beDigit;
- 		at: $D put: #beNotDigit;
- 		at: $s put: #beSpace;
- 		at: $S put: #beNotSpace;
- 		yourself!

Item was added:
+ ----- Method: RxsPredicate class>>isValidUnicodeCategory: (in category 'support') -----
+ isValidUnicodeCategory: categoryName
+ 
+ 	^ (self unicodeClass allCategoryTags anySatisfy: [:tag | tag beginsWith: categoryName])!

Item was added:
+ ----- Method: RxsPredicate class>>supportsUnicode (in category 'support') -----
+ supportsUnicode
+ 
+ 	^ self unicodeClass notNil!

Item was added:
+ ----- Method: RxsPredicate class>>unicodeClass (in category 'support') -----
+ unicodeClass
+ 
+ 	^ Smalltalk classNamed: #Unicode!

Item was added:
+ ----- Method: RxsPredicate>>beUnicodeCategory: (in category 'initialize-release') -----
+ beUnicodeCategory: categoryName
+ 
+ 	| unicodeClass |
+ 	unicodeClass := self class unicodeClass.
+ 	self predicate: [:char |
+ 		(unicodeClass generalCategoryTagOf: char asUnicode) beginsWith: categoryName].!

Item was changed:
  ----- Method: RxsPredicate>>maximumCharacterCodeIgnoringCase: (in category 'accessing') -----
  maximumCharacterCodeIgnoringCase: aBoolean
+ 	"Answer the largest character code among the characters I represent."
- 	"Return the largest character code among the characters I represent."
  
  	^-1 "Not enumerable"!

Item was removed:
- ----- Method: RxsPredicate>>predicate (in category 'accessing') -----
- predicate
- 
- 	^predicate!

Item was added:
+ ----- Method: RxsPredicate>>predicate: (in category 'initialize-release') -----
+ predicate: aBlock
+ 
+ 	predicate := aBlock.
+ 	negation := [:char | (predicate value: char) not].!

Item was added:
+ ----- Method: RxsPredicate>>predicateIgnoringCase: (in category 'accessing') -----
+ predicateIgnoringCase: aBoolean
+ 
+ 	^predicate!

Item was removed:
- ----- Method: RxsPredicate>>predicateNegation (in category 'accessing') -----
- predicateNegation
- 
- 	^negation!

Item was added:
+ ----- Method: RxsPredicate>>predicateNegationIgnoringCase: (in category 'accessing') -----
+ predicateNegationIgnoringCase: aBoolean
+ 
+ 	^negation!

Item was changed:
  ----- Method: RxsRange>>maximumCharacterCodeIgnoringCase: (in category 'accessing') -----
  maximumCharacterCodeIgnoringCase: aBoolean
+ 	"Answer the largest character code among the characters I represent."
- 	"Return the largest character code among the characters I represent."
  
  	first <= last ifFalse: [ ^-1 "Empty range" ].
  	aBoolean ifFalse: [ ^last asInteger ].
  	^(first to: last) detectMax: [ :each |
  		each asLowercase asInteger max: each asUppercase asInteger ]
  	!

Item was changed:
  RxsNode subclass: #RxsRegex
+ 	instanceVariableNames: 'branch regex isCapturing key'
- 	instanceVariableNames: 'branch regex'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
+ !RxsRegex commentStamp: 'ct 10/5/2022 12:38' prior: 0!
- !RxsRegex commentStamp: 'Tbn 11/12/2010 23:15' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  The body of a parenthesized thing, or a top-level expression, also an atom.  
  
  Instance variables:
  	branch		<RxsBranch>
+ 	regex		<RxsRegex | RxsEpsilon>
+ 	isCapturing	<Boolean>
+ 	key			<Object>!
- 	regex		<RxsRegex | RxsEpsilon>!

Item was added:
+ ----- Method: RxsRegex>>beNonCapturing (in category 'accessing') -----
+ beNonCapturing
+ 
+ 	isCapturing := false.!

Item was added:
+ ----- Method: RxsRegex>>initialize (in category 'initialize-release') -----
+ initialize
+ 
+ 	super initialize.
+ 	isCapturing := true.!

Item was added:
+ ----- Method: RxsRegex>>isCapturing (in category 'accessing') -----
+ isCapturing
+ 
+ 	^ isCapturing!

Item was added:
+ ----- Method: RxsRegex>>key (in category 'accessing') -----
+ key
+ 
+ 	^ key!

Item was added:
+ ----- Method: RxsRegex>>key: (in category 'accessing') -----
+ key: anObject
+ 
+ 	key := anObject!

Item was added:
+ ----- Method: String>>escapeForRegex (in category '*Regex-Core') -----
+ escapeForRegex
+ 
+ 	^ RxParser escapeString: self!

Item was removed:
- (PackageInfo named: 'Regex-Core') postscript: 'RxsPredicate initializeEscapedLetterSelectors.'!



More information about the Squeak-dev mailing list