[squeak-dev] The Inbox: Regex-Core-ct.56.mcz

commits at source.squeak.org commits at source.squeak.org
Fri Mar 6 19:32:38 UTC 2020


Christoph Thiede uploaded a new version of Regex-Core to project The Inbox:
http://source.squeak.org/inbox/Regex-Core-ct.56.mcz

==================== Summary ====================

Name: Regex-Core-ct.56
Author: ct
Time: 6 March 2020, 8:32:34.316886 pm
UUID: 22955049-895d-ca43-8919-1d9f44b851f9
Ancestors: Regex-Core-ct.55

Implements lookbehinds (both positive and negative) in Regular Expressions for Squeak.

'(?<=(?<!n''t\s+)l[o]ve\s+)\w+' asRegex matchesIn: 'I love Squeak. I don''t love C++. I love Smalltalk.'.

- Honor lookbehind syntax in RxParser (see #lookAround)
- Add #forward argument to nodes/links messages. Extend RxsLookaround state by #forward boolean.
- Create RxmLookahead as universal class for both lookahead and lookbehind links. Remove RxmLookahead.
- Implement actual lookbehind logic in RxMatcher >> #matchAgainstLookbehind:positive:nextLink:.

Again, I decided not to give further support for the #forward-less versions of the relevant messages, nor for the class RxmLookahead. First, I'm not sure whether "lookahead" is a universally useful default assumption for "lookaround". Also, these selectors have been introduced just a few hours ago in Regex-Core-ct.55, so there should not be a high demand for backward compatibility. Second, all these things are hidden behind the RxMatcher facade, so foreign clients should not really depend on them. (Bad pun: Don't look-behind the facade ...) Opinions on this topic are highly appreciated.

This commit depends indeed on Regex-Core-ct.55. Please review carefully! The lookbehind matching implementation uses an intuitive algorithm. I give absolutely no guarantee that this is an efficient implementation, but for the beginning, even a superpolynomial complexity should be better than no implementation at all, shouldn't it? :-) Further information about lookbehinds can be found here: https://www.regular-expressions.info/lookaround.html

=============== Diff against Regex-Core-ct.55 ===============

Item was added:
+ ----- Method: RxMatchOptimizer>>syntaxLookaround:forward:positive: (in category 'double dispatch') -----
+ syntaxLookaround: lookaroundNode forward: forward positive: positive
+ 	"Do nothing."!

Item was removed:
- ----- Method: RxMatchOptimizer>>syntaxLookaround:positive: (in category 'double dispatch') -----
- syntaxLookaround: lookaroundNode positive: positive
- 	"Do nothing."!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstLookbehind:positive:nextLink: (in category 'matching') -----
+ matchAgainstLookbehind: lookbehind positive: positive nextLink: anRmxLink
+ 
+ 	| position matchesLookbehind |
+ 	position := stream position.
+ 	matchesLookbehind := (position to: 0 by: -1)
+ 		anySatisfy: [:index |
+ 			stream position: index.
+ 			(lookbehind matchAgainst: self)
+ 				and: [stream position = position]].
+ 	matchesLookbehind = positive
+ 		ifFalse: [^ false].
+ 	stream position: position.
+ 	^ anRmxLink matchAgainst: self!

Item was added:
+ ----- Method: RxMatcher>>syntaxLookaround:forward:positive: (in category 'double dispatch') -----
+ syntaxLookaround: lookaroundNode forward: forwardBoolean positive: positiveBoolean
+ 	"Double dispatch from the syntax tree.
+ 	Special link can handle lookarounds (look ahead and look behind, positive and negative)."
+ 	| piece |
+ 	piece := lookaroundNode piece dispatchTo: self.
+ 	^ RxmLookaround with: piece forward: forwardBoolean positive: positiveBoolean!

Item was removed:
- ----- Method: RxMatcher>>syntaxLookaround:positive: (in category 'double dispatch') -----
- syntaxLookaround: lookaroundNode positive: positiveBoolean
- 	"Double dispatch from the syntax tree.
- 	Special link can handle lookarounds (look ahead, positive and negative)."
- 	| piece |
- 	piece := lookaroundNode piece dispatchTo: self.
- 	^ RxmLookahead with: piece positive: positiveBoolean!

Item was changed:
  ----- Method: RxParser>>lookAround (in category 'recursive descent') -----
  lookAround
  	"Parse a lookaround expression after: (?<lookaround>) 
  	<lookaround> ::= !!<regex> | =<regex>"
+ 	| lookbehind positive |
+ 	('!!=<' includes: lookahead) ifFalse: [
- 	| positive |
- 	('!!=' includes: lookahead) ifFalse: [
  		^ self signalParseError: 'Invalid lookaround expression ?', lookahead asString].
+ 	lookbehind := lookahead == $<
+ 		ifTrue: [self next];
+ 		yourself.
  	positive := lookahead == $=.
  	self next.
  	^ RxsLookaround
  		with: self regex
+ 		forward: lookbehind not
  		positive: positive!

Item was removed:
- RxmLink subclass: #RxmLookahead
- 	instanceVariableNames: 'lookahead positive'
- 	classVariableNames: ''
- 	poolDictionaries: ''
- 	category: 'Regex-Core'!
- 
- !RxmLookahead commentStamp: 'ct 3/6/2020 18:29' prior: 0!
- Instance holds onto a lookahead which matches but does not consume anything.
- 
- Instance Variables
- 	lookahead:		<RxmLink>
- 	positive:		<Boolean>
- !

Item was removed:
- ----- Method: RxmLookahead class>>with:positive: (in category 'instance creation') -----
- with: aPiece positive: aBoolean
- 
- 	^self new lookahead: aPiece positive: aBoolean!

Item was removed:
- ----- Method: RxmLookahead>>lookahead:positive: (in category 'accessing') -----
- lookahead: anRxmLink positive: aBoolean
- 	lookahead := anRxmLink.
- 	positive := aBoolean.!

Item was removed:
- ----- Method: RxmLookahead>>matchAgainst: (in category 'matching') -----
- matchAgainst: aMatcher
- 	"Match if the predicate block evaluates to true when given the
- 	current stream character as the argument."
- 
- 	^aMatcher matchAgainstLookahead: lookahead positive: positive nextLink: next!

Item was removed:
- ----- Method: RxmLookahead>>postCopy (in category 'copying') -----
- postCopy
- 
- 	super postCopy.
- 	lookahead := lookahead copy!

Item was removed:
- ----- Method: RxmLookahead>>postCopyUsing: (in category 'copying') -----
- postCopyUsing: anIdentityDictionary
- 
- 	super postCopyUsing: anIdentityDictionary.
- 	lookahead := lookahead copyUsing: anIdentityDictionary!

Item was removed:
- ----- Method: RxmLookahead>>terminateWith: (in category 'building') -----
- terminateWith: aNode
- 	lookahead terminateWith: aNode.
- 	super terminateWith: aNode.!

Item was added:
+ RxmLink subclass: #RxmLookaround
+ 	instanceVariableNames: 'forward positive lookaround'
+ 	classVariableNames: ''
+ 	poolDictionaries: ''
+ 	category: 'Regex-Core'!
+ 
+ !RxmLookaround commentStamp: 'ct 3/6/2020 19:45' prior: 0!
+ Instance holds onto a lookaround which matches but does not consume anything.
+ 
+ Instance Variables
+ 	lookbehind:		<RxmLink>
+ 	forward:		<Boolean>
+ 	positive:		<Boolean>!

Item was added:
+ ----- Method: RxmLookaround class>>with:forward:positive: (in category 'instance creation') -----
+ with: aPiece forward: forwardBoolean positive: positiveBoolean
+ 
+ 	^self new lookaround: aPiece forward: forwardBoolean positive: positiveBoolean!

Item was added:
+ ----- Method: RxmLookaround>>lookaround:forward:positive: (in category 'accessing') -----
+ lookaround: anRxmLink forward: forwardBoolean positive: positiveBoolean
+ 	lookaround := anRxmLink.
+ 	forward := forwardBoolean.
+ 	positive := positiveBoolean.!

Item was added:
+ ----- Method: RxmLookaround>>matchAgainst: (in category 'matching') -----
+ matchAgainst: aMatcher
+ 	"Match if the predicate block evaluates to true when given the current stream character as the argument."
+ 
+ 	^ forward
+ 		ifTrue: [aMatcher matchAgainstLookahead: lookaround positive: positive nextLink: next]
+ 		ifFalse: [aMatcher matchAgainstLookbehind: lookaround positive: positive nextLink: next]!

Item was added:
+ ----- Method: RxmLookaround>>postCopy (in category 'copying') -----
+ postCopy
+ 
+ 	super postCopy.
+ 	lookaround := lookaround copy!

Item was added:
+ ----- Method: RxmLookaround>>postCopyUsing: (in category 'copying') -----
+ postCopyUsing: anIdentityDictionary
+ 
+ 	super postCopyUsing: anIdentityDictionary.
+ 	lookaround := lookaround copyUsing: anIdentityDictionary!

Item was added:
+ ----- Method: RxmLookaround>>terminateWith: (in category 'building') -----
+ terminateWith: aNode
+ 	lookaround terminateWith: aNode.
+ 	super terminateWith: aNode.!

Item was changed:
  RxsNode subclass: #RxsLookaround
+ 	instanceVariableNames: 'piece forward positive'
- 	instanceVariableNames: 'piece positive'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Regex-Core'!
  
+ !RxsLookaround commentStamp: 'ct 3/6/2020 18:31' prior: 0!
+ Lookaround is used for lookaheads and lookbehinds. They are used to check if the input matches a certain subexpression without consuming any characters (e.g. not advancing the match position).
- !RxsLookaround commentStamp: '<historical>' prior: 0!
- I lookaround is used for lookaheads and lookbehinds. They are used to check if the input matches a certain subexpression without consuming any characters (e.g. not advancing the match position).
  
  Lookarounds can be positive or negative. If they are positive the condition fails if the subexpression fails, if they are negative it is inverse.!

Item was added:
+ ----- Method: RxsLookaround class>>with:forward:positive: (in category 'instance creation') -----
+ with: aRxsRegex forward: forwardBoolean positive: positiveBoolean
+ 	^ self new
+ 		initializePiece: aRxsRegex
+ 		forward: forwardBoolean
+ 		positive: positiveBoolean!

Item was removed:
- ----- Method: RxsLookaround class>>with:positive: (in category 'instance creation') -----
- with: aRxsRegex positive: positiveBoolean
- 	^ self new
- 		initializePiece: aRxsRegex
- 		positive: positiveBoolean!

Item was added:
+ ----- Method: RxsLookaround>>beLookahead (in category 'initialize-release') -----
+ beLookahead
+ 	forward := true!

Item was added:
+ ----- Method: RxsLookaround>>beLookbehind (in category 'initialize-release') -----
+ beLookbehind
+ 	forward := false!

Item was changed:
  ----- Method: RxsLookaround>>dispatchTo: (in category 'accessing') -----
  dispatchTo: aBuilder
  	"Inform the matcher of the kind of the node, and it will do whatever it has to."
+ 	^aBuilder syntaxLookaround: self forward: self forward positive: self positive!
- 	^aBuilder syntaxLookaround: self positive: self positive!

Item was added:
+ ----- Method: RxsLookaround>>forward (in category 'accessing') -----
+ forward
+ 
+ 	^ forward!

Item was added:
+ ----- Method: RxsLookaround>>initializePiece:forward:positive: (in category 'initialize-release') -----
+ initializePiece: anRsxPiece forward: forwardBoolean positive: positiveBoolean
+ 
+ 	piece := anRsxPiece.
+ 	forward := forwardBoolean.
+ 	positive := positiveBoolean.!

Item was removed:
- ----- Method: RxsLookaround>>initializePiece:positive: (in category 'initialize-release') -----
- initializePiece: anRsxPiece positive: positiveBoolean
- 
- 	piece := anRsxPiece.
- 	positive := positiveBoolean.!



More information about the Squeak-dev mailing list