[Newbies] Re: ByteString>>match: greedyness of * ??

Ch Lamprecht ch.l.ngre at online.de
Tue Jan 8 22:52:20 UTC 2008


Hello,

I browsed String>>startingAt:match:startingAt: and changed two lines (marked
in the code below) so that the error handling works as the author probably
intended.

startingAt: keyStart match: text startingAt: textStart
	"Answer whether text matches the pattern in this string.
	Matching ignores upper/lower case differences.
	Where this string contains #, text may contain any character.
	Where this string contains *, text may contain any sequence of characters."
	| anyMatch matchStart matchEnd i matchStr j ii jj |
	i := keyStart.
	j := textStart.

	"Check for any #'s"
	[i > self size ifTrue: [^ j > text size "Empty key matches only empty string"].
	(self at: i) = $#] whileTrue:
		["# consumes one char of key and one char of text"
		j > text size ifTrue: [^ false "no more text"].
		i := i+1.  j := j+1].

	"Then check for *"
	(self at: i) = $*
		ifTrue: [i = self size ifTrue:
					[^ true "Terminal * matches all"].
				"* means next match string can occur anywhere"
				anyMatch := true.
				matchStart := i + 1]
		ifFalse: ["Otherwise match string must occur immediately"
				anyMatch := false.
				matchStart := i].

	"Now determine the match string"
	matchEnd := self size.
	(ii := self indexOf: $* startingAt: matchStart) > 0 ifTrue:
		["changed the following line to:"
		ii = matchStart ifTrue: [self error: '** not valid -- use * instead'].
		matchEnd := ii-1].
	(ii := self indexOf: $# startingAt: matchStart) > 0 ifTrue:
		["changed the following line to:"
		ii = matchStart ifTrue: [self error: '*# not valid -- use #* instead'].
		matchEnd := matchEnd min: ii-1].
	matchStr := self copyFrom: matchStart to: matchEnd.

	"Now look for the match string"
	[jj := text findString: matchStr startingAt: j caseSensitive: false.
	anyMatch ifTrue: [jj > 0] ifFalse: [jj = j]]
		whileTrue:
		["Found matchStr at jj.  See if the rest matches..."
		(self startingAt: matchEnd+1 match: text startingAt: jj + matchStr size) ifTrue:
			[^ true "the rest matches -- success"].
		"The rest did not match."
		anyMatch ifFalse: [^ false].
		"Preceded by * -- try for a later match"
		j := j+1].
	^ false "Failed to find the match string"
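
To try the change, something like the following could be evaluated line by
line in a workspace. This is only a sketch: it assumes that #match: simply
forwards to the method above, starting at index 1 in both strings, and that
#error: signals an Error that #on:do: can catch.

```smalltalk
"Patterns that never reach the two changed lines; they should behave as before"
'*e' match: 'e'.
'foo*baz' match: 'foo23baz'.

"Patterns where * is directly followed by * or #.
 With ii = matchStart the checks should now fire, so each expression
 should answer the error text rather than false."
['**' match: 'e'] on: Error do: [:ex | ex messageText].
['*#' match: 'e'] on: Error do: [:ex | ex messageText].
```

Wrapping the calls in #on:do: only keeps the debugger from opening; without
it, the two invalid patterns should bring up the error notifier.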



Kent Loobey wrote:
> On Tuesday 08 January 2008 13:03:22 Ch Lamprecht wrote:
> 
>>nicolas cellier wrote:
>>
>>>This behavior is squeakish, other Smalltalk match differently:
>>>
>>>VW:  '**' match: 'e'. "true"
>>>gst: '**' match: 'e'. "true"
>>>
>>>Anyway, this pattern matching is limited. How do you match a '*' itself?
>>>I thought your example might be interpreted as an escape sequence, but
>>>no, there is no escape in this simple matching.
>>>
>>>'**' match: '*'. "false"
>>>'\*' match: '*'. "false"
>>>
>>>Try VBregex or another regex package.
>>>
>>>Nicolas
>>
>>Hi,
>>thank you.
>>In addition to the expressions below, I found that #match: does not behave
>>as stated by the comment in the method definition itself:
>>
>> From ByteString>>match:
>>
>>"
>>	[snip]
>>	'foo*baz'	match: 'foo23baz' true
>>	'foo*baz'	match: 'foobaz' true    <----
>>	'foo*baz'	match: 'foo23bazo' false
>>	'foo'		match: 'Foo' true
>>	'foo*baz*zort' match: 'foobazort' false
>>	'foo*baz*zort' match: 'foobazzort' false <----
>>	[snip]
>>"
> 
> 
> In general * means any sequence of characters, including none.
> 
> So the first one is foo any-character baz.
> 
> The third is false because of the "o" on the end.  If you wanted it to work 
> you could put 'foo*baz*'.
> 
> I don't know why the last one wasn't reported as true.
> 
> 
>>confused, Christoph
>>
>>
>>>Ch Lamprecht a écrit :
>>>
>>>>Hello,
>>>>
>>>>I found the following results for some expressions using #match:
>>>>
>>>>'e' match: 'e'.   "true"
>>>>'*' match: 'e'.   "true"
>>>>'#' match: 'e'.   "true"
>>>>
>>>>'*e' match: 'e'.  "true"
>>>>'*#' match: 'e'.  "false"
>>>>'**' match: 'e'.  "false"
>>>>
>>>>'*' match: ''.    "true"
>>>>'**' match: ''.   "false"
>>>>
>>>>
>>>>Is this expected behavior?
>>>>Looks like * is sometimes 'greedy', sometimes not. (Compare the 4th and 5th expressions.)
>>>>Thank you for any hints.

