Q2: Regex anyone?

Helge Horch Helge.Horch at munich.netsurf.de
Tue May 23 19:55:43 UTC 2000


At 19:16 23.05.2000 +0200, Stefan Matthias Aust wrote:
>so I had a closer look at Vassili's code and even ported it to Squeak

Yippee, erm, I mean *great*!!  It was next on my list of things to do, right 
after finishing the current subproject (PDB/PRC, more on that later).

>But unfortunately, Vassili's code doesn't support \(, \), and \1 to refer to a
>matched substring, or ? to minimize the match instead of maximizing it.
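For reference, the snippet below shows what the two missing features do. It uses Python's standard `re` module as an illustration only; it is not Vassili's package. Python writes groups as plain `(...)` rather than the POSIX basic-regex `\(...\)` Stefan mentions, but the `\1` backreference and the `?` lazy (minimizing) modifier behave as he describes.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so the match runs to the LAST '>'.
greedy = re.search(r"<.*>", text).group(0)

# Lazy: .*? stops at the FIRST '>' that lets the rest of the pattern match.
lazy = re.search(r"<.*?>", text).group(0)

# Backreference: \1 must repeat exactly what group 1 matched,
# so this finds a doubled word such as "the the".
doubled = re.search(r"\b(\w+) \1\b", "see the the typo").group(1)

print(greedy)   # <b>bold</b> and <i>italic</i>
print(lazy)     # <b>
print(doubled)  # the
```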

I haven't looked very closely yet, but I think there are methods for 
accessing matched subexpressions: indices start at 1 (the total match), 2 is 
the first parenthesized subexpression (delimited by '()'), and so on.  Check 
the documentation methods (#c:_usage__) and #testSuite; the package was 
modeled after Henry Spencer's popular C package, IIRC.
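Here is a small example of that numbering, assuming it works as recalled above. This is a minimal Python sketch: `subexpression` is a hypothetical helper, not part of Vassili's package or of Python. It maps the 1-based scheme (1 = whole match, 2 = first group) onto Python's `group()`, which numbers one lower.

```python
from __future__ import annotations

import re


def subexpression(match: re.Match, index: int) -> str | None:
    """Hypothetical helper mimicking the numbering described above:
    index 1 is the whole match, 2 the first parenthesized group, etc.
    Python's own group() numbering is one lower (group(0) is the whole match)."""
    if index < 1:
        raise IndexError("subexpression indices start at 1")
    return match.group(index - 1)


m = re.search(r"(\d+)-(\d+)", "pages 19-55")
print(subexpression(m, 1))  # 19-55
print(subexpression(m, 2))  # 19
print(subexpression(m, 3))  # 55
```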

Here's an exchange from comp.lang.smalltalk a year or so ago:

[---snip---]
Vassili Bykov wrote:
>Robb Shecter had asked:
> >I'm looking for a regex package that has something like the "substitute"
> >operation of Perl5 regexes.  For example: [...]
>
>Regular expressions deal with search.  Replacement is a separate operation.
>In the context of Smalltalk Strings which are fixed-size character arrays,
>it does not make sense to talk about replacing a sequence of characters
>within a string with another sequence of a different size.  If Perl
>designers chose to fuzz things up, it's their call, but this approach
>does not fit Smalltalk.
>
>To do the replacement you want, open two streams: a read stream on the input
>and an output write stream.  Search the input stream for the occurrence of
>the regex you want.  Write the part of the input stream that you've skipped
>to the output stream, write the replacement text to the output stream, then
>set the input stream past the end of the match and search again.  Rinse,
>repeat.
>
> >I've currently got the regex package by Vassili dated August 6, 1996,
> >and it doesn't have this operation.  It also has a non-standard
> >definition of the . character:  it won't match whitespace, and that's
> >making things harder for me.
>
>
>The different interpretation of $. was a lapse of judgement on my part.
>Here is a simple file-in that will make it behave properly:
>
>------snip------
>'From VisualWorks®, Release 3.0 of February 5, 1998 on March 11, 1999 at 8:45:52 am'!
>
>!RxmPredicate methodsFor: 'initialize-release'!
>
>beAny
>
>     "Accept any character except carriage return and line feed (used for $.)."
>
>     | cr lf |
>     cr := Character cr.
>     lf := Character lf.
>     self predicate: [:char | cr ~= char and: [lf ~= char]]! !
>-----snip-----
>
>There is not much Smalltalk-based regular expression code around.  I
>myself haven't noticed much interest in it, which is what stops me
>from investing a few days or weeks in improving my old matcher.
>
>--Vassili
[---snip---]
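Vassili's two-stream recipe for replacement can be sketched as follows. This is a minimal illustration in Python with the standard `re` module, not Smalltalk and not Vassili's API. `replace_all` is a hypothetical name, the index `pos` stands in for the read stream's position, and `io.StringIO` stands in for the output write stream. The sketch assumes the pattern never matches the empty string, because an empty match would not advance the position.

```python
import io
import re


def replace_all(pattern: str, replacement: str, source: str) -> str:
    """Copy `source`, replacing every match of `pattern` with `replacement`.

    Search from the current position, copy the skipped text, write the
    replacement, then resume just past the end of the match. Rinse, repeat.
    """
    regex = re.compile(pattern)
    out = io.StringIO()  # plays the role of the output write stream
    pos = 0              # plays the role of the input stream's position
    while (m := regex.search(source, pos)) is not None:
        if m.end() == m.start():
            raise ValueError("pattern matched the empty string")
        out.write(source[pos:m.start()])  # the part of the input we skipped
        out.write(replacement)            # the replacement text, inserted literally
        pos = m.end()                     # continue after the match
    out.write(source[pos:])               # whatever follows the last match
    return out.getvalue()


print(replace_all(r"\d+", "#", "room 12, floor 3"))  # room #, floor #
```

Python's own `re.sub` does this in one call. The explicit loop is only there to mirror the read-stream/write-stream steps. Unlike `re.sub`, it writes the replacement text literally and does not expand `\1`.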

>So is there any other 100% pure Smalltalk package available?

Not that I know of.  I think Vassili's package might work just fine, 
although it would be nice if he re-released it under the Squeak license for 
easy incorporation.  How about it, Vassili?

>//  ...come on, kiss the frog!

What do I get?  A princess?

Cheers,
Helge





More information about the Squeak-dev mailing list