using RegularExpressions for matching (was Re: deficience in Squeak)

ajr ly4aegw02 at sneakemail.com
Sun Nov 30 23:43:43 UTC 2003


fwiw, I have written a regex package that I would be happy to share.

It builds on another package of mine, a parser-building tool I've used to
write parsers quickly for a variety of languages, from Smalltalk to Java to
regular expressions themselves. It uses the same technique for lexical
analysis as for parsing.

With this parser tool, each significant production is a class that knows
how to "deserialize" an instance of itself from source via a
#from:startingWith: method. The production rules are thus "distributed"
among the objects they represent, which IMHO is more object-oriented than
keeping them in a separate parser definition.
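To make the idea concrete, here is a minimal Python transliteration, not the author's code: each production class carries a `from_` class method that tries to read one instance of itself from a shared cursor and answers the new node, or `None` on failure. The `Input` cursor, the `Char` and `Range` classes, and the rule `Range ::= Char '-' Char` are hypothetical illustrations; the role of the original `startingWith:` argument isn't described in the post, so the sketch leaves it out.

```python
from dataclasses import dataclass
from typing import Optional


class Input:
    """Hypothetical cursor over the source text (stands in for a Squeak stream)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.text[self.pos] if self.pos < len(self.text) else None


METACHARS = set("-[]|()*+?.\\")


@dataclass
class Char:
    value: str

    @classmethod
    def from_(cls, inp: Input) -> Optional["Char"]:
        # Char ::= any character that is not a metacharacter
        c = inp.peek()
        if c is None or c in METACHARS:
            return None
        inp.pos += 1
        return cls(c)


@dataclass
class Range:
    low: Char
    high: Char

    @classmethod
    def from_(cls, inp: Input) -> Optional["Range"]:
        # Range ::= Char '-' Char  (the rule lives with the class it builds)
        start = inp.pos
        low = Char.from_(inp)
        if low is not None and inp.peek() == "-":
            inp.pos += 1
            high = Char.from_(inp)
            if high is not None:
                return cls(low, high)
        inp.pos = start  # failed: rewind so another production can try here
        return None
```

Because a failed `from_` leaves the cursor where it started, a caller can try one production after another at the same position.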

(In my regex package these are Regex, RxAny, RxAtom, RxBracket, RxBranch,
RxCharacter, RxDot, RxEscape, RxNode, RxParen, RxPiece, RxRange and
RxSpecialCharacter.)

The body of #from:startingWith: is written with a set of high-level match
methods that read very much like BNF. For example, for the "one or more"
pattern there are:

#matchOneOrMore:
#matchOneOrMore:separatedBy:
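A rough Python sketch of what these two match methods might do, assuming (hypothetically) that a production answers `None` on failure without consuming input: `match_one_or_more` collects items until the production fails, and the `separated_by` variant expects a separator between consecutive items. The method names and the `Letter` test production are illustrative, not the package's actual API.

```python
from dataclasses import dataclass
from typing import Optional


class Input:
    """Hypothetical cursor over source text; the match methods are sketches."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match_one_or_more(self, production) -> Optional[list]:
        # item+ : parse as many items as possible; None if there isn't one.
        items = []
        while (item := production.from_(self)) is not None:
            items.append(item)
        return items or None

    def match_one_or_more_separated_by(self, production, sep: str) -> Optional[list]:
        # item (sep item)* : a trailing separator is left unconsumed.
        first = production.from_(self)
        if first is None:
            return None
        items = [first]
        while self.peek() == sep:
            mark = self.pos
            self.pos += 1
            item = production.from_(self)
            if item is None:
                self.pos = mark  # back off the dangling separator
                break
            items.append(item)
        return items


@dataclass
class Letter:
    value: str

    @classmethod
    def from_(cls, inp: Input) -> Optional["Letter"]:
        # Letter ::= one alphabetic character
        c = inp.peek()
        if c is not None and c.isalpha():
            inp.pos += 1
            return cls(c)
        return None
```

For example, `Input("a|b|c|").match_one_or_more_separated_by(Letter, "|")` collects the three letters and leaves the final `|` for whatever production comes next.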

For example, here is how a Regex is parsed:

Regex class methods

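from: input startingWith: s
   "Parse one or more RxBranch productions separated by $|.
    Answer a new Regex holding the branches, or nil if none match."

   ^(input
      matchOneOrMore: RxBranch
      separatedBy: $|)
         ifNotNilDo: [:list |
            self new branches: list]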
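The same method transliterated into Python, as a sketch only: `Input`, `Char` and `Branch` are hypothetical stand-ins, and a branch is simplified to a run of plain characters rather than the real RxPiece structure.

```python
from dataclasses import dataclass
from typing import List, Optional


class Input:
    """Hypothetical cursor over the pattern text."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def peek(self) -> Optional[str]:
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match_one_or_more(self, production) -> Optional[list]:
        items = []
        while (item := production.from_(self)) is not None:
            items.append(item)
        return items or None

    def match_one_or_more_separated_by(self, production, sep: str) -> Optional[list]:
        first = production.from_(self)
        if first is None:
            return None
        items = [first]
        while self.peek() == sep:
            mark = self.pos
            self.pos += 1
            item = production.from_(self)
            if item is None:
                self.pos = mark  # leave a dangling separator unconsumed
                break
            items.append(item)
        return items


@dataclass
class Char:
    value: str

    @classmethod
    def from_(cls, inp: Input) -> Optional["Char"]:
        c = inp.peek()
        if c is None or c in "|()":  # simplified: only these are special here
            return None
        inp.pos += 1
        return cls(c)


@dataclass
class Branch:
    pieces: List[Char]

    @classmethod
    def from_(cls, inp: Input) -> Optional["Branch"]:
        # Branch ::= Piece+   (pieces simplified to single characters)
        pieces = inp.match_one_or_more(Char)
        return cls(pieces) if pieces is not None else None


@dataclass
class Regex:
    branches: List[Branch]

    @classmethod
    def from_(cls, inp: Input) -> Optional["Regex"]:
        # Mirrors: (input matchOneOrMore: RxBranch separatedBy: $|)
        #              ifNotNilDo: [:list | self new branches: list]
        branches = inp.match_one_or_more_separated_by(Branch, "|")
        return cls(branches) if branches is not None else None
```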

The parse tree that results from parsing a regex pattern in turn uses the
same parsing tool (#from:startingWith:, this time on the INSTANCE side) to
recognize instances of itself in a String or Stream.
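A hedged Python sketch of the instance side: each node of a hand-built parse tree gets a hypothetical `match` method that recognizes the text it describes at the cursor, and `Regex.search` tries every start position. Only literal characters and alternation are handled, so trying branches in order without deeper backtracking is enough here; quantifiers would need more machinery.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class Input:
    """Hypothetical cursor over the text being searched."""

    def __init__(self, text: str, pos: int = 0) -> None:
        self.text, self.pos = text, pos


@dataclass
class Char:
    value: str

    def match(self, inp: Input) -> bool:
        # Instance side: recognize this one character in the input.
        if inp.pos < len(inp.text) and inp.text[inp.pos] == self.value:
            inp.pos += 1
            return True
        return False


@dataclass
class Branch:
    pieces: List[Char]

    def match(self, inp: Input) -> bool:
        # Every piece must match in sequence; rewind if any fails.
        start = inp.pos
        for piece in self.pieces:
            if not piece.match(inp):
                inp.pos = start
                return False
        return True


@dataclass
class Regex:
    branches: List[Branch]

    def match(self, inp: Input) -> bool:
        # The first branch that matches wins (enough for literal-only branches).
        return any(branch.match(inp) for branch in self.branches)

    def search(self, text: str) -> Optional[Tuple[int, int]]:
        # Try each start position; answer (start, end) of the first match.
        for start in range(len(text) + 1):
            inp = Input(text, start)
            if self.match(inp):
                return start, inp.pos
        return None
```

Usage: the tree for `cat|dog` is `Regex([Branch([Char(c) for c in "cat"]), Branch([Char(c) for c in "dog"])])`, and searching `"hotdog"` with it answers `(3, 6)`.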

The parsing package contains methods for a wide variety of BNF constructs,
and it is easy to add new ones as needed.
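As an illustration of how little a new construct might take, here is a hypothetical Python sketch that adds BNF's `{ item }` (zero or more) and `[ item ]` (optional) on top of an existing one-or-more method. The method names and the `Digit` and `Sign` productions are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


class Input:
    """Hypothetical cursor over source text."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def peek(self) -> Optional[str]:
        return self.text[self.pos] if self.pos < len(self.text) else None

    def match_one_or_more(self, production) -> Optional[list]:
        # Existing construct: item+ (None when there is no first item).
        items = []
        while (item := production.from_(self)) is not None:
            items.append(item)
        return items or None

    # New constructs, each a few lines on top of what is already there.
    def match_zero_or_more(self, production) -> list:
        # { item } : same as item+, but an empty list counts as success.
        return self.match_one_or_more(production) or []

    def match_optional(self, production, default=None):
        # [ item ] : try once; if absent, consume nothing and answer default.
        item = production.from_(self)
        return default if item is None else item


@dataclass
class Digit:
    value: str

    @classmethod
    def from_(cls, inp: Input) -> Optional["Digit"]:
        c = inp.peek()
        if c is not None and c.isdigit():
            inp.pos += 1
            return cls(c)
        return None


@dataclass
class Sign:
    value: str

    @classmethod
    def from_(cls, inp: Input) -> Optional["Sign"]:
        c = inp.peek()
        if c in ("+", "-"):
            inp.pos += 1
            return cls(c)
        return None
```

Both new methods rely only on the existing convention that a failed production answers `None` without consuming input.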

I am using the code in production, so it has been developed to my
particular requirements and would probably need work to be ready for
prime time. I would be interested in doing that work and supporting it if
anyone would get some use out of it. As for documentation, there are many
test cases embedded as code comments; I would be willing to write more
formal documentation if there is enough interest.

The packages are in Monticello format. Here are the URLs:

http://reider.net/squeak/ajr-match-ajr.8.mcz
http://reider.net/squeak/ajr-regex-ajr.7.mcz



