Regular Expression Plugin Interfaces
Andrew C. Greenberg
werdna at gate.net
Mon Mar 8 02:03:46 UTC 1999
[This was wrongly posted before with a different title. My apologies]
I'd like to thank everyone for their feedback and remarks. The ports
of the Plugin were straightforward and have gone well (others are
still in progress), and I expect a new release within the next
week. I would appreciate any comments or suggestions you may have.
I. COMMENTS ON PRESENT RELEASE
Disapproval of the parameter order in the String convenience
functions was nearly unanimous: the pattern is the receiver and the
subject string is the argument, rather than the other way around. At
present, you execute:
'xy(z+)y' reSearch: 'xyzzy'.
to obtain a match and capture the 'zz' in 'xyzzy'. Folks also
didn't like the separate functions for global searching and
substitution, preferring instead parameters or options to do the
same. At present, you execute:
patString reGsearch: subjString sub: aBlock.
patString reGsearch: subjString collect: aBlock.
These messages were laid out as they were because they nicely
paralleled the fully parenthesized expressions for which they were
shorthand. For example, the second message above could have been
written:
((RePattern on: patString) gsearch: subjString) collect: aBlock.
However, I agree that there is a cognitive dissonance arising from
matching subjects to patterns rather than vice versa, and that the
present interface is too focused on the structure of the
implementation, rather than on the manner in which such code is
ordinarily used. I therefore accede, and propose a new approach that
parallels and extends the existing String message for glob matching
from Smalltalk-80:
aString match: simplePatternString.
II. PROPOSED NEW MESSAGE STRUCTURE
I propose to replace the entire package of String convenience
functions with the following messages in the next release
(maintaining the existing messages as "deprecated" for a brief time).
subjString reMatch: patString [opt: optString] [from: from [to: to]]
subjString reMatch: patString [opt: optString] [collect: aBlock]
subjString reMatch: patString [opt: optString] [sub: aBlock]
and retaining:
patString asRe
patString asRe: opt [onErrorRun: aBlock]
for those who wish to compile the Re and do the matching directly for
efficiency reasons.
III. OTHER CHANGES
* the pattern compiler now caches the last few patterns, so
that the String convenience functions can be used repeatedly, say in
a loop, without recompiling the pattern every time. (The cost of
cache lookups can still be avoided with an explicit compile.)
* I will add a function akin to Perl's "split".
* semantics of global matching now mirror later versions of
Perl 5. The present release loops forever when a global match
matches an empty string (as did Perl 5.0). Several people pointed
out that this is broken, so I modified the code to follow the
"bump-ahead" semantics of more recent Perl versions.
* the pattern compiler now properly optimizes for
initial string class checks.