Regular Expression Plugin Interfaces

Andrew C. Greenberg werdna at gate.net
Mon Mar 8 02:03:46 UTC 1999


[This was wrongly posted before with a different title.  My apologies.]

I'd like to thank everyone for their feedback and remarks.  The ports 
of the Plugin were straightforward and have gone well (others are 
still in progress), and I expect a new release within the next 
week.  I would appreciate any comments or suggestions you may have.

I. COMMENTS ON PRESENT RELEASE

There was almost uniform disapproval of the parameter order in the 
String convenience functions, which apply search strings to patterns 
rather than the other way around.  At present, you execute:

	'xy(z+)y' reSearch: 'xyzzy'.

to obtain a match and capture the two 'z' characters in 'xyzzy'.  
Folks also didn't like the separate functions for global searching 
and substitution, preferring instead parameters or options that do 
the same.  At present, you execute:

	patString reGsearch: subjString sub: aBlock.

	patString reGsearch: subjString collect: aBlock.

These messages were designed this way because they nicely 
parallel the fully parenthesized expressions for which they are 
shorthand.  For example, the preceding message could have been 
written:

	((RePattern on: patString) gsearch: subjString) collect: aBlock.

However, I agree that there is a cognitive dissonance arising from 
matching subjects to patterns rather than vice versa, and that the 
present interface is too focused on the structure of the 
implementation, rather than on the manner in which such code is 
ordinarily used.  I accede, therefore, and propose a new 
approach, paralleling and extending the existing string function for 
glob matching from Smalltalk-80:

	aString match: simplePatternString.

II. PROPOSED NEW MESSAGE STRUCTURE

I propose to replace the entire package of String convenience 
functions with the following functions in the next release 
(maintaining the existing messages as "deprecated" for a brief time).


	subjString reMatch: patString [opt: optString] [from: from [to: to]]
	subjString reMatch: patString [opt: optString] [collect: aBlock]
	subjString reMatch: patString [opt: optString] [sub: aBlock]

and retaining:

	patString asRe
	patString asRe: opt [onErrorRun: aBlock]

for those who wish to compile the Re and do the matching directly for 
efficiency reasons.
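
For readers who want something runnable to experiment with, the 
relationship between the subject-first convenience message and the 
explicit compile can be sketched in Python with its standard re 
module.  This is only an analogue, not the Plugin's code: the names 
as_re and re_match, the option letters, and the 0-based, end-exclusive 
start/end arguments are all assumptions made for illustration.  The 
post does not say what optString may contain.

```python
import re

# Hypothetical option letters; the post does not define optString,
# so this mapping is an assumption borrowed from Perl-style flags.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def as_re(pattern, opt=""):
    """Rough analogue of 'patString asRe: opt': compile once, reuse later."""
    flags = 0
    for letter in opt:
        flags |= _FLAGS[letter]
    return re.compile(pattern, flags)


def re_match(subject, pattern, opt="", start=0, end=None):
    """Rough analogue of 'subjString reMatch: patString opt: ... from: ... to: ...'.

    The subject comes first and the pattern is compiled on every call.
    Unlike Smalltalk, start/end here are 0-based and end is exclusive.
    """
    compiled = as_re(pattern, opt)
    if end is None:
        end = len(subject)
    return compiled.search(subject, start, end)


# Convenience form: subject first, pattern second.
m = re_match("xyzzy", "xy(z+)y")
print(m.group(1))

# Explicit compile: pay for compilation once, then match many subjects.
compiled = as_re("XY(Z+)Y", "i")
print([compiled.search(s) is not None for s in ["xyzzy", "plugh"]])
```

The convenience form reads naturally for one-off matches, while the 
explicit compile keeps compilation out of a loop.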

III. OTHER CHANGES

	* The pattern compiler now caches the last few patterns, so 
that the String convenience functions can be used repeatedly, say in 
a loop, without having to recompile every time.  (The cost of cache 
lookups can still be avoided by compiling the pattern explicitly.)

	* I will add a function akin to Perl's "split".

	* The semantics of global matching now mirror later versions of 
Perl 5.  The present release loops forever when a pattern matches an 
empty string (as did Perl 5.0).  Several people pointed out that this 
is broken, so I modified the code to implement the "bump-ahead" 
semantics of more recent versions.

	* The pattern compiler now properly optimizes for initial 
string class checks.




