[squeak-dev] Regular Expressions are not limited to Strings

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Wed Apr 7 19:33:01 UTC 2021


Woohoo, nested regular expressions are even easier than I thought!


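"Inner pattern: 3, then any number of 4s, then 5."
innerRegex := RxParser new parse: #(3 4 $* 5).
innerMatcher := RxParser preferredMatcherClass for: innerRegex.
"Outer pattern: (1 2), then at most three elements matched by innerMatcher ({,3}), then (6 7)."
regex := RxParser new parse: {#(1 2). innerMatcher. ${. $,. $3. $}. #(6 7)}.
matcher := RxParser preferredMatcherClass for: regex.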

matcher matches: #((1 2) (3 4 5) (6 7)). "true"
matcher matches: #((1 2) (3 4 5) (3 5) (6 7)). "true"
matcher matches: #((1 2) (3 4 5) (3 5) (3 4 4 5) (6 7)). "true"
matcher matches: #((1 2) (3 4 5) (3 5) (3 4 4 5) (3 5) (6 7)). "false"
matcher matches: #((1 2) (3 4 5) (3 2 5) (3 4 4 5) (6 7)). "false"
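
For readers who think in string patterns, the outer expression has roughly the shape of 'a(bc*d){,3}e'. The analogy is loose: in the nested version each inner match consumes one element that is itself a collection, not a run of characters. A minimal sketch using only the stock String API; the variable name stringMatcher and the sample strings are chosen here for illustration:

```smalltalk
"String analogue: a, then at most three groups of the form bc*d, then e."
stringMatcher := 'a(bc*d){,3}e' asRegex.

stringMatcher matches: 'abcde'. "one group"
stringMatcher matches: 'abcdbdbccde'. "three groups: bcd, bd, bccd"
stringMatcher matches: 'abcdbdbccdbde'. "four groups, one more than {,3} allows"
```

By analogy with the results above, the first two strings should match and the last one should not.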


I'll share my changeset upon request. This is really exciting stuff.

Best,
Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Thiede, Christoph
Sent: Wednesday, April 7, 2021 7:37 PM
To: Squeak Dev
Subject: [squeak-dev] Regular Expressions are not limited to Strings


Hi all,


Just a small goody for all those interested: it turns out that, thanks to the great polymorphism in Squeak, regular expressions (as implemented in Trunk's Regex package, originally developed by Vassili Bykov) are not limited to collections that are actually strings. Here is a short counterexample:


regex := RxParser new parse: #(1 2 $+ 1).

matcher := RxParser preferredMatcherClass for: regex.

matcher matches: #(1 2 2 2 1). "true!"
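
For comparison, here is a minimal sketch of the same pattern written against a plain String. It uses only the stock asRegex API, so it needs no changeset; the elements 1 and 2 become the characters $1 and $2, and the variable name stringMatcher is chosen here for illustration:

```smalltalk
"String counterpart of #(1 2 $+ 1): a 1, one or more 2s, then a 1."
stringMatcher := '12+1' asRegex.

stringMatcher matches: '12221'. "true"
stringMatcher matches: '11'. "false, at least one 2 is required"
```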


To make the example work, only a small number of hard-coded class names have to be adjusted; see the attached changeset, which is really tiny.


Here's another example:


matcher copy: #(1 2 2 1 0 1 2 1) translatingMatchesUsing: [:match | match negated]. "#(-1 -2 -2 -1 0 -1 -2 -1)"
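
For comparison, a minimal sketch of the same selector on a plain String, again using only the stock Regex API. Each match is handed to the block and replaced by the block's result, while everything between the matches is copied unchanged. The variable name stringMatcher and the sample string are chosen here for illustration:

```smalltalk
"Uppercase every match of ab+a; the x between the matches is copied as is."
stringMatcher := 'ab+a' asRegex.

stringMatcher copy: 'abbaxaba' translatingMatchesUsing: [:match | match asUppercase]. "'ABBAxABA'"
```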


This also allows us to style texts using regexes:


matcher := 'ab+a' asRegex.

matcher copy: ' aa-aba-abba ' asText translatingMatchesUsing: [:match | match allBold]. "' aa-aba-abba ' as a Text, with 'aba' and 'abba' in bold"


However, preserving the original text attributes would require hacking TextStream >> #withAttributes:do: into the copy methods, analogously to Text >> #format:. I guess this limitation could only be resolved by redesigning Text as a collection of TextCharacters, which might be very slow.


Nevertheless, I think this insight opens up great possibilities for other forms of parsing. Maybe one could also process binary streams using polymorphic regex patterns, or even process sequences of domain-specific objects. Because RxsPredicate is so generic, you could also simply define custom predicates for these objects. The next step could be adding support for nested collections (RxsNested?) so that you could parse entire trees of objects ... Ah, such beautiful dreams :-)


Best,

Christoph