[squeak-dev] Regular Expressions are not limited to Strings

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Wed Apr 7 17:37:50 UTC 2021


Hi all,


just a small goody for all those interested: It turns out that, thanks to the great polymorphism in Squeak, regular expressions (as implemented in the Regex package in Trunk, originally developed by Vassili Bykov) are not limited to collections that are actually strings. Here is a short counterexample:


regex := RxParser new parse: #(1 2 $+ 1).
matcher := RxParser preferredMatcherClass for: regex.
matcher matches: #(1 2 2 2 1). "true!"


To make the example work, only a small number of hard-coded class names have to be adjusted; see the attached changeset, which is really tiny.
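

For context, here is a minimal workspace sketch of why this can work at all. It assumes that the matcher only reads its input through the ReadStream protocol; that is an assumption for illustration and has not been checked against the changeset. Strings and Arrays answer streams with the same protocol, so nothing below depends on the elements being Characters:


```smalltalk
"Workspace sketch: both collections answer streams with the same protocol.
Assumption: the matcher only needs messages like these from its input."
| chars numbers |
chars := 'abba' readStream.
numbers := #(1 2 2 1) readStream.

chars next. "$a"
numbers next. "1"

chars peek. "$b"
numbers peek. "2"

chars next: 2. "'bb'"
numbers next: 2. "#(2 2)"

chars atEnd. "false"
numbers atEnd. "false"
```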


Here's another example:


matcher copy: #(1 2 2 1 0 1 2 1) translatingMatchesUsing: [:match | match negated]. "#(-1 -2 -2 -1 0 -1 -2 -1)"


This also allows us to style texts using regexes:


matcher := 'ab+a' asRegex.
matcher copy: ' aa-aba-abba ' asText translatingMatchesUsing: [:match | match allBold]. "' aa-aba-abba ' with 'aba' and 'abba' in bold"


However, if the original text attributes should be preserved, we would need to hack TextStream >> #withAttributes:do: into the copy methods, analogously to Text >> #format:. I guess this limitation could only be resolved by redesigning Text as a collection of TextCharacters, which might be very slow.


Nevertheless, I think this insight opens up great possibilities for other forms of parsing. Maybe one could also process binary streams using polymorphic regex patterns, or even process sequences of domain-specific objects. Because RxsPredicate is so generic, you could also simply define custom predicates for these objects. A later step could be adding support for nested collections (RxsNested?) so that you could parse entire trees of objects ... Ah, such beautiful dreams :-)


Best,

Christoph
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: regex-polymorphy.2.cs
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20210407/7ad964e0/attachment.ksh>

