[squeak-dev] The Inbox: Regex-Core-ct.61.mcz

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Thu Jul 8 10:33:56 UTC 2021


Hi Marcel,


> Why is #escapeString: inverse to #matchesRegex:? I am confused. :-)


Because of this:


':-)' matchesRegex: ':-)' escapeRegex "--> true"

Well, mathematically speaking, this is not really an inverse function, I guess ... But I can't come up with a better name for something like this. :-)


> Hmm... "does not contain any unescaped characters" means "only contains escaped characters"?


Kind of. This comment was an inept attempt at communicating that only special characters will be escaped. Otherwise, we could just prefix every character with a backslash; this would be total escaping, but hard to read. :-)
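
A minimal workspace sketch of that difference, not the code from the commit: the string `special` is assumed to hold the same characters as RxParser class >> #specialCharacters in the diff below, and the block names `selective` and `total` are made up for illustration.

```smalltalk
| special selective total |
"Assumption: the same characters as RxParser class >> #specialCharacters in this commit."
special := '()[]*+?{}.^$:\'.

"Selective escaping: put a backslash only in front of special characters."
selective := [:aString |
	String streamContents: [:stream |
		aString do: [:char |
			(special includes: char) ifTrue: [stream nextPut: $\].
			stream nextPut: char]]].

"Total escaping: put a backslash in front of every character."
total := [:aString |
	String streamContents: [:stream |
		aString do: [:char | stream nextPut: $\; nextPut: char]]].

selective value: ':-)'. "'\:-\)'"
total value: ':-)'. "'\:\-\)'"
```

The selective result keeps the hyphen untouched, which is what makes it easier to read than the fully escaped variant.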


Sure, the names are completely open to discussion; I only tried to be consistent with #escapeHtmlText. The problem is that we need to encode the semantic role of the receiver in selector names like this. #asRegexEscaped sounds to me as if it would answer an RxMatcher, which is not the case. #escapedRegex would be possible but is not aligned with #escapeHtmlText. Hm ... Any other ideas? :-)


Best,

Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Taeumel, Marcel
Sent: Thursday, July 8, 2021 09:06:51
To: squeak-dev
Subject: Re: [squeak-dev] The Inbox: Regex-Core-ct.61.mcz

Hi Christoph,

I like the feature, but I am not sure about the selectors you've chosen.

#escapeString:
#escapeRegex
#specialCharacters

I couldn't find any good examples in the image:

String >> #encodeForHTTP
String >> #unescapePercents
Character >> #escapeEntities (not used at all?!)

Hmm... maybe #reservedCharacters instead of #specialCharacters? Maybe #escapedRegex instead of #escapeRegex? And maybe String >> #asRegexEscaped?

Hmm... "does not contain any unescaped characters" means "only contains escaped characters"? Why is #escapeString: inverse to #matchesRegex:? I am confused. :-) The current names do not help me here.

Best,
Marcel


On 08.07.2021 01:30:52, commits at source.squeak.org <commits at source.squeak.org> wrote:

A new version of Regex-Core was added to project The Inbox:
http://source.squeak.org/inbox/Regex-Core-ct.61.mcz

==================== Summary ====================

Name: Regex-Core-ct.61
Author: ct
Time: 8 July 2021, 1:30:44.09436 am
UUID: 63655b8f-ad42-0946-b6fe-4dc3100995f1
Ancestors: Regex-Core-ct.59

Adds String >> #escapeRegex to escape special characters in a string before composing it into another regex.

Usage:

':-)' matchesRegex: ':-)' escapeRegex

=============== Diff against Regex-Core-ct.59 ===============

Item was added:
+ ----- Method: RxParser class>>escapeString: (in category 'utilities') -----
+ escapeString: aString
+ "Answer a copy of aString which does not contain any unescaped characters. This is the inverse function of String >> #matchesRegex:.
+ NB: Basically, we could simply escape every single character in the string, but this would not produce human-readable outputs."
+
+ ^ aString
+ copyWithRegex: ('[{1}]' format: {self specialCharacters collect: [:character | '\', character]})
+ matchesTranslatedUsing: [:match | '\', match]!

Item was added:
+ ----- Method: RxParser class>>specialCharacters (in category 'utilities') -----
+ specialCharacters
+
+ ^ #($( $) $[ $] $* $+ $? ${ $} $. $^ $$ $: $\)!

Item was added:
+ ----- Method: String>>escapeRegex (in category '*Regex-Core') -----
+ escapeRegex
+
+ ^ RxParser escapeString: self!
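
The summary mentions composing the escaped string into another regex. A hedged workspace sketch of that use case follows; it assumes Regex-Core-ct.61 from the Inbox is loaded so that String >> #escapeRegex exists, and the sample sentence and variable names are invented for illustration.

```smalltalk
"Assumption: Regex-Core-ct.61 is loaded, so String >> #escapeRegex is available."
| smiley pattern |
smiley := ':-)'.

"Embed the literal smiley in a larger pattern; only the surrounding .* keeps its regex meaning."
pattern := '.*' , smiley escapeRegex , '.*'.

"Should answer true if #escapeRegex behaves as the commit message describes."
'See you tomorrow :-)' matchesRegex: pattern.
```

Since #matchesRegex: tests the whole receiver, the pattern needs the leading and trailing `.*` to allow text around the smiley.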

