[squeak-dev] The Inbox: Regex-Core-ct.61.mcz

Levente Uzonyi leves at caesar.elte.hu
Thu Jul 8 11:06:58 UTC 2021


Hi Christoph,

On Wed, 7 Jul 2021, commits at source.squeak.org wrote:

> A new version of Regex-Core was added to project The Inbox:
> http://source.squeak.org/inbox/Regex-Core-ct.61.mcz
>
> ==================== Summary ====================
>
> Name: Regex-Core-ct.61
> Author: ct
> Time: 8 July 2021, 1:30:44.09436 am
> UUID: 63655b8f-ad42-0946-b6fe-4dc3100995f1
> Ancestors: Regex-Core-ct.59
>
> Adds String >> #escapeRegex to escape special characters in a string before composing it into another regex.
>
> Usage:
>
> 	':-)' matchesRegex: ':-)' escapeRegex
>
> =============== Diff against Regex-Core-ct.59 ===============
>
> Item was added:
> + ----- Method: RxParser class>>escapeString: (in category 'utilities') -----
> + escapeString: aString
> + 	"Answer a copy of aString which does not contain any unescaped characters. This is the inverse function of String >> #matchesRegex:.
> + 	NB: Basically, we could simply escape every single character in the string, but this would not produce human-readable outputs."
> + 
> + 	^ aString
> + 		copyWithRegex: ('[{1}]' format: {self specialCharacters collect: [:character | '\', character]})

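That first argument doesn't look right: #format: inserts the printed representation of the whole collected Array instead of concatenating its elements. If you evaluate it, you'll get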

  '[#(''\('' ''\)'' ''\['' ''\]'' ''\*'' ''\+'' ''\?'' ''\{'' ''\}'' ''\.'' ''\^'' ''\$'' ''\:'' ''\\'')]'

I think you need something like this:

String streamContents: [ :stream |
	stream nextPut: $[.
	self specialCharacters do: [ :each |
		stream nextPut: $\; nextPut: each ].
	stream nextPut: $] ]

which yields

  '[\(\)\[\]\*\+\?\{\}\.\^\$\:\\]'
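
A minimal workspace sketch of how such a character class could then drive the escaping. The quoted method is cut off after the first argument, so the use of #copyWithRegex:matchesTranslatedUsing: and the translation block are assumptions, not necessarily what the commit does. The special characters are inlined as a String literal so the snippet runs without the new RxParser class methods.

```smalltalk
| pattern |
"Build a character class that matches any single special character."
pattern := String streamContents: [:stream |
	stream nextPut: $[.
	'()[]*+?{}.^$:\' do: [:each |
		stream nextPut: $\; nextPut: each].
	stream nextPut: $]].

"Assumed continuation: prefix every matched special character with a backslash."
':-)' copyWithRegex: pattern matchesTranslatedUsing: [:match | '\', match]
```

Only the characters listed in the class get a backslash, so the escaped string stays readable, which is the goal stated in the method comment.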


> Item was added:
> + ----- Method: RxParser class>>specialCharacters (in category 'utilities') -----
> + specialCharacters
> + 
> + 	^ #($( $) $[ $] $* $+ $? ${ $} $. $^ $$ $: $\)!

Why not just use a String literal here: ^'()[]*+?{}.^$:\' ?
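
A quick workspace check, as a sketch, that the String literal spells out the same characters as the literal Array in the commit. Backslash is not an escape character inside Smalltalk string literals, so the trailing \ is an ordinary character. The expression should answer true if both list the same characters in the same order.

```smalltalk
"Compare the String's characters with the Array from the quoted method."
'()[]*+?{}.^$:\' asArray
	= #($( $) $[ $] $* $+ $? ${ $} $. $^ $$ $: $\)
```

Iterating the String with #do:, as in the stream-based snippet above, visits the same Characters as iterating the Array.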


Levente


