[squeak-dev] The Inbox: Regex-Core-ct.61.mcz
christoph.thiede at student.hpi.uni-potsdam.de
Thu Jul 8 12:14:55 UTC 2021
Hi Levente,
Two very fair points, thank you for the feedback! Revisiting #escapeString:, we do not even need to compile a new regex, which is really expensive. A simple loop is enough:
    | special |
    special := self specialCharacters.
    ^ String streamContents: [:stream |
        aString do: [:char |
            (special includes: char) ifTrue: [stream nextPut: $\].
            stream nextPut: char]]
This is 90% faster than the original approach. :-)
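For anyone who wants to try the loop outside of RxParser, here is a minimal workspace sketch. It assumes the special characters are given as the string literal Levente suggested instead of coming from RxParser class >> #specialCharacters, and the names special and escape are purely illustrative; they are not part of Regex-Core.

```smalltalk
"Workspace sketch only: 'special' and 'escape' are hypothetical names."
| special escape |
special := '()[]*+?{}.^$:\'.
escape := [:aString |
    String streamContents: [:stream |
        aString do: [:char |
            "Prefix every special character with a backslash."
            (special includes: char) ifTrue: [stream nextPut: $\].
            stream nextPut: char]]].

escape value: 'a.b'.   "'a\.b'"
escape value: ':-)'.   "'\:-\)'"

"Round trip in the spirit of the usage example from the commit message."
':-)' matchesRegex: (escape value: ':-)').
```

Since the block only walks the string once and never builds a matcher, it avoids the regex compilation that made the first version expensive.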
I will upload a new inbox version when we have made progress with the current naming discussion.
Best,
Christoph
> Hi Christoph,
>
> On Wed, 7 Jul 2021, commits at source.squeak.org wrote:
>
> > A new version of Regex-Core was added to project The Inbox:
> > http://source.squeak.org/inbox/Regex-Core-ct.61.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Regex-Core-ct.61
> > Author: ct
> > Time: 8 July 2021, 1:30:44.09436 am
> > UUID: 63655b8f-ad42-0946-b6fe-4dc3100995f1
> > Ancestors: Regex-Core-ct.59
> >
> > Adds String >> #escapeRegex to escape special characters in a string before composing it into another regex.
> >
> > Usage:
> >
> > ':-)' matchesRegex: ':-)' escapeRegex
> >
> > =============== Diff against Regex-Core-ct.59 ===============
> >
> > Item was added:
> > + ----- Method: RxParser class>>escapeString: (in category 'utilities') -----
> > + escapeString: aString
> > + "Answer a copy of aString which does not contain any unescaped characters. This is the inverse function of String >> #matchesRegex:.
> > + NB: Basically, we could simply escape every single character in the string, but this would not produce human-readable outputs."
> > +
> > + ^ aString
> > + copyWithRegex: ('[{1}]' format: {self specialCharacters collect: [:character | '\', character]})
>
> That first argument doesn't look right. If you evaluate it, you'll get
>
> '[#(''\('' ''\)'' ''\['' ''\]'' ''\*'' ''\+'' ''\?'' ''\{'' ''\}'' ''\.'' ''\^'' ''\$'' ''\:'' ''\\'')]'
>
> I think you need something like this:
>
> String streamContents: [ :stream |
> stream nextPut: $[.
> self specialCharacters do: [ :each |
> stream nextPut: $\; nextPut: each ].
> stream nextPut: $] ]
>
> which yields
>
> '[\(\)\[\]\*\+\?\{\}\.\^\$\:\\]'
>
>
> > Item was added:
> > + ----- Method: RxParser class>>specialCharacters (in category 'utilities') -----
> > + specialCharacters
> > +
> > + ^ #($( $) $[ $] $* $+ $? ${ $} $. $^ $$ $: $\)!
>
> Why not just ^'()[]*+?{}.^$:\'?
>
>
> Levente
>
>