[squeak-dev] The Inbox: Regex-Core-ct.61.mcz

christoph.thiede at student.hpi.uni-potsdam.de
Thu Jul 8 12:14:55 UTC 2021


Hi Levente,

Two very fair points, thank you for the feedback! Revisiting #escapeString:, we do not even need to compile a new regex, which is really expensive; a simple loop does the job instead:

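	escapeString: aString
		"Answer a copy of aString with every special character prefixed by a backslash."
		| special |
		special := self specialCharacters.
		^ String streamContents: [:stream |
			aString do: [:char |
				(special includes: char) ifTrue: [stream nextPut: $\].
				stream nextPut: char]]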

This is 90% faster than the original approach. :-)
I will upload a new inbox version when we have made progress with the current naming discussion.
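
For illustration, here is a small workspace sketch of how the loop above would behave on the example from the commit message. It assumes the loop is installed as RxParser class >> #escapeString: and that #specialCharacters answers the characters listed in the diff (either as an Array or as a String, since both understand #includes:). The sample input in the timing line is made up, and the timing depends on the machine and the VM:

	| escaped |
	"Assumption: the loop is installed as RxParser class >> #escapeString:."
	escaped := RxParser escapeString: ':-)'.
	escaped. "'\:-\)', because both $: and $) are in the special set"
	':-)' matchesRegex: escaped. "should answer true"

	"Rough timing of the loop on a made-up input."
	[RxParser escapeString: 'a.b*c(d)'] bench.

Note that $- is left alone: it is not in the special set, and outside a character class it has no special meaning.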

Best,
Christoph

> Hi Christoph,
> 
> On Wed, 7 Jul 2021, commits at source.squeak.org wrote:
> 
> > A new version of Regex-Core was added to project The Inbox:
> > http://source.squeak.org/inbox/Regex-Core-ct.61.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Regex-Core-ct.61
> > Author: ct
> > Time: 8 July 2021, 1:30:44.09436 am
> > UUID: 63655b8f-ad42-0946-b6fe-4dc3100995f1
> > Ancestors: Regex-Core-ct.59
> >
> > Adds String >> #escapeRegex to escape special characters in a string before composing it into another regex.
> >
> > Usage:
> >
> > 	':-)' matchesRegex: ':-)' escapeRegex
> >
> > =============== Diff against Regex-Core-ct.59 ===============
> >
> > Item was added:
> > + ----- Method: RxParser class>>escapeString: (in category 'utilities') -----
> > + escapeString: aString
> > + 	"Answer a copy of aString which does not contain any unescaped characters. This is the inverse function of String >> #matchesRegex:.
> > + 	NB: Basically, we could simply escape every single character in the string, but this would not produce human-readable outputs."
> > + 
> > + 	^ aString
> > + 		copyWithRegex: ('[{1}]' format: {self specialCharacters collect: [:character | '\', character]})
> 
> That first argument doesn't look right. If you evaluate it, you'll get
> 
>   '[#(''\('' ''\)'' ''\['' ''\]'' ''\*'' ''\+'' ''\?'' ''\{'' ''\}'' ''\.'' ''\^'' ''\$'' ''\:'' ''\\'')]'
> 
> I think you need something like this:
> 
> String streamContents: [ :stream |
>  	stream nextPut: $[.
>  	self specialCharacters do: [ :each |
>  		stream nextPut: $\; nextPut: each ].
>  	stream nextPut: $] ]
> 
> which yields
> 
>   '[\(\)\[\]\*\+\?\{\}\.\^\$\:\\]'
> 
> 
> > Item was added:
> > + ----- Method: RxParser class>>specialCharacters (in category 'utilities') -----
> > + specialCharacters
> > + 
> > + 	^ #($( $) $[ $] $* $+ $? ${ $} $. $^ $$ $: $\)!
> 
> Why not just ^'()[]*+?{}.^$:\'?
> 
> 
> Levente
> 
> 

