[squeak-dev] The Inbox: Regex-Core-ct.61.mcz

christoph.thiede at student.hpi.uni-potsdam.de
Fri Jul 9 17:36:52 UTC 2021


Hi Levente,

> I thought you deliberately wanted to use regular expressions there.

Not at any price, I guess. :-) In a recent project, regular expressions turned out to be bottlenecks pretty often: the regex parser alone eats up a lot of resources compared to a simple in-line string transformation. What would be your preferred approach here?

> If you want the "Best Performance (tm)", there is a Squeak-specific pattern for these kinds of string-rewrite methods, which consists of a precomputed character set and the use of #new:streamContents:, #indexOfAnyOf:startingAt: and #next:putAll:startingAt:.
> String's #format:, #expandMacrosWithArguments:, #unescapePercentsRaw and #jsonWriteOn: (only if you have JSON-ul.56 in your image) all use that pattern.

Thanks for the tip! Should the size I pass to #new:streamContents: bound the size of the resulting string from above or from below? #unescapePercentsRaw and #expandMacrosWithArguments: bound it from below, but #format: bounds it from above.
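
If I understand the pattern correctly, #escapeString: could look roughly like the sketch below. It is untested; #specialCharacterSet is a hypothetical accessor that I assume answers a precomputed CharacterSet, and I assume #indexOfAnyOf:startingAt: answers 0 when nothing is found. Passing aString size bounds the result from below here, since escaping only ever adds characters.

    escapeString: aString
        | special start index |
        "Hypothetical accessor: a CharacterSet built once, not per call."
        special := self specialCharacterSet.
        ^ String new: aString size streamContents: [:stream |
            start := 1.
            "Jump from one special character to the next."
            [(index := aString indexOfAnyOf: special startingAt: start) = 0] whileFalse: [
                "Copy the run of ordinary characters before the match in one go."
                stream next: index - start putAll: aString startingAt: start.
                "Escape the special character itself."
                stream nextPut: $\; nextPut: (aString at: index).
                start := index + 1].
            "Copy whatever remains after the last special character."
            stream next: aString size - start + 1 putAll: aString startingAt: start]

Compared to the per-character loop, this only touches the stream once per run of ordinary characters instead of once per character.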

Best,
Christoph

> Hi Christoph,
> 
> On Thu, 8 Jul 2021, christoph.thiede at student.hpi.uni-potsdam.de wrote:
> 
> > Hi Levente,
> > 
> > two very fair points, thank you for the feedback! Revisiting #escapeString: again, we do not even need to compile a new regex, which is really expensive, but we can use a simple loop instead:
> 
> I thought you deliberately wanted to use regular expressions there.
> 
> > 
> >     | special |
> >     special := self specialCharacters.
> >     ^ String streamContents: [:stream |
> >         aString do: [:char |
> >             (special includes: char) ifTrue: [stream nextPut: $\].
> >             stream nextPut: char]]
> > 
> > Which is 90% faster than the original approach. :-)
> 
> If you want the "Best Performance (tm)", there is a Squeak-specific 
> pattern for these kinds of string-rewrite methods, which consists of a 
> precomputed character set and the use of #new:streamContents:, 
> #indexOfAnyOf:startingAt: and #next:putAll:startingAt:.
> String's #format:, #expandMacrosWithArguments:, #unescapePercentsRaw 
> and #jsonWriteOn: (only if you have JSON-ul.56 in your image) all use that 
> pattern.
> 
> 
> Levente
> 
> 

---

Sent from Squeak Inbox Talk

