[squeak-dev] The Inbox: Regex-Core-ct.61.mcz
christoph.thiede at student.hpi.uni-potsdam.de
Fri Jul 9 17:36:52 UTC 2021
Hi Levente,
> I thought you deliberately wanted to use regular expressions there.
Not at any price, I guess. :-) In a recent project, I have pretty often identified regular expressions as bottlenecks: parsing the pattern alone consumes a lot of resources compared to a simple inline string transformation. What would be your preferred approach here?
> If you want the "Best Performance (tm)", there is a Squeak-specific pattern for these kinds of string-rewrite methods, which consists of a precomputed character set and the use of #new:streamContents:, #indexOfAnyOf:startingAt: and #next:putAll:startingAt:.
> String's #format:, #expandMacrosWithArguments:, #unescapePercentsRaw and #jsonWriteOn: (only if you have JSON-ul.56 in your image) all use that pattern.
Thanks for the tip! When estimating the size to pass to #new:streamContents:, should I use a lower or an upper bound? #unescapePercentsRaw and #expandMacrosWithArguments: use a lower bound, but #format: uses an upper bound.
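A minimal sketch of that pattern applied to the #escapeString: loop quoted below. It is written as a workspace block rather than a method, and the literal character set is an assumption standing in for `self specialCharacters`. On the sizing question: the size argument only sets the initial capacity of the underlying WriteStream, which grows when the estimate is too small, so either bound yields a correct result. The sketch uses `aString size * 2` (the worst case, where every character is escaped) purely for illustration.

```smalltalk
"Sketch only: the character set below is an assumed stand-in for
 self specialCharacters, and the block stands in for a method body."
| special escape |
special := CharacterSet new addAll: '\^$.*+?()[]{}|'; yourself.
escape := [:aString |
	"Worst case: every character gets a backslash, hence size * 2."
	String new: aString size * 2 streamContents: [:stream |
		| start index |
		start := 1.
		[(index := aString indexOfAnyOf: special startingAt: start) = 0] whileFalse: [
			"Copy the run of ordinary characters in one go."
			stream next: index - start putAll: aString startingAt: start.
			"Escape the special character itself."
			stream nextPut: $\; nextPut: (aString at: index).
			start := index + 1].
		"Copy whatever follows the last special character."
		stream next: aString size - start + 1 putAll: aString startingAt: start]].
escape value: 'a.b*c'. "returns 'a\.b\*c'"
```

Compared with the per-character loop, runs of ordinary characters are copied with a single #next:putAll:startingAt: instead of one #nextPut: per character.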
Best,
Christoph
> Hi Christoph,
>
> On Thu, 8 Jul 2021, christoph.thiede at student.hpi.uni-potsdam.de wrote:
>
> > Hi Levente,
> >
> > two very fair points, thank you for the feedback! Revisiting #escapeString:, we do not even need to compile a new regex (which is really expensive); a simple loop does the job:
>
> I thought you deliberately wanted to use regular expressions there.
>
> >
> > 	| special |
> > 	special := self specialCharacters.
> > 	^ String streamContents: [:stream |
> > 		aString do: [:char |
> > 			(special includes: char) ifTrue: [stream nextPut: $\].
> > 			stream nextPut: char]]
> >
> > This is 90% faster than the original approach. :-)
>
> If you want the "Best Performance (tm)", there is a Squeak-specific
> pattern for these kinds of string-rewrite methods, which consists of a
> precomputed character set and the use of #new:streamContents:,
> #indexOfAnyOf:startingAt: and #next:putAll:startingAt:.
> String's #format:, #expandMacrosWithArguments:, #unescapePercentsRaw
> and #jsonWriteOn: (only if you have JSON-ul.56 in your image) all use that
> pattern.
>
>
> Levente
>
>
---
Sent from Squeak Inbox Talk