[squeak-dev] The Inbox: Regex-Core-ct.61.mcz

Levente Uzonyi leves at caesar.elte.hu
Fri Jul 9 20:50:03 UTC 2021


Hi Christoph,

On Fri, 9 Jul 2021, christoph.thiede at student.hpi.uni-potsdam.de wrote:

> Hi Levente,
>
>> I thought you deliberately wanted to use regular expressions there.
>
> Not at any price, I guess. :-) In a recent project, I have identified regular expressions as bottlenecks pretty often - the parser alone eats up quite a lot of resources compared to a simple in-line string transformation. What would be your preferred approach here?

It usually helps if you store and reuse the regular expression (the 
RxMatcher). Since it's not thread-safe, you have to make sure you don't 
use it concurrently.
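
A minimal workspace sketch of that idea, assuming a made-up pattern and 
made-up variable names: the pattern is parsed once via #asRegex, the 
resulting matcher is reused for every input, and a Mutex is one possible 
way to keep two processes from using it at the same time.

```smalltalk
"Illustrative only: the pattern, the replacement and all names are assumptions."
| matcher lock replaceSpecials |
matcher := '[<>&]' asRegex.	"parse and compile the expression once"
lock := Mutex new.	"serializes access to the shared matcher"
replaceSpecials := [:aString |
	lock critical: [
		"Reuse the stored matcher instead of parsing the pattern again."
		matcher copy: aString replacingMatchesWith: '_']].
#('a<b' 'plain text' 'x&y') collect: [:each | replaceSpecials value: each].
```

If the matcher is only ever used from a single process, the Mutex can be 
dropped; giving each process its own matcher would be another option.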

>
>> If you want the "Best Performance (tm)", there is a Squeak-specific pattern for these kinds of string-rewrite methods, which consists of a precomputed character set and the use of #new:streamContents:, #indexOfAnyOf:startingAt: and #next:putAll:startingAt:.
>> String's #format:, #expandMacrosWithArguments:, #unescapePercentsRaw and #jsonWriteOn: (only if you have JSON-ul.56 in your image) all use that pattern.
>
> Thanks for the tip! Should I bound the size of the string to stream from above or from below? #unescapePercentsRaw and #expandMacrosWithArguments: bound from below but #format: bounds from above.

In this case, you know that the string cannot be shorter, but it might be 
longer. If you want a hardcoded value, either go with +10% or keep 
the original size.
If you want to make it a bit smarter, then move the first 
#indexOfAnyOf:startingAt: call to the beginning of the method, and if it 
returns 0, just return the string or a copy of it.
Use the +10% approach only if there's something to be escaped.
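
Put together, the pattern could look roughly like the workspace sketch 
below. The character set, the backslash escape and the variable names are 
assumptions chosen for illustration; only #new:streamContents:, 
#indexOfAnyOf:startingAt: and #next:putAll:startingAt: come from the 
pattern described above.

```smalltalk
"Illustrative only: the set of special characters and the escape rule are assumptions."
| specials escape |
specials := CharacterSet newFrom: '<>&'.	"precomputed once and reused"
escape := [:aString | | index start |
	index := aString indexOfAnyOf: specials startingAt: 1.
	index = 0
		ifTrue: [aString]	"nothing to escape: answer the string itself"
		ifFalse: [
			"Something has to be escaped, so reserve about 10% extra room."
			String new: aString size * 11 // 10 streamContents: [:stream |
				start := 1.
				[index > 0] whileTrue: [
					"Copy the untouched run before the special character in one go."
					stream next: index - start putAll: aString startingAt: start.
					stream nextPut: $\; nextPut: (aString at: index).
					start := index + 1.
					index := aString indexOfAnyOf: specials startingAt: start].
				"Copy whatever follows the last special character."
				stream next: aString size - start + 1 putAll: aString startingAt: start]]].
escape value: 'a<b & c>d'.
```

The size passed to #new:streamContents: is only an initial capacity; the 
stream still grows if the estimate turns out to be too small.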


Levente

