[squeak-dev] Anybody got an elegant construct for this functional monstrosity?

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Sat Nov 27 18:14:55 UTC 2021


Beautiful, Eliot. :-)


(I was just wondering why our regex matcher is not capable of handling this kind of query efficiently; afaik it uses a DFS approach. A BFS-based regex optimization could be another interesting side project... :D)


Best,

Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Eliot Miranda <eliot.miranda at gmail.com>
Sent: Saturday, 27 November 2021 18:50:10
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Anybody got an elegant construct for this functional monstrosity?



On Sat, Nov 27, 2021 at 9:47 AM Eliot Miranda <eliot.miranda at gmail.com> wrote:


On Sat, Nov 27, 2021 at 8:36 AM Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

What about


{ '{|'  . '|-' . '|}' . '{{' . '}}' .  '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' } anySatisfy: [:pattern | self match: pattern]


?


Maybe also this one if the identity of the matching pattern is of interest:


{ '{|'  . '|-' . '|}' . '{{' . '}}' .  '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' }

    detect: [:pattern | self match: pattern]

    ifFound: [:pattern | self inform: 'Matched pattern: ' , pattern]

    ifNone: [self inform: 'no match']


Best,

Christoph


PS: Don't use #| unless you explicitly want every match: to be sent every time. Use #or: instead; it stops evaluating as soon as one test succeeds, which is faster.
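A small workspace sketch of the difference. The names `probe` and `calls` are made up for illustration, with `probe` standing in for an expensive test such as `self match: '{|'`; it counts how many operands actually get evaluated:

```smalltalk
| calls probe withBar withOr |
"probe is a hypothetical stand-in for an expensive test such as (self match: '{|');
 it counts its evaluations and answers its argument unchanged"
calls := 0.
probe := [:result | calls := calls + 1. result].

"#| takes an already-evaluated Boolean, so both operands run"
(probe value: true) | (probe value: false).
withBar := calls.

"#or: takes a block, and true answers true without evaluating it"
calls := 0.
(probe value: true) or: [probe value: false].
withOr := calls.

Transcript show: 'evaluations with #|: ', withBar printString; cr.
Transcript show: 'evaluations with #or:: ', withOr printString; cr.
```

The Transcript should show 2 evaluations for #| and 1 for #or:, because `true or: [...]` never runs its block.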

And just as importantly, never use a brace construct when a literal array will do.  { '{|'  . '|-' . '|}' . '{{' . '}}' .  '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' } is created at run-time.  The equivalent #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''')  is created at compile-time.  Inspect the method in the browser or the debugger and have a look at the bytecode.
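A sketch that makes the compile-time versus run-time difference visible without reading bytecode, assuming a plain workspace doit (the block names `literal` and `brace` are illustrative). A literal array is one object stored in the compiled code, so every evaluation answers the same instance, while a brace expression builds a fresh Array each time it runs:

```smalltalk
| literal brace |
"A literal array is built once by the compiler and kept in the compiled code's literals"
literal := [#('{|' '|-' '|}')].
"A brace array is built by the running code, so each evaluation answers a new Array"
brace := [{'{|'. '|-'. '|}'}].

Transcript show: 'literal identical across evaluations: ', (literal value == literal value) printString; cr.
Transcript show: 'brace identical across evaluations: ', (brace value == brace value) printString; cr.
Transcript show: 'equal contents: ', (literal value = brace value) printString; cr.
```

Expected Transcript output is true, false, true: the contents are equal either way, only the allocation differs.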

If performance is important, you'll construct a parser of some form. For example, the simplest optimization here is to check whether the first character is a candidate and then whether the second character is a candidate. In a parser you'd have different code executed for each first-character candidate. But the code below avoids doing a match until we know both characters are in the set. I've written it as a doit, but I'm imagining Firsts and Seconds are class or instance variables (the point is that, provided the matcher is called often, we want Firsts and Seconds to be computed precisely once).

| patterns first second Firsts Seconds |
patterns :=  #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''').
Firsts ifNil:
   [Firsts := (patterns collect: #first) as: String.
    "'''' is a one-character pattern, so only two-character patterns supply a second character"
    Seconds  := ((patterns select: [:each | each size = 2]) collect: #second) as: String].
self notEmpty
and: [(Firsts includes: (first := self first))
and: [first = $'
	or: [self size >= 2
		and: [(Seconds includes: (second := self second))
		and: [patterns includes: (ByteString with: first with: second)]]]]]

Oops.  I meant of course
| patterns first second Firsts Seconds |
patterns :=  #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''').
Firsts ifNil:
   [Firsts := ((patterns collect: #first) as: Set) as: String.
    "'''' is a one-character pattern, so only two-character patterns supply a second character"
    Seconds  := (((patterns select: [:each | each size = 2]) collect: #second) as: Set) as: String].
self notEmpty
and: [(Firsts includes: (first := self first))
and: [first = $'
	or: [self size >= 2
		and: [(Seconds includes: (second := self second))
		and: [patterns includes: (ByteString with: first with: second)]]]]]


________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org>
Sent: Saturday, 27 November 2021 17:30:38
To: squeak-dev
Subject: [squeak-dev] Anybody got an elegant construct for this functional monstrosity?

I have a ReadStream and I want to detect some substrings in it.

This works, but it is ugly.


((self match:'{|') |
(self match:'|-') |
(self match:'|}') |
(self match:'{{') |
(self match:'}}') |
(self match:'[[') |
(self match:']]') |
(self match:'__') |
(self match:'==') |
(self match:'::') |
(self match:'**') |
(self match:'##') |
(self match:'''') )

Is anybody aware of an elegant approach to this?


Something along the lines of


self matchAny: { '{|'  . '|-' . '|}' . '{{' . '}}' .  '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' }
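One possible shape for such a `#matchAny:`, sketched as a workspace block rather than a method so it is self-contained. The helper `matchAny` and its semantics are hypothetical: it tests each pattern at the stream's current position, consuming the first one that matches and restoring the position otherwise. That differs from Squeak's `PositionableStream>>match:`, which as far as I know skips ahead to the next occurrence anywhere in the stream.

```smalltalk
| patterns matchAny stream found |
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''').
"matchAny is a hypothetical helper: it answers the first pattern that occurs at the
 stream's current position (consuming it), or nil with the position unchanged"
matchAny := [:aStream :candidates |
	candidates
		detect: [:pattern | | start |
			start := aStream position.
			"ReadStream>>next: answers fewer elements near the end, so the comparison just fails"
			(aStream next: pattern size) = pattern
				or: [aStream position: start. false]]
		ifNone: [nil]].

stream := ReadStream on: '[[Main Page]] and more'.
found := matchAny value: stream value: patterns.
Transcript show: 'matched: ', found printString, ' now at ', stream position printString; cr.
```

Here `found` should be `'[['` and the stream should be left at position 2. If only a Boolean is needed, `(matchAny value: stream value: patterns) notNil` gives it.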


thx in advance




--
_,,,^..^,,,_
best, Eliot

