[squeak-dev] The Inbox: Regex-Tests-Core-tobe.34.mcz

Tom Beckmann tomjonabc at gmail.com
Tue Feb 14 10:16:57 UTC 2023


Hi Christoph,

good point, there are multiple ways to proceed given an input such as
`x{1`.

The current code would either fail with an unrelated error or assume
nil values for the ranges, which is of course not ideal. But you're
right: a solution that would reject the above example early and
explicitly is simpler to implement than my proposed solution.

The reason I chose to accept incomplete ranges as literal
characters was to align with the behavior of ECMAScript regexes.
(Specifically, I am receiving ECMAScript regexes from an external,
trusted source and have been parsing those quite happily with the
Squeak parser so far, save for some minor quirks that could be fixed
via string replacement, and now this issue.)

The main argument I see from a user's perspective for accepting
incomplete ranges is that it reduces the need for escaping. I also
agree with your points, so there's a tradeoff to be decided on :)
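
To make the tradeoff concrete, here is a minimal sketch of the two
behaviors in Python. It is not the Squeak RxParser code, and the names
`read_quantifier` and `tokenize` are made up for illustration. It
assumes a quantifier has the form {n}, {n,} or {n,m} and that anything
else starting with `{` is "incomplete". In lenient mode the scanner
backtracks and emits the `{` as a literal character; in strict mode it
rejects the pattern right away.

```python
from typing import List, Optional, Tuple

DIGITS = "0123456789"


def read_quantifier(pattern: str, pos: int) -> Optional[Tuple[int, Optional[int], int]]:
    """Try to read a complete {n}, {n,} or {n,m} starting at pattern[pos].

    Returns (minimum, maximum, next_pos), where maximum is None for
    "unbounded", or None if no complete quantifier starts at pos.
    """
    if pos >= len(pattern) or pattern[pos] != "{":
        return None
    i = start = pos + 1
    while i < len(pattern) and pattern[i] in DIGITS:
        i += 1
    if i == start:
        return None  # no lower bound, e.g. "{x}" or "{,}"
    minimum = int(pattern[start:i])
    maximum: Optional[int] = minimum
    if i < len(pattern) and pattern[i] == ",":
        i = start = i + 1
        while i < len(pattern) and pattern[i] in DIGITS:
            i += 1
        maximum = int(pattern[start:i]) if i > start else None
    if i < len(pattern) and pattern[i] == "}":
        return minimum, maximum, i + 1
    return None  # missing closing brace, e.g. "{1" or "{1,2,}"


def tokenize(pattern: str, strict: bool) -> List[tuple]:
    """Split a pattern into ("char", c) and ("repeat", min, max) tokens."""
    tokens: List[tuple] = []
    pos = 0
    while pos < len(pattern):
        if pattern[pos] == "{":
            quantifier = read_quantifier(pattern, pos)
            if quantifier is not None:
                minimum, maximum, pos = quantifier
                tokens.append(("repeat", minimum, maximum))
                continue
            if strict:
                # Early, explicit rejection of the incomplete range.
                raise ValueError(f"incomplete quantifier at index {pos}")
            # Lenient: backtrack and treat "{" as an ordinary character.
        tokens.append(("char", pattern[pos]))
        pos += 1
    return tokens
```

With `strict=False`, `tokenize("x{1", strict=False)` yields three
literal characters, while `tokenize("x{1,3}", strict=False)` yields
`x` followed by a repeat token. With `strict=True`, `x{1` raises a
`ValueError` instead. The lenient branch is where the extra lookahead
and backtracking come from; the strict branch needs neither.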

Best,
Tom

On Tue, 2023-02-14 at 08:32 +0000, Thiede, Christoph wrote:
> Hi Tom,
> 
> thank you for your contribution! However, could you maybe share some
> reasoning for the intended parser behavior with us? Why do you want
> to treat incomplete quantifier sequences as literal characters
> instead of raising a syntax error?
> 
> Here are some possible arguments in favor of raising a syntax error
> that come to my mind:
> 
> - Debugging incorrect expressions gets easier (e.g., if you missed a
> closing curly brace by accident).
> - Without backtracking, the design of the parser remains simpler and
> duplication-free, and its performance remains higher.
> - For other incomplete patterns such as '[a' or ':isDigit' we also
> raise a syntax error instead of parsing the pattern as literals.
> - Other parsers behave inconsistently: Some treat the incomplete
> examples as literals (e.g., JavaScript, .NET), while others raise
> syntax errors (e.g., Java).
> 
> Best,
> Christoph
> 
> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of commits at source.squeak.org <commits at source.squeak.org>
> Sent: Monday, 13 February 2023 10:30:59
> To: squeak-dev at lists.squeakfoundation.org
> Subject: [squeak-dev] The Inbox: Regex-Tests-Core-tobe.34.mcz
>  
> A new version of Regex-Tests-Core was added to project The Inbox:
> http://source.squeak.org/inbox/Regex-Tests-Core-tobe.34.mcz
> 
> ==================== Summary ====================
> 
> Name: Regex-Tests-Core-tobe.34
> Author: tobe
> Time: 13 February 2023, 10:30:58.863775 am
> UUID: 897286b7-bca5-405c-92c0-52e09604b1fc
> Ancestors: Regex-Tests-Core-ct.33
> 
> Complements Regex-Core-tobe.86
> 
> =============== Diff against Regex-Tests-Core-ct.33 ===============
> 
> Item was added:
> + ----- Method: RxParserTest>>testNonQuantifier (in category 'tests') -----
> + testNonQuantifier
> +        "Test expressions that look like quantifier expressions but do not fully match"
> +        self assert: ('a{x}'  matchesRegex: 'a{x}').
> +        self assert: ('a{,x}'  matchesRegex: 'a{,x}').
> +        self assert: ('a{,}'  matchesRegex: 'a{,}').
> +        self assert: ('a{,,}'  matchesRegex: 'a{,,}').
> +        self assert: ('a{1,2,}'  matchesRegex: 'a{1,2,}').
> +        self assert: ('a{,'  matchesRegex: 'a{,').
> +        self assert: ('a{'  matchesRegex: 'a{').
> +        self assert: ('a{1'  matchesRegex: 'a{1').
> +        self assert: ('a{1,'  matchesRegex: 'a{1,').!
> 
> 
> 


