A new version of Regex-Tests-Core was added to project The Inbox: http://source.squeak.org/inbox/Regex-Tests-Core-tobe.34.mcz
==================== Summary ====================
Name: Regex-Tests-Core-tobe.34
Author: tobe
Time: 13 February 2023, 10:30:58.863775 am
UUID: 897286b7-bca5-405c-92c0-52e09604b1fc
Ancestors: Regex-Tests-Core-ct.33
Complements Regex-Core-tobe.86
=============== Diff against Regex-Tests-Core-ct.33 ===============
Item was added:
+ ----- Method: RxParserTest>>testNonQuantifier (in category 'tests') -----
+ testNonQuantifier
+ 	"Test expressions that look like quantifier expressions but do not fully match"
+ 	self assert: ('a{x}' matchesRegex: 'a{x}').
+ 	self assert: ('a{,x}' matchesRegex: 'a{,x}').
+ 	self assert: ('a{,}' matchesRegex: 'a{,}').
+ 	self assert: ('a{,,}' matchesRegex: 'a{,,}').
+ 	self assert: ('a{1,2,}' matchesRegex: 'a{1,2,}').
+ 	self assert: ('a{,' matchesRegex: 'a{,').
+ 	self assert: ('a{' matchesRegex: 'a{').
+ 	self assert: ('a{1' matchesRegex: 'a{1').
+ 	self assert: ('a{1,' matchesRegex: 'a{1,').!
Hi Tom,
thank you for your contribution! However, could you maybe share some reasoning for the intended parser behavior with us? Why do you want to treat incomplete quantifier sequences as literal characters instead of raising a syntax error?
Here are some possible arguments in favor of raising a syntax error that come to mind:
- Debugging incorrect expressions gets easier (e.g., if you missed a closing curly brace by accident).
- Without backtracking, the design of the parser remains simpler and duplication-free, and its performance remains higher.
- For other incomplete patterns such as '[a' or ':isDigit' we also raise a syntax error instead of parsing the pattern as literals.
- Other parsers behave inconsistently: Some treat the incomplete examples as literals (e.g., JavaScript, .NET), while others raise syntax errors (e.g., Java).
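The strict alternative could be pinned down with a test along the following lines. This is only a sketch of the proposed behavior, not of what the parser does today: the selector testIncompleteQuantifierRaisesSyntaxError and the choice of patterns are assumptions, and RegexSyntaxError is assumed to be the error class that already reports '[a'.

```smalltalk
testIncompleteQuantifierRaisesSyntaxError
	"Hypothetical counterpart to testNonQuantifier (selector and patterns are assumptions).
	Under the strict policy, an unterminated quantifier is rejected like an unterminated character set."

	"Baseline mentioned in the thread: an unterminated character set is a syntax error."
	self should: ['[a' asRegex] raise: RegexSyntaxError.

	"Proposed: unterminated quantifiers are reported the same way."
	#('a{' 'a{1' 'a{1,' 'a{,') do: [:pattern |
		self should: [pattern asRegex] raise: RegexSyntaxError]
```

Whether patterns such as 'a{x}' or 'a{,}' should also be rejected is part of the open question and is deliberately left out of the sketch.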
Best,
Christoph
Hi Christoph,
good point, there are multiple ways to proceed given an input such as `x{1`.
The current code would either fail with an unrelated error or assume nil values for the ranges, which is of course not ideal. But you're right: a solution that would reject the above example early and explicitly is simpler to implement than my proposed solution.
The reason I chose to go for accepting incomplete ranges as literal characters was to align with the behavior of ECMAScript regexes. (Specifically, I am receiving ECMAScript regexes from an external, trusted source and have been parsing those quite happily with the Squeak parser so far, save for some minor quirks that could be fixed via string replace and now this issue).
The major argument from a user's perspective I see for accepting incomplete ranges is to reduce the need for escaping. I do also agree with your points, so there's a tradeoff to be decided on :)
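For comparison, the escaping burden under both policies can be sketched in a workspace. This assumes that the parser treats a backslash before a non-alphanumeric character as a literal escape. The sample strings are taken from the test above, and the second expression is only meaningful with the proposed change loaded.

```smalltalk
"Strict policy: the pattern author has to escape literal braces."
'a{x}' matchesRegex: 'a\{x\}'.

"Lenient policy (requires the proposed change from Regex-Core-tobe.86):
the same match without any escaping."
'a{x}' matchesRegex: 'a{x}'.
```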
Best, Tom
Hi Tom,
thank you for the context! In general, I believe that Squeak does not try to adhere to any particular regex flavor, and we are probably not consistent with any common flavor because of special features like :isDigit: or </> boundaries. This gives us great freedom to build our own (better) flavor. On the downside, the missing compatibility is a pity, of course. I would love to hear some third opinions on this question. :-)
> Specifically, I am receiving ECMAScript regexes from an external, trusted source and have been parsing those quite happily with the Squeak parser so far, save for some minor quirks that could be fixed via string replace and now this issue
Well, I think sanitizing braces would also be just another quick string replacement ;P
(jsRegexString copyWithRegex: '{(?!(\d+,\d*|,\d+)})' matchesReplacedWith: '{') copyWithRegex: '(?<!{(\d+,\d*|,\d+))}' matchesReplacedWith: '}'
(By the way, did you already take a look at regexp-tree transforms? :-))
Best, Christoph
Hi all,
quick bump, as I just stumbled upon this bug found by Tom again. :-)
Tom proposed treating incomplete quantifier syntax such as a{,} as literal braces. However, I would still prefer raising a syntax error instead (it is more in line with [], ::, (), etc. and provides better debuggability). Third opinions would be great so that we can finally define this behavior and avoid the current situation (which involves unexpected MNUs from the parser). :-)
Best, Christoph
--- Sent from Squeak Inbox Talk