[squeak-dev] The Inbox: Regex-Core-ct.68
christoph.thiede at student.hpi.uni-potsdam.de
Mon Aug 23 19:39:56 UTC 2021
From http://source.squeak.org/inbox/Regex-Core-ct.68.diff:
A new version of Regex-Core was added to project The Inbox:
http://source.squeak.org/inbox/Regex-Core-ct.68.mcz
==================== Summary ====================
Name: Regex-Core-ct.68
Author: ct
Time: 23 August 2021, 9:21:12.58334 pm
UUID: 6159117b-a67f-bd4a-b30a-82fe1b4abb09
Ancestors: Regex-Core-mt.61
Adds support for Unicode backslash atoms.
Some examples:
'Squeak is the perfect language' allRegexMatches: '\w*\u{61}\w*'. "--> #('Squeak' 'language')"
'Squeak is beautiful' allRegexMatches: '\w*\x75\w*'. "--> #('Squeak' 'beautiful')"
'$1.00 = €0.85 = £0.73' allRegexMatches: '\p{Sc}\d+\.\d+'. "--> ('$1.00' '€0.85' '£0.73')"
'Carpe Squeak!' allRegexMatches: '\p{L}+'. "--> #('Carpe' 'Squeak')"
' get rid of all these nonsense separators' allRegexMatches: '\P{Z}+'. "--> ('get' 'rid' 'of' 'all' 'these' 'nonsense' 'separators')"
Requires Multilingual-ct.259.
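As a reading aid for the examples above: \u{61} and \x75 name a character by its hexadecimal code point, so the first two patterns should behave like \w*a\w* and \w*u\w*. The euro sign (U+20AC) and the pound sign (U+00A3) both have the Unicode general category Sc (currency symbol). The workspace sketch below uses only Character value: and a plain literal pattern, so it makes no assumption about the new parser being loaded.

```smalltalk
"Code points named by the escapes in the examples."
(Character value: 16r61) = $a.   "true: \u{61} stands for $a"
(Character value: 16r75) = $u.   "true: \x75 stands for $u"

"With the literal character instead of the escape, the pattern is
 expected to answer the same two matches as the \x75 example."
'Squeak is beautiful' allRegexMatches: '\w*u\w*'.
```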
=============== Diff against Regex-Core-mt.61 ===============
Item was changed:
Object subclass: #RxParser
instanceVariableNames: 'input lookahead'
+ classVariableNames: 'BackslashConstants BackslashSpecials HexDigits'
- classVariableNames: 'BackslashConstants BackslashSpecials'
poolDictionaries: ''
category: 'Regex-Core'!
!RxParser commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
--
The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method. All other classes in this category implement the tree. Refer to their comments for any details.
Instance variables:
input <Stream> A stream with the regular expression being parsed.
lookahead <Character>!
Item was added:
+ ----- Method: RxParser class>>digitsForBase: (in category 'private') -----
+ digitsForBase: base
+
+ ^ ($0 to: $9)
+ , (($a to: $z) take: base - 10)
+ , (($A to: $Z) take: base - 10)!
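Evaluated by hand for base 16, the expression above should answer the ten decimal digits followed by a-f and A-F. The workspace sketch below inlines the method body (the temporaries base and digits are illustrative only), so it can be tried without loading this version.

```smalltalk
| base digits |
base := 16.
"Same expression as the body of digitsForBase:, inlined for a workspace."
digits := ($0 to: $9)
	, (($a to: $z) take: base - 10)
	, (($A to: $Z) take: base - 10).
digits size.            "expected 22, i.e. 10 + 6 + 6"
digits includes: $f.    "expected true"
digits includes: $g.    "expected false"
```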
Item was added:
+ ----- Method: RxParser class>>hexDigits (in category 'constants') -----
+ hexDigits
+
+ ^ HexDigits ifNil: [HexDigits := self digitsForBase: 16]!
Item was changed:
----- Method: RxParser class>>initializeBackslashSpecials (in category 'class initialization') -----
initializeBackslashSpecials
+ "The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
- "Keys are characters that normally follow a \, the values are
- associations of classes and initialization selectors on the instance side
- of the cl
Hrmpf.
---
Sent from Squeak Inbox Talk
On 2021-08-23T21:22:57+02:00, christoph.thiede at student.hpi.uni-potsdam.de wrote:
> Unfortunately, there was once again no notification about this version because I inserted some overly special characters into its summary. As an alternative, let me announce my changes here:
>
> Name: Regex-Core-ct.68
> Author: ct
> Time: 23 August 2021, 9:21:12.58334 pm
> UUID: 6159117b-a67f-bd4a-b30a-82fe1b4abb09
> Ancestors: Regex-Core-mt.61
>
> Adds support for Unicode backslash atoms.
>
> Some examples:
>
> 'Squeak is the perfect language' allRegexMatches: '\w*\u{61}\w*'. "--> #('Squeak' 'language')"
> 'Squeak is beautiful' allRegexMatches: '\w*\x75\w*'. "--> #('Squeak' 'beautiful')"
> (WebUtils jsonDecode: '"$1.00 = \u20AC0.85 = \u00A30.73"' readStream) allRegexMatches: '\p{Sc}\d+\.\d+'. "--> ('$1.00' '€0.85' '£0.73')"
> 'Carpe Squeak!' allRegexMatches: '\p{L}+'. "--> #('Carpe' 'Squeak')"
> (WebUtils jsonDecode: '" get rid of \u2007all these nonsense separators"' readStream) allRegexMatches: '\P{Z}+'. "--> ('get' 'rid' 'of' 'all' 'these' 'nonsense' 'separators')"
>
> Requires Multilingual-ct.259.
>
> Tests are in Regex-Tests-Core-ct.24. Looking forward to all your feedback! :-)
>
> Best,
> Christoph
>
> ---
> Sent from Squeak Inbox Talk