[squeak-dev] The Inbox: Regex-Core-ct.68

christoph.thiede at student.hpi.uni-potsdam.de
Mon Aug 23 19:39:56 UTC 2021


From http://source.squeak.org/inbox/Regex-Core-ct.68.diff:

A new version of Regex-Core was added to project The Inbox:
http://source.squeak.org/inbox/Regex-Core-ct.68.mcz

==================== Summary ====================

Name: Regex-Core-ct.68
Author: ct
Time: 23 August 2021, 9:21:12.58334 pm
UUID: 6159117b-a67f-bd4a-b30a-82fe1b4abb09
Ancestors: Regex-Core-mt.61

Adds support for Unicode backslash atoms.

Some examples:

    'Squeak is the perfect language' allRegexMatches: '\w*\u{61}\w*'. "--> #('Squeak' 'language')"
    'Squeak is beautiful' allRegexMatches: '\w*\x75\w*'. "--> #('Squeak' 'beautiful')"
    '$1.00 = €0.85 = £0.73' allRegexMatches: '\p{Sc}\d+\.\d+'. "--> #('$1.00' '€0.85' '£0.73')"
    'Carpe Squeak!' allRegexMatches: '\p{L}+'. "--> #('Carpe' 'Squeak')"
    ' get rid of all these nonsense separators' allRegexMatches: '\P{Z}+'. "--> #('get' 'rid' 'of' 'all' 'these' 'nonsense' 'separators')"

Requires Multilingual-ct.259.
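
As a quick sanity check of the first two examples, here is a workspace sketch (not part of this version). It shows that the escaped code points are ordinary letters: 16r61 is $a and 16r75 is $u. It assumes only Character class>>value: and String>>allRegexMatches: from the existing Regex-Core, so the literal patterns should find the same words as the escaped ones above.

```smalltalk
"Workspace sketch, not part of Regex-Core-ct.68: the escaped code points are plain letters."
(Character value: 16r61) = $a. "true, so \u{61} stands for the letter a"
(Character value: 16r75) = $u. "true, so \x75 stands for the letter u"

"Equivalent patterns with the letters written out literally:"
'Squeak is the perfect language' allRegexMatches: '\w*a\w*'.
'Squeak is beautiful' allRegexMatches: '\w*u\w*'.
```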

=============== Diff against Regex-Core-mt.61 ===============

Item was changed:
Object subclass: #RxParser
    instanceVariableNames: 'input lookahead'
+     classVariableNames: 'BackslashConstants BackslashSpecials HexDigits'
-     classVariableNames: 'BackslashConstants BackslashSpecials'
    poolDictionaries: ''
    category: 'Regex-Core'!

!RxParser commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
--
The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method. All other classes in this category implement the tree. Refer to their comments for any details.

Instance variables:
    input        <Stream> A stream with the regular expression being parsed.
    lookahead    <Character>!

Item was added:
+ ----- Method: RxParser class>>digitsForBase: (in category 'private') -----
+ digitsForBase: base
+ 
+     ^ ($0 to: $9)
+         , (($a to: $z) take: base - 10)
+         , (($A to: $Z) take: base - 10)!

Item was added:
+ ----- Method: RxParser class>>hexDigits (in category 'constants') -----
+ hexDigits
+ 
+     ^ HexDigits ifNil: [HexDigits := self digitsForBase: 16]!

Item was changed:
----- Method: RxParser class>>initializeBackslashSpecials (in category 'class initialization') -----
initializeBackslashSpecials
+     "The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
-     "Keys are characters that normally follow a \, the values are
-     associations of classes and initialization selectors on the instance side
-     of the cl


Hrmpf.

---
Sent from Squeak Inbox Talk

On 2021-08-23T21:22:57+02:00, christoph.thiede at student.hpi.uni-potsdam.de wrote:

> Unfortunately, there was once again no notification about this version because I used some overly special characters in its summary. As an alternative, let me announce my changes here again:
> 
> 	Name: Regex-Core-ct.68
> 	Author: ct
> 	Time: 23 August 2021, 9:21:12.58334 pm
> 	UUID: 6159117b-a67f-bd4a-b30a-82fe1b4abb09
> 	Ancestors: Regex-Core-mt.61
> 
> 	Adds support for Unicode backslash atoms.
> 
> 	Some examples:
> 
> 		'Squeak is the perfect language' allRegexMatches: '\w*\u{61}\w*'. "--> #('Squeak' 'language')"
> 		'Squeak is beautiful' allRegexMatches: '\w*\x75\w*'. "--> #('Squeak' 'beautiful')"
> 		(WebUtils jsonDecode: '"$1.00 = \u20AC0.85 = \u00A30.73"' readStream) allRegexMatches: '\p{Sc}\d+\.\d+'. "--> #('$1.00' '€0.85' '£0.73')"
> 		'Carpe Squeak!' allRegexMatches: '\p{L}+'. "--> #('Carpe' 'Squeak')"
> 		(WebUtils jsonDecode: '" get rid of \u2007all these nonsense separators"' readStream) allRegexMatches: '\P{Z}+'. "--> #('get' 'rid' 'of' 'all' 'these' 'nonsense' 'separators')"
> 
> 	Requires Multilingual-ct.259.
> 
> Tests are in Regex-Tests-Core-ct.24. Looking forward to all your feedback! :-)
> 
> Best,
> Christoph
> 
> ---
> Sent from Squeak Inbox Talk