[squeak-dev] The Inbox: Regex-Core-tobe.62.mcz

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Mon Oct 18 12:27:30 UTC 2021


Hi Tom,


this looks similar to Regex-Core-ct.68. :-) I will compare both patches in depth later; maybe we can merge the best of both approaches.


(For the future, maybe someone should build a tiny tool that automatically warns you when you start editing a class/protocol/method for which there are already open patches in The Inbox ... :D)


Best,

Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of commits at source.squeak.org <commits at source.squeak.org>
Sent: Monday, 18 October 2021 12:55:02
To: squeak-dev at lists.squeakfoundation.org
Subject: [squeak-dev] The Inbox: Regex-Core-tobe.62.mcz

A new version of Regex-Core was added to project The Inbox:
http://source.squeak.org/inbox/Regex-Core-tobe.62.mcz

==================== Summary ====================

Name: Regex-Core-tobe.62
Author: tobe
Time: 18 October 2021, 12:55:00.939602 pm
UUID: 02286bc7-4450-4843-9988-40b1dc9bfa70
Ancestors: Regex-Core-mt.61

Add support for \uXXXX for specifying unicode code points

=============== Diff against Regex-Core-mt.61 ===============

Item was changed:
  ----- Method: RxCharSetParser>>parseEscapeChar (in category 'parsing') -----
  parseEscapeChar

+        | first last |
-        | first |
         self match: $\.
         first := (RxsPredicate forEscapedLetter: lookahead)
+                ifNil: [
+                         (lookahead = $u and: [RxsPredicate matchesUnicodeSymbol: (source peek: 4)])
+                                ifTrue: [RxsCharacter with: (RxsPredicate unicodeCharacterFrom: self)]
+                                ifFalse: [RxsCharacter with: lookahead]].
-                ifNil: [ RxsCharacter with: lookahead ].
         self next == $- ifFalse: [^ elements add: first].
         self next ifNil: [
                 elements add: first.
                 ^ self addChar: $-].
+        last := lookahead = $\
+                ifTrue: [
+                        self next.
+                        (RxsPredicate forEscapedLetter: lookahead)
+                                ifNil: [
+                                         (lookahead = $u and: [RxsPredicate matchesUnicodeSymbol: (source peek: 4)])
+                                                ifTrue: [RxsCharacter with: (RxsPredicate unicodeCharacterFrom: self)]
+                                                ifFalse: [RxsCharacter with: lookahead]]]
+                ifFalse: [ | char |
+                        char := RxsCharacter with: lookahead.
+                        self next.
+                        char].
+        self addRangeFrom: first character to: last character!
-        self addRangeFrom: first character to: lookahead.
-        self next!

Item was changed:
  ----- Method: RxParser>>ifSpecial:then: (in category 'private') -----
  ifSpecial: aCharacter then: aBlock
         "If the character is such that it defines a special node when follows a $\,
         then create that node and evaluate aBlock with the node as the parameter.
         Otherwise just return."

         | classAndSelector |
+        classAndSelector := BackslashSpecials at: aCharacter ifAbsent: [
+                " check if we have four hex digits for a unicode code symbol following "
+                (aCharacter = $u and: [RxsPredicate matchesUnicodeSymbol: (input peek: 4)]) ifTrue: [
+                        ^ aBlock value: (RxsPredicate forUnicodeFrom: self)].
+                ^self].
-        classAndSelector := BackslashSpecials at: aCharacter ifAbsent: [^self].
         ^aBlock value: (classAndSelector key new perform: classAndSelector value)!

Item was added:
+ ----- Method: RxsPredicate class>>forUnicodeFrom: (in category 'instance creation') -----
+ forUnicodeFrom: aParser
+
+        ^RxsPredicate new beCharacter: (self unicodeCharacterFrom: aParser)!

Item was added:
+ ----- Method: RxsPredicate class>>matchesUnicodeSymbol: (in category 'helper') -----
+ matchesUnicodeSymbol: aString
+
+        ^aString size = 4 and: [aString allSatisfy: [:c | c asLowercase between: $0 and: $f]]!

Item was added:
+ ----- Method: RxsPredicate class>>unicodeCharacterFrom: (in category 'helper') -----
+ unicodeCharacterFrom: aParser
+
+        ^Character value: (Integer readFrom: aParser next asString, aParser next, aParser next, aParser next base: 16)!
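
A side note on the hex-digit test in matchesUnicodeSymbol: above: `between: $0 and: $f` is a range comparison on character values, so it also accepts characters such as $: or $@ that sit between the digits and the letters. The workspace snippet below is only a sketch of a stricter check, not part of the patch; `hexDigits` and `isHexQuad` are hypothetical local names introduced for illustration.

```smalltalk
"Workspace sketch, not part of Regex-Core-tobe.62.
 hexDigits and isHexQuad are hypothetical local names."
| hexDigits isHexQuad |
hexDigits := '0123456789abcdef'.

"Answer true only if aString is exactly four hexadecimal digits."
isHexQuad := [:aString |
	aString size = 4 and: [
		aString allSatisfy: [:c | hexDigits includes: c asLowercase]]].

isHexQuad value: '00E9'.	"four hex digits"
isHexQuad value: '0:;<'.	"rejected here, although each character lies between $0 and $f"
isHexQuad value: '00E'.	"rejected, only three characters"

"Turn the four digits into a character, as unicodeCharacterFrom: does."
Character value: (Integer readFrom: '00E9' base: 16).
```

Membership in an explicit digit string keeps the intent visible and avoids relying on where the punctuation characters happen to fall in the character table.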

