[squeak-dev] follow up to previous email. work in progress on "terse guide to XTreams Parsing Syntax" cheat sheet.

gettimothy gettimothy at zoho.com
Tue Jan 5 11:56:07 UTC 2021


Hi Levente.



Regarding my previous letter, I am almost certain that "consume" means that a match has been made and the matched text is eaten from the input.

"Yield" means "invoke the callback" or "return the section of an AST for that rule".



Anyhoo, I have been going through your old emails and cross-checking with: https://nim-lang.org/docs/pegs.html

and have come up with a preliminary "terse guide" to the syntax.



I would like to expand this into tests/examples for others to use going forward. Eventually this will make its way to a SqueakBook when/if I get some time.



Anyhoo, could you peruse it and see if anything jumps out at you as incorrect or incomplete?



Much appreciated.



t


p.s. I have cc'd squeak-dev in case anybody else finds this interesting.









XTreams

https://code.google.com/archive/p/xtreams/wikis/Parsing.wiki

https://nim-lang.org/docs/pegs.html





A <- E

Rule:

Bind the expression  E  to the nonterminal symbol  A .

Left recursive rules are not possible and crash the matching engine.
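
To see why left recursion is a problem, here is a minimal sketch in Python (not Xtreams or Nim code). The function names and the toy grammar are made up for illustration. A rule is written as a function that takes the text and a position and returns the new position, or None on failure. The left recursive version calls itself before consuming anything, so it never makes progress; the rewritten version accepts the same input with a loop.

```python
def num(text, pos):
    """num <- [0-9]: consume one digit, or fail with None."""
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    return None

def left_expr(text, pos):
    """expr <- expr '+' num / num  (left recursive)."""
    after = left_expr(text, pos)   # same position again: no progress is ever made
    if after is not None and text.startswith("+", after):
        return num(text, after + 1)
    return num(text, pos)

def iter_expr(text, pos):
    """expr <- num ('+' num)*  (same inputs accepted, no left recursion)."""
    pos = num(text, pos)
    while pos is not None and text.startswith("+", pos):
        nxt = num(text, pos + 1)
        if nxt is None:
            break                  # a '+' without a digit after it is left unconsumed
        pos = nxt
    return pos

try:
    left_expr("1+2+3", 0)
    crashed = False
except RecursionError:             # Python's way of "crashing the matching engine"
    crashed = True
```

In this sketch `crashed` ends up True, while `iter_expr("1+2+3", 0)` returns 5, the position after the whole input.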



\ddd 

Character with decimal code ddd



\" , etc

Literal  " , etc.

PEG: Literal <- QUOTE LiteralEntity{QUOTE} / DOUBLE_QUOTE LiteralEntity{DOUBLE_QUOTE}









A ... Z 

Sequence:

Apply expressions  A , ...,  Z , in this order, to consume consecutive portions of the text ahead, as long as they succeed.

Indicate success if all succeeded.

Otherwise do not consume any text and indicate failure.

Sequence has higher precedence than Ordered Choice:  A B / Z  means  (A B) / Z  and not  A (B / Z) .
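
A minimal sketch of the sequence rule in Python, assuming a parser is a function from (text, position) to a new position or None. The helpers `lit` and `seq` are hypothetical names for this illustration, not part of Xtreams or Nim.

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """A ... Z: apply each parser in order; the whole sequence fails if any one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None   # the caller still holds the start position, so nothing is consumed
        return pos
    return parse

ab = seq(lit("a"), lit("b"))
```

Here `ab("abc", 0)` returns 2, while `ab("ac", 0)` returns None even though the 'a' matched, because the sequence only succeeds as a whole.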



A / ... / Z 

Ordered Choice:

Apply expressions  A , ...,  Z , in this order, to the text ahead, until one of them succeeds and possibly consumes some text. Indicate success if one of the expressions succeeded.

Otherwise do not consume any text and indicate failure.

Ordered Choice has lower precedence than Sequence:  A B / Z  means  (A B) / Z  and not  A (B / Z) .
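
The "ordered" part matters: the first alternative that succeeds wins, even if a later one would match more text. A small Python sketch, with hypothetical helper names `lit` and `choice` (not Xtreams or Nim API):

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def choice(*parsers):
    """A / ... / Z: try each alternative at the same position; the first success wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None           # every alternative failed, nothing consumed
    return parse

a_or_ab = choice(lit("a"), lit("ab"))   # 'a' / 'ab'
ab_or_a = choice(lit("ab"), lit("a"))   # 'ab' / 'a'
```

On the input "ab", `a_or_ab` stops after 1 character because 'a' succeeds first, while `ab_or_a` consumes both characters.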



(E) 

Grouping:

Parentheses can be used to change operator precedence.

(A B) / Z  vs.  A (B / Z) .
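
The two groupings accept different inputs. A Python sketch, using the literals 'a', 'b' and 'z' for A, B and Z; `lit`, `seq` and `choice` are hypothetical helpers for this illustration only.

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Sequence: every parser must succeed, one after the other."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: the first parser that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

A, B, Z = lit("a"), lit("b"), lit("z")
default_grouping = choice(seq(A, B), Z)   # (A B) / Z, which is what  A B / Z  means
other_grouping = seq(A, choice(B, Z))     # A (B / Z)
```

With these definitions, "z" is accepted only by  (A B) / Z  and "az" only by  A (B / Z) ; both accept "ab".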





{E}    

Cardinality:  Stop Expression

A <- B{E}

means: to accept A, accept any number of B up until E comes.

Consume E too, but don't yield it.

So, such expression accepts: BE, BBE, BBBE, BBBBE, etc, and yields B, BB, BBB, BBBB, etc.
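
Here is how I picture the stop expression, as a Python sketch. This is my reading of the description above, not the Xtreams implementation. The helpers `lit` and `until` are made-up names, the yield is simplified to the matched substring, and whether zero Bs are allowed is an assumption (this sketch allows it).

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def until(b, e):
    """B{E}: repeat B until E matches; consume E but leave it out of the yield."""
    def parse(text, pos):
        start = pos
        while True:
            end = e(text, pos)
            if end is not None:
                return end, text[start:pos]   # E is consumed, only the Bs are yielded
            pos = b(text, pos)
            if pos is None:
                return None                   # neither E nor another B: fail
    return parse

rule = until(lit("B"), lit("E"))
```

With this sketch, `rule("BBBE", 0)` returns `(4, "BBB")`: all four characters are consumed, but only the Bs are yielded.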



A <- B{1,"\n"}

means that an A consists of one or more Bs.

The parser will read Bs up until "\n" appears on the stream, which is a carriage return character.



E?

Cardinality:

Zero or One  E





E* 

Cardinality:

Zero or more  E

Apply expression  E  repeatedly to match the text ahead, as long as it succeeds.

Consume the matched text (if any).

Always indicate success.





E+

Cardinality:

Matches one or more E

Apply expression  E  repeatedly to match the text ahead, as long as it succeeds.

Consume the matched text (if any) and indicate success if there was at least one match.

Otherwise indicate failure.
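
The three operators  E? ,  E*  and  E+  differ only in how many matches they require. A Python sketch with hypothetical helpers (`lit`, `opt`, `star`, `plus`), assuming a parser returns the new position or None:

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def opt(p):
    """E?: zero or one match; always succeeds."""
    def parse(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return parse

def star(p):
    """E*: zero or more matches; always succeeds."""
    def parse(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:   # stop on failure, or on an empty match to avoid looping forever
                return pos
            pos = end
    return parse

def plus(p):
    """E+: one or more matches; fails if the first attempt fails."""
    def parse(text, pos):
        first = p(text, pos)
        return None if first is None else star(p)(text, first)
    return parse

a = lit("a")
```

On the input "b", `opt(a)` and `star(a)` succeed without consuming anything (they return position 0), while `plus(a)` fails.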



E{m}

Cardinality:

Matches m repetitions of E.

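B{3} is shorthand for BBB.

Do not confuse this with B{E}, where the braces hold an expression rather than a number: that is the stop expression described above, which accepts any number of B up until E comes.

It consumes E too, but does not yield it.

So such an expression accepts BE, BBE, BBBE, BBBBE, etc., and yields B, BB, BBB, BBBB, etc.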





E{m,n}

Cardinality:

Matches from m to n repetitions of E.

B{1,3} means B 1 to 3 times, so it accepts B, BB, and BBB.
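
A Python sketch of counted repetition, covering both  E{m}  and  E{m,n} . The helper names `lit` and `repeat` are hypothetical, and the sketch assumes the repetition is greedy (it takes as many matches as it can, up to n).

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def repeat(p, m, n=None):
    """E{m} when n is omitted, otherwise E{m,n}: match up to n times, require at least m."""
    if n is None:
        n = m
    def parse(text, pos):
        count = 0
        while count < n:
            end = p(text, pos)
            if end is None:
                break
            pos, count = end, count + 1
        return pos if count >= m else None
    return parse

b3 = repeat(lit("B"), 3)        # B{3}
b1_3 = repeat(lit("B"), 1, 3)   # B{1,3}
```

On "BBBB" both parsers stop after the third B; on "BB",  B{3}  fails while  B{1,3}  consumes both characters.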



[A-Za-z]+

Cardinality:

EXAMPLE: Matches one or more alphabetical characters.





$ 

Anchor:

Matches at the end of the input.

No character is consumed. Same as  !. .



!. = $

Anchor:

Matches at the end of the input.

No character is consumed. Same as  $



^ 

Anchor: Matches at the start of the input.

No character is consumed.



&E 

And predicate:

Indicate success if expression  E  matches the text ahead;

otherwise indicate failure.

Do not consume any text.



!E 

Not predicate:

Indicate failure if expression E matches the text ahead;

otherwise indicate success.

Do not consume any text.
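
Both predicates only look ahead: on success they return the position they started at. A Python sketch with hypothetical helper names, which also shows the end-of-input anchor written as  !. :

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_char(text, pos):
    """'.': consume one character if there is one."""
    return pos + 1 if pos < len(text) else None

def and_pred(p):
    """&E: succeed without consuming if E matches here."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse

def not_pred(p):
    """!E: succeed without consuming if E does NOT match here."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse

end_of_input = not_pred(any_char)   # !.  which the guide equates with  $
```

For example, `and_pred(lit("a"))("ab", 0)` returns 0 (success, nothing consumed), and `end_of_input("ab", 2)` returns 2 because no character is left.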





[s] 

Character class:

If the character ahead appears in the string  s , consume it and indicate success.

Otherwise indicate failure.



[a-b] 

Character range:

If the character ahead is one from the range  a  through  b , consume it and indicate success.

Otherwise indicate failure.



's' 

String:

If the text ahead is the string  s , consume it and indicate success.

Otherwise indicate failure.





. 

Any character:

If there is a character ahead, consume it and indicate success.

Otherwise (that is, at the end of input) indicate failure.
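
The four terminals above ([s], [a-b], 's' and .) can be sketched in Python like this. The function names are made up for illustration; each parser returns the new position or None.

```python
def char_class(chars):
    """[s]: consume one character if it appears in the string 'chars'."""
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse

def char_range(lo, hi):
    """[a-b]: consume one character if it lies between lo and hi inclusive."""
    def parse(text, pos):
        return pos + 1 if pos < len(text) and lo <= text[pos] <= hi else None
    return parse

def lit(s):
    """'s': consume the exact string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_char(text, pos):
    """'.': consume any one character; fail only at the end of input."""
    return pos + 1 if pos < len(text) else None

vowel = char_class("aeiou")
lower = char_range("a", "z")
```

Note that `any_char("", 0)` is the only way for  .  to fail: there is no character ahead.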







"BELOW PROBABLY NOT IN PEG"

_ 

Any Unicode character:

If there is a UTF-8 character ahead, consume it and indicate success.

Otherwise indicate failure.



@E 

Search:

Shorthand for  (!E .)* E .

(Search loop for the pattern  E .)





{@} E 

Captured Search:

Shorthand for  {(!E .)*} E .

(Search loop for the pattern  E .) Everything until and excluding  E  is captured.
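
Both search forms can be sketched directly from their expansions. This is Python written for illustration, not the Nim implementation; `lit`, `search` and `captured_search` are hypothetical names.

```python
def lit(s):
    """Parser for the literal 's': return the new position, or None on failure."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def search(e):
    """@E, i.e. (!E .)* E : skip characters until E matches, then consume E."""
    def parse(text, pos):
        while True:
            end = e(text, pos)
            if end is not None:
                return end
            if pos >= len(text):
                return None       # ran out of input without finding E
            pos += 1              # !E held, so '.' consumes one character
    return parse

def captured_search(e):
    """{@} E, i.e. {(!E .)*} E : like search, but also return the skipped text."""
    def parse(text, pos):
        start = pos
        while True:
            end = e(text, pos)
            if end is not None:
                return end, text[start:pos]
            if pos >= len(text):
                return None
            pos += 1
    return parse

find_cd = search(lit("cd"))
grab_before_cd = captured_search(lit("cd"))
```

On "abcdef", `find_cd` returns 4 (the position just after "cd") and `grab_before_cd` returns `(4, "ab")`.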



@@ E 

Same as  {@} E .



\identifier 

Built-in macro for a longer expression.







$i 

Back reference to the  i th capture.  i  counts from 1.
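
A typical use of a back reference is matching a quoted string whose closing quote must be the same character as the opening one. In the Nim pegs notation, where {E} is a capture, that would be something like  {["']} (!$1 .)* $1 . The Python function below hand-codes that one pattern as a sketch; the function name and the pattern are my own illustration.

```python
def match_quoted(text, pos=0):
    """Sketch of  {["']} (!$1 .)* $1 : return the position after the closing quote, or None."""
    captures = []
    if pos >= len(text) or text[pos] not in "\"'":
        return None
    captures.append(text[pos])        # {["']} stores capture number 1
    pos += 1
    backref = captures[1 - 1]         # $1: captures count from 1
    while pos < len(text) and not text.startswith(backref, pos):
        pos += 1                      # (!$1 .)* skips anything that is not the captured quote
    if text.startswith(backref, pos): # $1 must match the captured text exactly
        return pos + len(backref)
    return None
```

So `match_quoted("'it\"s' x")` returns 6: the double quote inside does not end the string, because  $1  refers to the single quote that was captured.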





i's' 

String match ignoring case.



y's' 

String match ignoring style.



v's' 

Verbatim string match: Use this to override a global  \i  or  \y  modifier.



i$j 

String match ignoring case for back reference.



y$j 

String match ignoring style for back reference.



v$j 

Verbatim string match for back reference.