[Newbies] How to approach this PEGParser Grammar Fix.

gettimothy gettimothy at zoho.com
Wed Sep 4 20:41:11 UTC 2019


Hi Folks,



In Xtreams parsing, the grammarWiki/PEGWikiGenerator combo does not parse the Wikimedia headings.

 I have copied the grammarWiki to grammarWikiMedia and I am slowly building it up in an attempt to isolate the problem.

The grammar looks like this:





grammarWikiMedia
"
http://en.wikipedia.org/wiki/Help:Wiki_markup
"
^
'Page <- (Heading)*

LineCharacter <- [^\n]
Flow <- Escape / Bold / Italic / LinkShort / LinkFull / LineCharacter
Escape <- "**" / "__" / "[["
Bold <- "*" Flow{"*"}
Italic <- "_" Flow{"_"}
LinkShort <- "[" .{&[>\]]} "]"
LinkFull <- "[" Flow{">"} .{"]"}

Line <- Flow{1,"\n"}
Paragraph <- Line
Empty <- "\n"
Whitespace <- [\t\s]*

Heading		<-	Heading6 / Heading5 / Heading4 / Heading3 / Heading2 / Heading1
Heading1	<-	Whitespace "= " Flow{" ="}
Heading2	<-	Whitespace "== " Flow{" =="}
Heading3	<-	Whitespace "=== " Flow{" ==="}
Heading4	<-	Whitespace "==== " Flow{" ===="}
Heading5	<-	Whitespace "===== " Flow{" ====="}
Heading6	<-	Whitespace "====== " Flow{" ======"}

'








For the Actor, I have copied the PEGWikiGenerator, saving it as PEGWikiMediaGenerator. I have made some minor additions to support H5 and H6 heading levels per Wikimedia standards.

My problem is that Wikimedia seems to wrap its <hN></hN> tags within a paragraph: <p><hN></hN></p>

So, while I can parse this input just ducky:




| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := ' = Heading 1 =  == Heading 2 == === Heading 3 === ==== Heading 4 ==== ===== Heading 5 ===== ====== Heading 6 ======'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.


This produces an XMLElement that looks like this:





<div><h1>Heading 1</h1><h2>Heading 2</h2><h3>Heading 3</h3><h4>Heading 4</h4><h5>Heading 5</h5><h6>Heading 6</h6></div>



When I wrap the heading markup in <p> tags, as in this input...





| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := '<p>= Heading 1 =</p>  <p>== Heading 2 ==</p> <p>=== Heading 3 ===</p> <p>==== Heading 4 ====</p> <p>===== Heading 5 =====</p> <p> ====== Heading 6 ======</p>'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.


my XMLElement looks like this:



<div/>
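
To make the symptom concrete, here is a rough Python sketch of what is probably happening. It is not Xtreams: the regex and the parse_page helper are stand-ins invented for illustration, and they only approximate the Heading rules above. The point is that a rule shaped like Page <- (Heading)* is allowed to match zero times, so input whose first character is "<" yields an empty page rather than an error.

```python
import re

# Stand-in for Heading1..Heading6 (an assumption, not the Xtreams rules):
# optional spaces/tabs, a run of 1-6 "=", a space, the title, a space,
# then the same run of "=" again.
HEADING = re.compile(r"[ \t]*(={1,6}) (.+?) \1(?!=)")

def parse_page(text):
    """Mimic Page <- (Heading)*: match headings back to back from position 0
    and stop quietly at the first position where no heading matches."""
    pos, headings = 0, []
    while True:
        m = HEADING.match(text, pos)
        if m is None:
            break
        headings.append((len(m.group(1)), m.group(2)))
        pos = m.end()
    return headings, pos

plain = (" = Heading 1 =  == Heading 2 == === Heading 3 === "
         "==== Heading 4 ==== ===== Heading 5 ===== ====== Heading 6 ======")
wrapped = "<p>= Heading 1 =</p>  <p>== Heading 2 ==</p>"

print(parse_page(plain))    # six (level, title) pairs, whole input consumed
print(parse_page(wrapped))  # no headings, position still 0
```

If the real parser behaves the same way, nothing in the grammar ever consumes the leading "<p>", so the first Heading attempt fails and the repetition ends with zero matches.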





I am supposing that I have a wayward Grammar specification.



Where should I focus?



Should I hack at

Heading1	<-	Whitespace "= " Flow{" ="}

and change "Whitespace" to something else?



Or should I redefine the

Line <- Flow{1,"\n"}
Paragraph <- Line

duo?
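
One direction that seems plausible is to leave Line and Paragraph alone and let each heading tolerate an optional <p>...</p> wrapper around it. The Python sketch below illustrates only that idea. It uses a regex as a stand-in for the PEG rules, and the WRAPPED_HEADING and render_page names are made up for this example; how the same thing is best spelled in Xtreams PEG syntax is an open question.

```python
import re

# Hypothetical stand-in for a heading rule that accepts an optional wrapper:
# whitespace, optional "<p>", the usual "= Title =" shape, optional "</p>".
WRAPPED_HEADING = re.compile(
    r"\s*(?:<p>\s*)?"           # leading whitespace and optional opening <p>
    r"(={1,6}) (.+?) \1(?!=)"   # 1-6 "=", title, the same run of "=" again
    r"(?:\s*</p>)?"             # optional closing </p>
)

def render_page(text):
    """Match headings back to back and emit <hN> elements inside a <div>."""
    pos, out = 0, []
    while (m := WRAPPED_HEADING.match(text, pos)):
        level, title = len(m.group(1)), m.group(2)
        out.append(f"<h{level}>{title}</h{level}>")
        pos = m.end()
    return "<div>" + "".join(out) + "</div>"

wrapped = ("<p>= Heading 1 =</p>  <p>== Heading 2 ==</p> <p>=== Heading 3 ===</p> "
           "<p>==== Heading 4 ====</p> <p>===== Heading 5 =====</p> "
           "<p> ====== Heading 6 ======</p>")
print(render_page(wrapped))
```

The design choice here is to make the wrapper part of what a heading may look like, so both the plain and the <p>-wrapped inputs go through the same rule.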



If a general principle exists that will guide me going forward, I would very much appreciate it.



Thank you in advance.



t