[Newbies] How to approach this PEGParser Grammar Fix.

Wed Sep 4 20:41:11 UTC 2019

Hi Folks,

In Xtreams parsing, the grammarWiki/PEGWikiGenerator combo does not parse Wikimedia headings.

I have copied grammarWiki to grammarWikiMedia and am slowly building it up in an attempt to isolate the problem.

The grammar looks like this:

grammarWikiMedia

"

http://en.wikipedia.org/wiki/Help:Wiki_markup

"

^

'Page <- (Heading)*

LineCharacter <- [^\n]

Flow <- Escape / Bold / Italic / LinkShort / LinkFull / LineCharacter

Escape <- "**" / "__" / "[["

Bold <- "*" Flow{"*"}

Italic <- "_" Flow{"_"}

LinkShort <- "[" .{&[>\]]} "]"

LinkFull <- "[" Flow{">"} .{"]"}

Line <- Flow{1,"\n"}

Paragraph <- Line

Empty <- "\n"

Whitespace <- [\t\s]*

Heading		<-	Heading6 /  Heading5 / Heading4 / Heading3 / Heading2 / Heading1

Heading1	<-	Whitespace "= " Flow{" ="}

Heading2	<-	Whitespace "== " Flow{" =="}

Heading3	<-	Whitespace "=== " Flow{" ==="}

Heading4	<-	Whitespace "==== " Flow{" ===="}

Heading5	<-	Whitespace "===== " Flow{" ====="}

Heading6	<-	Whitespace "====== " Flow{" ======"}

'

For the Actor, I have copied the PEGWikiGenerator, saving it as PEGWikiMediaGenerator. I have made some minor additions to support H5 and H6 heading levels per Wikimedia standards.

My problem is that Wikimedia seems to like to wrap its <hN></hN> tags within a paragraph: <p><hN></hN></p>

So, I can parse this input just ducky:

| wikiGrammar wikiParser input output | 

wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."

wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."

input := ' = Heading 1 =  == Heading 2 == === Heading 3 === ==== Heading 4 ==== ===== Heading 5 ===== ====== Heading 6 ======'.

output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."

output inspect.

This produces an XMLElement looking like this:

<div><h1>Heading 1</h1><h2>Heading 2</h2><h3>Heading 3</h3><h4>Heading 4</h4><h5>Heading 5</h5><h6>Heading 6</h6></div>

However, when I wrap the headings in <p> tags, as in this input...

| wikiGrammar wikiParser input output | 

wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."

wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."

input := '<p>= Heading 1 =</p>  <p>== Heading 2 ==</p> <p>=== Heading 3 ===</p> <p>==== Heading 4 ====</p> <p>===== Heading 5 =====</p> <p> ====== Heading 6 ======</p>'.

output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."

output inspect.

my XMLElement looks like this:

<div/>

I am supposing that I have a wayward Grammar specification.
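
My working guess at the mechanism: Page <- (Heading)* is allowed to match zero headings, and every HeadingN rule expects optional whitespace followed by "=", so a leading "<" would stop the parse before anything is consumed. A minimal probe to check this, reusing the calls above, might start from the Heading1 rule on a single wrapped heading. That a rule other than Page can be used as the start rule is an assumption on my part, and I have not confirmed how the parser reports a failed match.

```smalltalk
| wikiGrammar wikiParser probe |

wikiGrammar := PEGParser grammarWikiMedia reading positioning.

wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new.

"Probe one wrapped heading, starting from Heading1 rather than Page (assumed to be allowed)."
probe := wikiParser parse: 'Heading1' stream: '<p>= Heading 1 =</p>' actor: PEGWikiMediaGenerator new.

probe inspect.
```

If the probe fails on the wrapped input but works on '= Heading 1 =', then the Heading rules are rejecting the "<p>" prefix and Page is simply matching nothing.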

Where should I focus?

Should I hack at Heading1	<-	Whitespace "= " Flow{" ="}

and change "Whitespace" to something else?

Or should I redefine the 

Line <- Flow{1,"\n"}

Paragraph <- Line

duo?
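
To make the first option concrete, here is an untested sketch of one variant I could try: leave the HeadingN rules alone and add a wrapper rule that consumes the <p> and </p> around a heading. The names grammarWikiMediaWrapped and WrappedHeading are hypothetical, and PEGWikiMediaGenerator would presumably need a matching action for the new rule, which I have not written.

```smalltalk
grammarWikiMediaWrapped
	"Hypothetical variant of grammarWikiMedia: Page is changed and WrappedHeading is new; all other rules are copied unchanged."
	^'Page <- (WrappedHeading / Heading)*
	WrappedHeading <- Whitespace "<p>" Heading "</p>"
	LineCharacter <- [^\n]
	Flow <- Escape / Bold / Italic / LinkShort / LinkFull / LineCharacter
	Escape <- "**" / "__" / "[["
	Bold <- "*" Flow{"*"}
	Italic <- "_" Flow{"_"}
	LinkShort <- "[" .{&[>\]]} "]"
	LinkFull <- "[" Flow{">"} .{"]"}
	Line <- Flow{1,"\n"}
	Paragraph <- Line
	Empty <- "\n"
	Whitespace <- [\t\s]*
	Heading <- Heading6 / Heading5 / Heading4 / Heading3 / Heading2 / Heading1
	Heading1 <- Whitespace "= " Flow{" ="}
	Heading2 <- Whitespace "== " Flow{" =="}
	Heading3 <- Whitespace "=== " Flow{" ==="}
	Heading4 <- Whitespace "==== " Flow{" ===="}
	Heading5 <- Whitespace "===== " Flow{" ====="}
	Heading6 <- Whitespace "====== " Flow{" ======"}
'
```

To try it, I would send grammarWikiMediaWrapped in place of grammarWikiMedia in the snippets above. WrappedHeading is listed before Heading so that the wrapped form is attempted first; the bare form still parses when there is no <p>.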

If a general principle exists that will guide me going forward, I would very much appreciate hearing it.

Thank you in advance.

t