[Newbies] How to approach this PEGParser Grammar Fix.
gettimothy
gettimothy at zoho.com
Wed Sep 4 20:41:11 UTC 2019
Hi Folks,
In XTreams parsing, the grammarWiki/PEGWikiGenerator combo does not parse Wikimedia headings.
I have copied grammarWiki to grammarWikiMedia and am slowly building it up in an attempt to isolate the problem.
The grammar looks like this:
grammarWikiMedia
"
http://en.wikipedia.org/wiki/Help:Wiki_markup
"
^
'Page <- (Heading)*
LineCharacter <- [^\n]
Flow <- Escape / Bold / Italic / LinkShort / LinkFull / LineCharacter
Escape <- "**" / "__" / "[["
Bold <- "*" Flow{"*"}
Italic <- "_" Flow{"_"}
LinkShort <- "[" .{&[>\]]} "]"
LinkFull <- "[" Flow{">"} .{"]"}
Line <- Flow{1,"\n"}
Paragraph <- Line
Empty <- "\n"
Whitespace <- [\t\s]*
Heading <- Heading6 / Heading5 / Heading4 / Heading3 / Heading2 / Heading1
Heading1 <- Whitespace "= " Flow{" ="}
Heading2 <- Whitespace "== " Flow{" =="}
Heading3 <- Whitespace "=== " Flow{" ==="}
Heading4 <- Whitespace "==== " Flow{" ===="}
Heading5 <- Whitespace "===== " Flow{" ====="}
Heading6 <- Whitespace "====== " Flow{" ======"}
'
For the actor, I copied PEGWikiGenerator and saved it as PEGWikiMediaGenerator, with some minor additions to support the H5 and H6 heading levels, per Wikimedia standards.
My problem is that Wikimedia seems to like to wrap its <hN></hN> tags within a paragraph: <p><hN></hN></p>
So, while I can parse this input just ducky:
| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := ' = Heading 1 = == Heading 2 == === Heading 3 === ==== Heading 4 ==== ===== Heading 5 ===== ====== Heading 6 ======'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.
which produces an XMLElement that looks like this:
<div><h1>Heading 1</h1><h2>Heading 2</h2><h3>Heading 3</h3><h4>Heading 4</h4><h5>Heading 5</h5><h6>Heading 6</h6></div>
When I wrap the headings in <p> tags, as in this input...
| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := '<p>= Heading 1 =</p> <p>== Heading 2 ==</p> <p>=== Heading 3 ===</p> <p>==== Heading 4 ====</p> <p>===== Heading 5 =====</p> <p> ====== Heading 6 ======</p>'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.
my XMLElement looks like this:
<div/>
I suspect I have a wayward grammar specification.
Where should I focus?
Should I hack at Heading1 <- Whitespace "= " Flow{" ="}
and change "Whitespace" to something else?
Or should I redefine the
Line <- Flow{1,"\n"}
Paragraph <- Line
duo?
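One untested idea: Page <- (Heading)* also succeeds when it matches zero headings, and the leading "<p>" can be matched by neither Whitespace nor "=", so the repetition presumably stops at the very first character, which would explain the empty <div/>. If that is right, a third option is to leave Heading1 and Paragraph alone and let Page accept an optional <p>...</p> wrapper around each heading. WrappedHeading is a name I made up. I am assuming that Flow{" ="} consumes its closing " =" (the & used in LinkShort suggests the terminator is consumed unless marked as lookahead), and PEGWikiMediaGenerator would probably need an action for the new rule, since I have not checked what the actor does with rules it has no method for:

```
Page <- (WrappedHeading / Heading)*
WrappedHeading <- Whitespace "<p>" Heading Whitespace "</p>"
```

Because "/" is ordered choice, unwrapped input like my first example should still fall through to plain Heading once WrappedHeading fails on the missing "<p>".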
If there is a general principle that would guide me going forward, I would very much appreciate hearing it.
Thank you in advance.
t
More information about the Beginners mailing list