[Seaside] How to think about "Unicode spew"
tty
gettimothy at zoho.com
Thu Feb 13 20:27:17 UTC 2020
Hi Folks.
Over at http://menmachinesmaterials.com/WikitextParser ***, when I hit
HamburgerIcon->Database->Random Page I occasionally get what I call
"Unicode spew".
Here is a portion of a page.
*<�!DOCTYPE html><�html class="no-js" lang="en"
dir="ltr"><�head><�title>WikitextParser<�/title><�meta
charset="utf-8"/><�link rel="stylesheet" type="text/css"
href="/files/WADevelopmentFiles/development.css"/>...*
However, if I run the page manually in the image, the resulting XMLElement
looks just fine.
Here is the thing that caused the spew.
*<body><p> Thierry IV or Theoderic IV ({{circa}} 720{{spaced ndash}}c. 782)
was a Frankish <https://www.wikipedia.org/wiki/Franks> noble. Count of
Autun <https://www.wikipedia.org/wiki/Autun> and Toulouse
<https://www.wikipedia.org/wiki/Toulouse> ; he was thought to be a son of
Sigebert V <https://www.wikipedia.org/wiki/Sigebert_V> , and grandson of
Sigebert IV of Raze <https://www.wikipedia.org/wiki/Sigebert_IV_of_Raze> .
It is now well documented that his supposed Davidic blood was a hoax (see
Priory of Sion <https://www.wikipedia.org/wiki/Priory_of_Sion> ). Thierry
married Auda <https://www.wikipedia.org/wiki/Auda_of_France> , daughter of
Charles Martel <https://www.wikipedia.org/wiki/Charles_Martel> , sister of
Pepin III <https://www.wikipedia.org/wiki/Pepin_III> .</p>
Children
<ul><li><a
href="https://www.wikipedia.org/wiki/William_of_Gellone">William of
Gellone</a> (755 – 28 May 812/4)</li><li>Alda of Gellone (born ca.
770); married Fredalon</li><li><a
href="https://www.wikipedia.org/wiki/Adalhelm_of_Autun">Adalhelm of
Autun</a></li></ul><p>{{Persondata <div/>| NAME = Thierry
04| ALTERNATIVE NAMES =| SHORT DESCRIPTION = Frankish noble| DATE OF BIRTH
=| PLACE OF BIRTH =| DATE OF DEATH =| PLACE OF DEATH
=}}{{DEFAULTSORT:Thierry 04}} Category:720s births
<https://www.wikipedia.org/wiki/Category:720s_births> Category:780s deaths
<https://www.wikipedia.org/wiki/Category:780s_deaths> Category:Counts of
Autun <https://www.wikipedia.org/wiki/Category:Counts_of_Autun>
Category:Counts of Toulouse
<https://www.wikipedia.org/wiki/Category:Counts_of_Toulouse>
Category:Frankish people
<https://www.wikipedia.org/wiki/Category:Frankish_people>
</p><p>{{France-noble-stub}}</p></body>*
The method that posts the output is straightforward enough:
*renderParsedOn: html
    | wikiGrammar wikiParser input actor |
    actor := PEGWikiMediaGeneratorTables new.
    actor transcripton
        ifTrue: [ Transcript clear ].
    wikicode isNil
        ifTrue: [ input := '== Welcome To WikitextParserBrowser ==' ]
        ifFalse: [ input := wikicode ].
    wikiGrammar := PEGParser grammarWikiMediaTables reading positioning.
    wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new.
    [[ output := wikiParser parse: 'Page' stream: input actor: actor. ]
        on: Error
        do: [ :ex | output := '
Error parsing. see Wikicode tab for source
' ]]
        ensure: [
            output := ((output asString copyReplaceAll: '<body>' with: '')
                copyReplaceTokens: '</body>' with: '').
            output := (output asString copyReplaceAll: '>' with: '>' asTokens: false).
            output := (output asString copyReplaceAll: '<' with: '<' asTokens: false) ].
    html break; break.
    html html: output.
*
Is there something I should be doing to "output" to make the garbage go
away?
Thanks in advance.
*** Alpha/Beta dev tool. If you get a DNU just hit the back button and try
again. Please do not hit Debug (:
--
Sent from: http://forum.world.st/Seaside-General-f86180.html