[Seaside] How to think about "Unicode spew"

tty gettimothy at zoho.com
Thu Feb 13 20:27:17 UTC 2020


Hi Folks.

Over at http://menmachinesmaterials.com/WikitextParser  ***

When I choose HamburgerIcon -> Database -> Random Page, I occasionally get
what I call "Unicode spew".

Here is a portion of such a page:
*<�!DOCTYPE html><�html class="no-js" lang="en"
dir="ltr"><�head><�title>WikitextParser<�/title><�meta
charset="utf-8"/><�link rel="stylesheet" type="text/css"
href="/files/WADevelopmentFiles/development.css"/>...*


However, in the image, if I run the page manually, the resulting XMLElement
looks just fine.

Here is the content that caused the spew:

*<body><p> Thierry IV or Theoderic IV ({{circa}} 720{{spaced ndash}}c. 782)
was a  Frankish <https://www.wikipedia.org/wiki/Franks>   noble. Count of 
Autun <https://www.wikipedia.org/wiki/Autun>   and  Toulouse
<https://www.wikipedia.org/wiki/Toulouse>  ; he was thought to be a son of 
Sigebert V <https://www.wikipedia.org/wiki/Sigebert_V>  , and grandson of 
Sigebert IV of Raze <https://www.wikipedia.org/wiki/Sigebert_IV_of_Raze>  .
It is now well documented that his supposed Davidic blood was a hoax (see 
Priory of Sion <https://www.wikipedia.org/wiki/Priory_of_Sion>  ). Thierry
married  Auda <https://www.wikipedia.org/wiki/Auda_of_France>  , daughter of 
Charles Martel <https://www.wikipedia.org/wiki/Charles_Martel>  , sister of 
Pepin III <https://www.wikipedia.org/wiki/Pepin_III>  .</p>
Children
<ul><li><a
href="https://www.wikipedia.org/wiki/William_of_Gellone">William of
Gellone</a> (755 – 28 May 812/4)</li><li>Alda of Gellone (born ca.
770); married Fredalon</li><li><a
href="https://www.wikipedia.org/wiki/Adalhelm_of_Autun">Adalhelm of
Autun</a></li></ul><p>{{Persondata <div/>| NAME              = Thierry
04| ALTERNATIVE NAMES =| SHORT DESCRIPTION = Frankish noble| DATE OF BIRTH    
=| PLACE OF BIRTH    =| DATE OF DEATH     =| PLACE OF DEATH   
=}}{{DEFAULTSORT:Thierry 04}} Category:720s births
<https://www.wikipedia.org/wiki/Category:720s_births>   Category:780s deaths
<https://www.wikipedia.org/wiki/Category:780s_deaths>   Category:Counts of
Autun <https://www.wikipedia.org/wiki/Category:Counts_of_Autun>  
Category:Counts of Toulouse
<https://www.wikipedia.org/wiki/Category:Counts_of_Toulouse>  
Category:Frankish people
<https://www.wikipedia.org/wiki/Category:Frankish_people> 
</p><p>{{France-noble-stub}}</p></body>*


The method that posts the output is straightforward enough:

*renderParsedOn: html
	| wikiGrammar wikiParser input  actor|

	actor := PEGWikiMediaGeneratorTables new.
	actor transcripton  
		ifTrue:[	Transcript clear].
			
	wikicode isNil
		ifTrue:[input := '== Welcome To WikitextParserBrowser ==']
		ifFalse:[input := wikicode].

	wikiGrammar := PEGParser grammarWikiMediaTables reading positioning. 
	wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new.
	[[output := wikiParser parse: 'Page' stream: input actor: actor. ]
		on: Error
		do:[:ex | output := '
 Error parsing. see Wikicode tab for source 
']]
			ensure:[
					output := ((output asString copyReplaceAll: '<body>' with: '') copyReplaceTokens: '</body>' with: '').
					output := (output asString copyReplaceAll: '&gt;' with: '>' asTokens: false).
					output := (output asString copyReplaceAll: '&lt;' with: '<' asTokens: false)].
	html break;break.								
	html html: output.

*
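One possible first check, offered only as a sketch, is to see whether the string handed to "html html:" contains characters outside plain ASCII. The sample page above appears to contain an en dash in "(755 – 28 May 812/4)", which in Squeak or Pharo would make the string a WideString. This assumes an image where String understands isWideString. The names sample and nonAscii are hypothetical workspace temporaries, with sample standing in for the component's output. Whether a wide string is actually what triggers the spew is an assumption, not something the method above confirms.

```smalltalk
"Hypothetical workspace diagnostic, not part of renderParsedOn:.
 sample stands in for the component's output variable."
| sample nonAscii |

"Build a short string containing an en dash (code point 8211)."
sample := 'Gellone (755 ', (String with: (Character value: 8211)), ' 28 May 812/4)'.

"Collect every character outside the 7-bit ASCII range."
nonAscii := sample select: [:ch | ch asInteger > 127].

"Report whether the string is wide and which code points are responsible."
Transcript
	show: 'wide string? ', sample isWideString printString; cr;
	show: 'non-ASCII code points: ', (nonAscii asArray collect: [:ch | ch asInteger]) printString; cr.
```

Running the same two checks on the real output for a page that renders cleanly and for one that spews would show whether non-ASCII content correlates with the problem.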

Is there something I should be doing to "output" to make the garbage go
away?

Thanks in advance.
*** Alpha/Beta dev tool. If you get a DNU just hit the back button and try
again. Please do not hit Debug (:




