[Seaside] How to think about "Unicode spew"

Karsten Kusche karsten at heeg.de
Fri Feb 14 14:01:13 UTC 2020


That doesn’t look like an encoding problem. The only place these question marks appear is directly after a <. Try looking at the page source with a hex editor to identify the actual character that follows each <. My guess would be character 0 (NUL) or something similar.
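
If a hex editor is not handy, roughly the same check can be sketched in a Squeak or Pharo workspace. This is only an illustration: `sample` is a made-up string with a NUL placed after each <, standing in for the real page source (for example the `output` variable from the method quoted further down). It tallies the code point of every character that directly follows a <.

```smalltalk
| nul sample codes |
"Assumption: a hypothetical sample string; substitute the real page source."
nul := String with: (Character value: 0).
sample := '<' , nul , 'html><' , nul , 'head>'.

"Collect the code point of each character that comes right after a $<."
codes := Bag new.
1 to: sample size - 1 do: [:i |
    (sample at: i) = $<
        ifTrue: [codes add: (sample at: i + 1) asInteger]].

"Print each distinct code point in decimal and hex, with its count."
codes asSet do: [:code |
    Transcript
        show: 'code point ';
        show: code printString;
        show: ' (hex ';
        show: (code printStringBase: 16);
        show: ') occurs ';
        show: (codes occurrencesOf: code) printString;
        show: ' time(s)';
        cr].
```

A code point of 0 in the real data would point to a NUL; any other value at least narrows down which character ends up behind the <.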

Karsten


Georg Heeg eK
Wallstraße 22
06366 Köthen

Tel.: 03496/214328
FAX: 03496/214712
Amtsgericht Dortmund HRA 12812


On 13 February 2020 at 21:27:18, tty (gettimothy at zoho.com) wrote:

Hi Folks.

Over at http://menmachinesmaterials.com/WikitextParser ***

When hitting HamburgerIcon->Database->Random Page I occasionally get what
I call "Unicode spew"

Here is a portion of a page.
*<�!DOCTYPE html><�html class="no-js" lang="en"
dir="ltr"><�head><�title>WikitextParser<�/title><�meta
charset="utf-8"/><�link rel="stylesheet" type="text/css"
href="/files/WADevelopmentFiles/development.css"/>...*


However, in the image, if I run the page manually, the resulting XMLElement
looks just fine.

Here is the thing that caused the spew.

*<body><p> Thierry IV or Theoderic IV ({{circa}} 720{{spaced ndash}}c. 782)
was a Frankish <https://www.wikipedia.org/wiki/Franks> noble. Count of
Autun <https://www.wikipedia.org/wiki/Autun> and Toulouse
<https://www.wikipedia.org/wiki/Toulouse> ; he was thought to be a son of
Sigebert V <https://www.wikipedia.org/wiki/Sigebert_V> , and grandson of
Sigebert IV of Raze <https://www.wikipedia.org/wiki/Sigebert_IV_of_Raze> .
It is now well documented that his supposed Davidic blood was a hoax (see
Priory of Sion <https://www.wikipedia.org/wiki/Priory_of_Sion> ). Thierry
married Auda <https://www.wikipedia.org/wiki/Auda_of_France> , daughter of
Charles Martel <https://www.wikipedia.org/wiki/Charles_Martel> , sister of
Pepin III <https://www.wikipedia.org/wiki/Pepin_III> .</p>
Children
<ul><li><a
href="https://www.wikipedia.org/wiki/William_of_Gellone">William of
Gellone</a> (755 – 28 May 812/4)</li><li>Alda of Gellone (born ca.
770); married Fredalon</li><li><a
href="https://www.wikipedia.org/wiki/Adalhelm_of_Autun">Adalhelm of
Autun</a></li></ul><p>{{Persondata <div/>| NAME = Thierry
04| ALTERNATIVE NAMES =| SHORT DESCRIPTION = Frankish noble| DATE OF BIRTH
=| PLACE OF BIRTH =| DATE OF DEATH =| PLACE OF DEATH
=}}{{DEFAULTSORT:Thierry 04}} Category:720s births
<https://www.wikipedia.org/wiki/Category:720s_births> Category:780s deaths
<https://www.wikipedia.org/wiki/Category:780s_deaths> Category:Counts of
Autun <https://www.wikipedia.org/wiki/Category:Counts_of_Autun>
Category:Counts of Toulouse
<https://www.wikipedia.org/wiki/Category:Counts_of_Toulouse>
Category:Frankish people
<https://www.wikipedia.org/wiki/Category:Frankish_people>
</p><p>{{France-noble-stub}}</p></body>*


The method that posts the output is straightforward enough:

*renderParsedOn: html
    | wikiGrammar wikiParser input actor |

    actor := PEGWikiMediaGeneratorTables new.
    actor transcripton
        ifTrue: [Transcript clear].

    "Fall back to a welcome heading when no wikicode has been set."
    wikicode isNil
        ifTrue: [input := '== Welcome To WikitextParserBrowser ==']
        ifFalse: [input := wikicode].

    wikiGrammar := PEGParser grammarWikiMediaTables reading positioning.
    wikiParser := PEGParser parserPEG
        parse: 'Grammar'
        stream: wikiGrammar
        actor: PEGParserParser new.
    [[output := wikiParser parse: 'Page' stream: input actor: actor]
        on: Error
        do: [:ex | output := '
Error parsing. see Wikicode tab for source
']]
        ensure: [
            "Strip the body wrapper, then turn escaped angle brackets back into markup."
            output := ((output asString copyReplaceAll: '<body>' with: '')
                copyReplaceTokens: '</body>' with: '').
            output := (output asString copyReplaceAll: '&gt;' with: '>'
                asTokens: false).
            output := (output asString copyReplaceAll: '&lt;' with: '<'
                asTokens: false)].
    html break; break.
    html html: output.

*

Is there something I should be doing to "output" to make the garbage go
away?

thanks in advance
*** Alpha/Beta dev tool. If you get a DNU just hit the back button and try
again. Please do not hit Debug (:



--
Sent from: http://forum.world.st/Seaside-General-f86180.html
_______________________________________________
seaside mailing list
seaside at lists.squeakfoundation.org
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside