[squeak-dev] getting the XMLHTMLParser to stop fixing things
gettimothy
gettimothy at zoho.com
Fri Nov 26 13:52:12 UTC 2021
Hi Folks,
I was hoping I could use Monty's XMLHTMLParser to detect errors for me in my PEG parsing. Unfortunately, it fixes stuff!
Here, I have a missing </body> on the input stream:
|document ios|
ios := ReadStream on: '<html>
<head>
</head>
<body>
<p>Dude</p>
</html>'.
document := XMLHTMLParser parse: ios.
document inspect.
and the document shows it as fixed.
What I would like is for it to throw an error when content shows up outside any tags within the body tag.
For example this should pass:
'<html>
<body>
<p>Dude</p>
</body>
</html>'.
This should not:
'<html>
<body>
should throw an error
<p>Dude</p>
</body>
</html>'.
Now, the XMLDOMParser throws an XMLWellFormednessException on any HTML, which I could use if I can figure out how to get the XMLDOMParser to barf only on broken XHTML and still display valid XHTML.
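One possible direction, as an untested workspace sketch rather than a known-good recipe: let the strict XMLDOMParser reject malformed markup, then walk the direct children of <body> and signal an error for any non-blank text node. The selectors #elementAt:, #nodes, #isStringNode and #string are assumptions about the XMLParser DOM API and should be checked against the installed version. The variable names source, body and stray are just illustrative.

```smalltalk
"Sketch only: strict parse first, then reject loose text directly under <body>."
| source document body stray |
source := '<html>
<body>
should throw an error
<p>Dude</p>
</body>
</html>'.

"The strict parser is expected to signal XMLWellFormednessException
 on malformed input (for example a missing </body>) instead of repairing it."
document := XMLDOMParser parse: source.

"Assumption: #elementAt: answers the first child element with that name."
body := document root elementAt: 'body'.

"Assumption: text children answer true to #isStringNode and expose
 their characters via #string. Whitespace-only nodes are ignored."
stray := body nodes select: [:node |
	node isStringNode
		and: [(node string allSatisfy: [:c | c isSeparator]) not]].

stray isEmpty
	ifFalse: [Error signal: 'Text outside any tag in <body>: ', stray first string].
```

This only checks text sitting directly inside <body>. Text nested inside <p> and other elements is left alone, which matches the two examples above.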
Hints appreciated.
thx