[squeak-dev] getting the XMLHTMLParser to stop fixing things

gettimothy gettimothy at zoho.com
Fri Nov 26 13:52:12 UTC 2021


Hi Folks,

I was hoping I could use Monty's XMLHTMLParser to detect errors for my PEG parsing. Unfortunately, it fixes them instead!

Here, the input stream is missing the closing </body> tag:

| document ios |
ios := ReadStream on: '<html>
<head>
</head>
  <body>
<p>Dude</p>
</html>'.
document := XMLHTMLParser parse: ios.
document inspect.

and the inspected document comes back already fixed: the parser quietly supplies the missing </body>.

What I would like instead is for it to signal an error when text appears inside the <body> element but outside any child tag.

For example, this should pass:

'<html>
  <body>
      <p>Dude</p>
     </body>
</html>'.

but this should not:

'<html>
  <body>
      should throw an error
      <p>Dude</p>
     </body>
</html>'.

XMLDOMParser, on the other hand, signals an XMLWellFormednessException on any HTML. I could use that IF I can figure out how to get XMLDOMParser to reject only broken XHTML while still parsing and displaying valid XHTML.
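A missing </body> already makes the first example ill-formed as XML (the </html> end tag does not match the still-open <body>), so one way to get "reject only broken XHTML" behaviour is to wrap XMLDOMParser in an exception handler. This is a minimal workspace sketch, assuming XMLDOMParser class>>parse: accepts a ReadStream the same way XMLHTMLParser does above; the names parseStrictly, good and bad are illustrative only. It will not reject the second example by itself: stray text directly inside <body> is still well-formed XML, so that rule needs an extra check (for instance validation against the XHTML 1.0 Strict DTD, whose content model for body does not allow bare text).

```smalltalk
| parseStrictly good bad |
"Hypothetical helper block: answers the DOM document, or nil when
 XMLDOMParser signals XMLWellFormednessException (assumed to accept a ReadStream)."
parseStrictly := [:source |
    [XMLDOMParser parse: (ReadStream on: source)]
        on: XMLWellFormednessException
        do: [:error |
            Transcript show: 'Not well-formed: ', error messageText; cr.
            nil]].

"Well-formed XHTML: a document should come back."
good := parseStrictly value: '<html>
  <body>
      <p>Dude</p>
     </body>
</html>'.

"The missing </body> from the first example: the handler answers nil."
bad := parseStrictly value: '<html>
<head>
</head>
  <body>
<p>Dude</p>
</html>'.
```

The handler block's last expression (nil) becomes the value of on:do:, so the caller can simply test the result with isNil.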

Hints appreciated.

thx