[Seaside] [BUG] in IAHtmlParser

Julian Fitzell julian@beta4.com
Thu, 28 Mar 2002 22:38:05 -0800


Avi Bryant wrote:
> On Fri, 29 Mar 2002, Alain Fischer wrote:
> 
> 
>>Hi Avi,
>>
>>I have tried to inspect or explore each of the following lines:
>>
>>IAHtmlParser parse: '<span><table></table></span>'
>>IAHtmlParser parse: '<table><tr></tr></table>'
> 
> 
> According to the HTML 4.0 spec, <span> can only contain inline tags, and
> so having a <table> inside a <span> is not legal (<div> is the equivalent
> intended to contain block tags like <table>).  This forces the <span> to
> close before the <table> tag.
> 
> The HTML parser Julian wrote for Seaside is pretty strictly conformant -
> this lets it be smart about not requiring close tags everywhere, but it
> does mean that it can do somewhat surprising things with illegal markup.
> One way it could be improved would be to actually throw an error when a
> tag (like span) that requires a close tag doesn't get one (or, as in this
> case, apparently doesn't).
> 
> I imagine this cost you some time, and I apologize - if you stick to
> conformant HTML4, you should be ok in the future.
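Avi's idea of raising an error instead of silently closing a tag could look roughly like the toy checker below. It is a minimal Python sketch, not IAHtmlParser's actual code. The tag sets are a hypothetical, hand-picked subset of the HTML 4.01 rules, text between tags is ignored, and `check_markup` and `MalformedHtml` are made-up names.

```python
import re

# Hypothetical, hand-picked subset of the HTML 4.01 rules (not the full spec).
BLOCK = {"p", "li", "div", "table", "ul", "ol"}           # block-level tags
INLINE_ONLY = {"p", "span"}                               # may hold only inline content
END_TAG_REQUIRED = {"span", "div", "table", "ul", "ol"}   # end tag is mandatory


class MalformedHtml(Exception):
    """Raised where a lenient parser would silently close an element."""


def check_markup(html):
    """Scan start/end tags (text is ignored) and raise whenever an element
    whose end tag is required would be closed implicitly."""
    stack = []
    for slash, name in re.findall(r"<(/?)(\w+)>", html):
        name = name.lower()
        if slash:
            if name not in stack:
                raise MalformedHtml(f"</{name}> has no open <{name}>")
            # Pop up to the matching element; anything skipped was left unclosed.
            while True:
                top = stack.pop()
                if top == name:
                    break
                if top in END_TAG_REQUIRED:
                    raise MalformedHtml(f"<{top}> closed implicitly by </{name}>")
            continue
        # A block-level start tag implicitly closes an inline-only container.
        while name in BLOCK and stack and stack[-1] in INLINE_ONLY:
            top = stack.pop()
            if top in END_TAG_REQUIRED:
                raise MalformedHtml(f"<{top}> closed implicitly by <{name}>")
        stack.append(name)
    # Anything still open at the end was never closed.
    for top in stack:
        if top in END_TAG_REQUIRED:
            raise MalformedHtml(f"<{top}> is never closed")
```

With this, `check_markup('<span><table></table></span>')` raises `MalformedHtml` ("<span> closed implicitly by <table>") instead of quietly restructuring the document, while `'<div><table></table></div>'` and `'<p>one<p>two'` pass, since <p> may legally be closed implicitly.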

Yeah, sorry about that.  I wouldn't say the parser is strictly compliant,
but it did turn out to be necessary to make it somewhat compliant.  The
reason is that if we allowed every non-conformant case people use all
the time, we would essentially be unable to support valid HTML (even
though valid HTML is probably never used).

The problem, frankly, is that the HTML spec is insane!  It has
ridiculous combinations of rules that allow certain tags only within
others and that close tags for you implicitly.  This implicit closing is
most of the reason why I had to enforce some of the rules about which
tags can be contained inside others.  Implicit closing is also why most
people never write </p> or </li>, letting them be closed implicitly by
the next non-inline tag (usually the next <p> or <li> in these cases).
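To make the implicit closing concrete, here is a toy Python model of how a lenient parser might build its tree. It is a sketch under stated assumptions, not how IAHtmlParser actually works: the tag sets are a hypothetical subset of HTML 4.01, text is ignored, and `build_tree` is a made-up name.

```python
import re

# Hypothetical, hand-picked subset of the HTML 4.01 rules (not the full spec).
BLOCK = {"p", "li", "div", "table", "ul", "ol"}   # block-level tags
INLINE_ONLY = {"p", "span"}                       # may hold only inline content


def build_tree(html):
    """Build a nested (tag, children) tree from start/end tags, closing
    elements implicitly the way a lenient HTML 4 parser might.
    Text between tags is ignored."""
    root = ("#root", [])
    stack = [root]
    for slash, name in re.findall(r"<(/?)(\w+)>", html):
        name = name.lower()
        if slash:
            # Close the nearest matching open element; ignore stray end tags.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
            continue
        # A block-level start tag implicitly closes inline-only containers
        # (e.g. <table> ends an open <span>, the next <p> ends the last <p>).
        while name in BLOCK and stack[-1][0] in INLINE_ONLY:
            stack.pop()
        # A new <li> implicitly closes the previous <li>.
        if name == "li" and stack[-1][0] == "li":
            stack.pop()
        node = (name, [])
        stack[-1][1].append(node)
        stack.append(node)
    return root
```

For example, `build_tree('<span><table></table></span>')` returns `('#root', [('span', []), ('table', [])])`: the table becomes a sibling of an empty span and the trailing </span> is discarded, which is the surprising result from Alain's example. `build_tree('<ul><li>one<li>two</ul>')` gives two sibling <li> elements under the <ul> without any </li>.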

But it sucks.  XML is often overused, but in this case HTML so wants to
be XML anyway that I wish browser developers would hurry up and add
support for XHTML so I can start writing my web pages in it.

Again, sorry for the problems.  It certainly isn't my goal to have a web
application server enforce the HTML spec for you (that should be the
browser's job), but unfortunately a loose spec and loose, loose, loose
browser implementations have made writing a parser rather difficult.  I
don't want to encode complete knowledge of every way every tag could be
used. :(  I tried to keep it loose where possible, but...  what can I
say?

Julian

-- 
julian@beta4.com
Beta4 Productions (http://www.beta4.com)