[Seaside] [BUG] in IAHtmlParser

Alain Fischer alain.fischer@bluewin.ch
Fri, 29 Mar 2002 14:27:10 +0100


On Friday, 29 March 2002, at 07:38, Julian Fitzell wrote:

> Avi Bryant wrote:
>> On Fri, 29 Mar 2002, Alain Fischer wrote:
>>> Hi Avi,
>>>
>>> I have tried to inspect or explore each of the following lines:
>>>
>>> IAHtmlParser parse: '<span><table></table></span>'
>>> IAHtmlParser parse: '<table><tr></tr></table>'
>> According to the HTML 4.0 spec, <span> can only contain inline tags, and
>> so having a <table> inside a <span> is not legal (<div> is the equivalent
>> intended to contain block tags like <table>).  This forces the <span> to
>> close before the <table> tag.
>> The HTML parser Julian wrote for Seaside is pretty strictly conformant -
>> this lets it be smart about not requiring close tags everywhere, but it
>> does mean that it can do somewhat surprising things with illegal markup.
>> One way it could be improved would be to actually throw an error when a
>> tag (like span) that requires a close tag doesn't get one (or, as in this
>> case, apparently doesn't).
>> I imagine this cost you some time, and I apologize - if you stick to
>> conformant HTML4, you should be ok in the future.
>
> Yeah, sorry about that.  I wouldn't say the parser is strictly
> compliant but it ended up being necessary to make it somewhat
> compliant.  The reason is that in order to allow all the cases that
> people use all the time, we would essentially not be able to support
> valid HTML (even though it is probably never used).
>
> The problem is, frankly, that the HTML spec is insane!  There are
> ridiculous combinations of only allowing certain tags within others and
> implicitly closing tags for you.  This implicit closing is most of the
> reason why I had to enforce some of the rules about what tags can be
> contained inside others.  This is why most people never use </p> or
> </li>, allowing them to be closed implicitly by the next non-inline tag
> (usually the next <p> or <li> in these cases).
>
> But it sucks.  XML is often overused but in this case, HTML so wants to
> be XML anyway I wish browser developers would hurry up and start adding
> support for XHTML so I can start writing my webpages with it.
>
> Again, sorry for the problems.  It certainly isn't my goal to have a
> web application server enforce the HTML spec for you (that should be
> the browser's job) but unfortunately a loose spec and loose, loose,
> loose browser implementations have made writing a parser rather
> difficult.  I don't want to implement a complete knowledge of every way
> every tag could be used. :(  I tried to keep it loose where possible
> but...  what can I say?

I agree with you that the best thing to do here is to keep the parser as
simple as possible.
Eventually, once I understand your parser well enough, I will try to add
a warning for the case where we find a close tag after its open tag has
already been closed implicitly. It seems to me that a solution like this
could be very simple, but I first need to understand the parser more
deeply.
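
To make the idea concrete, here is a minimal sketch of such a warning. It is
written in Python rather than Smalltalk and is not the IAHtmlParser code: the
function name parse, the two tag tables, and the warning texts are all
hypothetical, and the tables cover only a handful of HTML 4 elements. It shows
a <span> being closed implicitly when a <table> opens, and a warning being
recorded when the now-orphaned </span> arrives.

```python
import re

# Hypothetical, deliberately tiny tag tables; HTML 4 defines many more rules.
INLINE_ONLY = {"span", "b", "i", "em", "a"}  # may contain only inline content
BLOCK_LEVEL = {"table", "div", "ul", "ol"}   # block-level elements

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def parse(html):
    """Return (events, warnings) for the tags found in html."""
    stack = []              # currently open elements
    implicitly_closed = []  # elements closed by the parser, not by the author
    events, warnings = [], []

    for match in TAG.finditer(html):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if not closing:
            if name in BLOCK_LEVEL:
                # A block tag cannot sit inside an inline-only element,
                # so close any such elements before opening it.
                while stack and stack[-1] in INLINE_ONLY:
                    closed = stack.pop()
                    implicitly_closed.append(closed)
                    events.append(("close", closed))
            stack.append(name)
            events.append(("open", name))
        elif name in stack:
            # Normal close: pop up to and including the matching element.
            while True:
                closed = stack.pop()
                events.append(("close", closed))
                if closed == name:
                    break
        elif name in implicitly_closed:
            # The proposed warning: the author closed a tag that the parser
            # had already closed on its own.
            implicitly_closed.remove(name)
            warnings.append(
                f"</{name}> found, but <{name}> was already closed implicitly"
            )
        else:
            warnings.append(f"</{name}> has no matching open tag")
    return events, warnings


events, warnings = parse("<span><table></table></span>")
print(events)
print(warnings)
```

With the first input from the bug report, the span is opened and closed before
the table is opened, and one warning is collected for the trailing </span>.
The second input, '<table><tr></tr></table>', produces no warnings in this
sketch.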