[Seaside] [BUG] in IAHtmlParser
Fri, 29 Mar 2002 14:27:10 +0100
On Friday, 29 March 2002, at 07:38, Julian Fitzell wrote:
> Avi Bryant wrote:
>> On Fri, 29 Mar 2002, Alain Fischer wrote:
>>> Hi Avi,
>>> I have tried to inspect or explore each of the following lines:
>>> IAHtmlParser parse: '<span><table></table></span>'
>>> IAHtmlParser parse: '<table><tr></tr></table>'
>> According to the HTML 4.0 spec, <span> can only contain inline tags,
>> so having a <table> inside a <span> is not legal (<div> is the tag
>> intended to contain block tags like <table>). This forces the <span>
>> to close before the <table> tag.
>> The HTML parser Julian wrote for Seaside is pretty strictly
>> conformant -
>> this lets it be smart about not requiring close tags everywhere, but
>> does mean that it can do somewhat surprising things with illegal
>> HTML.
>> One way it could be improved would be to actually throw an error when
>> a tag (like span) that requires a close tag doesn't get one (or, as in
>> this case, apparently doesn't).
>> I imagine this cost you some time, and I apologize - if you stick to
>> conformant HTML4, you should be ok in the future.
> Yeah, sorry about that. I wouldn't say the parser is strictly
> compliant but it ended up being necessary to make it somewhat
> compliant. The reason is that in order to allow all the cases that
> people use all the time, we would essentially not be able to support
> valid HTML (even though it is probably never used).
> The problem is, frankly, that the HTML spec is insane! There are
> ridiculous combinations of only allowing certain tags within others
> and implicitly closing tags for you. This implicit closing is most of
> the reason why I had to enforce some of the rules about what tags can
> be contained inside others. This is why most people never use </p> or
> </li>, allowing them to be closed implicitly by the next non-inline tag
> (usually the next <p> or <li> in these cases).
> But it sucks. XML is often overused, but in this case HTML so wants to
> be XML anyway. I wish browser developers would hurry up and start
> supporting XHTML so I can start writing my webpages with it.
> Again, sorry for the problems. It certainly isn't my goal to have a
> web application server enforce the HTML spec for you (that should be
> the browser's job) but unfortunately a loose spec and loose, loose,
> loose browser implementations have made writing a parser rather
> difficult. I don't want to implement a complete knowledge of every way
> every tag could be used. :( I tried to keep it loose where possible
> but... what can I say?
I agree with you that the best thing to do here is to keep the parser as
simple as possible.
Possibly, once I understand your parser well enough, I will try to add
a warning for the case where we find a close tag after its open tag is
supposed to have already been closed. It seems to me that a solution
like this could be very simple, but I need to understand the parser
more deeply first.
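
A minimal sketch of such a warning, written in Python for brevity. It is
a toy model and is not based on the actual IAHtmlParser code: the tag
sets INLINE and BLOCK and the function parse are hypothetical, and the
only rule modelled is "an inline tag is closed implicitly when a block
tag opens inside it". Implicit closing of <p> and <li> is left out.

```python
import re

# Hypothetical, heavily simplified content model (not Seaside's tables).
INLINE = {"span", "b", "i", "em", "a"}
BLOCK = {"div", "table", "tr", "td", "p", "ul", "li"}

# Matches an opening or closing tag; attributes are skipped, not parsed.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def parse(html):
    """Return (events, warnings) for the tags found in html."""
    stack, events, warnings = [], [], []
    for match in TAG.finditer(html):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if not closing:
            # A block tag may not sit inside an inline tag, so any open
            # inline tag on top of the stack is closed implicitly first.
            while stack and stack[-1] in INLINE and name in BLOCK:
                events.append(("implicit-close", stack.pop()))
            stack.append(name)
            events.append(("open", name))
        elif name in stack:
            # Close everything up to and including the matching open tag.
            while True:
                top = stack.pop()
                if top == name:
                    events.append(("close", name))
                    break
                events.append(("implicit-close", top))
        else:
            # The close tag has no open partner: warn instead of
            # silently dropping it.
            warnings.append(
                f"</{name}> found, but <{name}> is not open "
                "(it may have been closed implicitly)"
            )
    return events, warnings


if __name__ == "__main__":
    for source in ("<span><table></table></span>",
                   "<table><tr></tr></table>"):
        events, warnings = parse(source)
        print(source, events, warnings)
```

With the first example from this thread, the <span> is closed as soon as
<table> opens, so the later </span> has nothing to match and produces
one warning. The second example is legal nesting and produces none.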