[Seaside] [BUG] in IAHtmlParser

Avi Bryant avi@beta4.com
Fri, 29 Mar 2002 12:24:01 -0800 (PST)


On Fri, 29 Mar 2002, Damon Anderson wrote:

> It didn't try to do any interpretation of the spec whatsoever, and so
> the resulting tree reflected the HTML document exactly, not the
> nesting structure specified by the spec.
>
> It used a tag stack, but it handled lone tags differently: it just left
> them there in the stream, including dangling close tags. This meant that
> your tree potentially looked odd, but it also meant that when re-
> generating the source you'd get back the same broken HTML (including
> whitespace in most cases), which is the important thing, IMO.
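To make the flat approach in the quote concrete, here is a minimal Python sketch. It is a hypothetical reconstruction, not the parser Damon describes: the regex tokenizer and the tuple-based nodes are assumptions. A tag stack pairs each close tag with the nearest open tag of the same name. Matched pairs nest, while lone open tags and dangling close tags stay in the stream as plain tokens. Every token is emitted exactly once and in order, so re-serializing returns the input unchanged.

```python
import re

# Tags, text runs, or a stray "<": every character of the input lands in some token.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+|<")
# Recognizes start/end tags; comments, doctypes, etc. don't match and count as text.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def parse_flat(html):
    tokens = TOKEN_RE.findall(html)
    # Pass 1: pair each close tag with the nearest open tag of the same name.
    stack, pairs = [], {}
    for i, tok in enumerate(tokens):
        m = TAG_RE.match(tok)
        if not m:
            continue
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append((name, i))
            continue
        for depth in range(len(stack) - 1, -1, -1):
            if stack[depth][0] == name:
                pairs[stack[depth][1]] = i
                del stack[depth:]  # opens above the match stay unpaired (lone)
                break
        # If nothing matched, the close tag is dangling and simply stays a token.

    # Pass 2: matched pairs become nested elements; everything else is a leaf token.
    def build(start, end):
        children, i = [], start
        while i < end:
            if i in pairs:
                close = pairs[i]
                children.append(("element", tokens[i], build(i + 1, close), tokens[close]))
                i = close + 1
            else:
                children.append(("token", tokens[i]))
                i += 1
        return children

    return build(0, len(tokens))

def serialize(nodes):
    out = []
    for node in nodes:
        if node[0] == "element":
            _, open_tag, children, close_tag = node
            out.append(open_tag + serialize(children) + close_tag)
        else:
            out.append(node[1])
    return "".join(out)
```

For "<p>foo<p>bar" this yields four flat tokens and no elements. That is exactly the "<p></p>foo<p></p>bar" reading objected to below, even though `serialize` reproduces the source byte for byte.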

Hmm.  Although I agree that outputting the same HTML you get in is
important, I'm not sure that leaving lone tags dangling is the best way to
do it.  In a case like "<p>foo<p>bar", the structure of the document
really should be treated as "<p>foo</p><p>bar</p>", not
"<p></p>foo<p></p>bar".  I don't believe the latter properly reflects
the document; it's certainly not how I parse the document intuitively.

What the Seaside parser does is record whether or not a given tag was
explicitly closed, and output a close tag for it only if it was.  This
ends up coming very close to reproducing the HTML it's given exactly,
whitespace included.  Except in the case of a dangling close tag, I don't
think I've ever seen its output differ from its input.
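A rough Python sketch of the explicit-close idea follows. It is not IAHtmlParser's actual code. The `Element` class is hypothetical, and the implicit-close rule is deliberately simplified: a new <p> ends a <p> that is currently open on top of the stack, and void tags like <br> never take children. HTML's real rules are far more involved. Each element records whether it was explicitly closed, and serialization writes a close tag only in that case. So "<p>foo<p>bar" becomes two sibling paragraphs yet prints back unchanged. Dangling close tags are dropped here, which is the one case where output differs from input.

```python
import re
from dataclasses import dataclass, field

TOKEN_RE = re.compile(r"<[^>]*>|[^<]+|<")
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")
CLOSED_BY_SIBLING = {"p"}              # simplified stand-in for HTML's optional end tags
VOID = {"br", "img", "hr", "input", "meta", "link"}

@dataclass
class Element:
    tag: str                           # raw start tag text, kept verbatim
    name: str
    children: list = field(default_factory=list)  # Elements or raw text strings
    closed_explicitly: bool = False
    close_tag: str = ""                # raw end tag text, if one was present

def parse(html):
    root = Element(tag="", name="#root")
    stack = [root]
    for tok in TOKEN_RE.findall(html):
        m = TAG_RE.match(tok)
        if not m:
            stack[-1].children.append(tok)  # text (whitespace included) kept as-is
            continue
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            if name in CLOSED_BY_SIBLING and stack[-1].name == name:
                stack.pop()            # "<p>foo<p>" implicitly ends the first <p>
            el = Element(tag=tok, name=name)
            stack[-1].children.append(el)
            if name not in VOID:
                stack.append(el)
            continue
        for depth in range(len(stack) - 1, 0, -1):
            if stack[depth].name == name:
                stack[depth].closed_explicitly = True
                stack[depth].close_tag = tok
                del stack[depth:]      # anything opened inside it ends implicitly
                break
        else:
            # Dangling close tag with no open element: this sketch drops it,
            # so this is the one case where the output differs from the input.
            pass
    return root

def serialize(node):
    if isinstance(node, str):
        return node
    inner = "".join(serialize(c) for c in node.children)
    return node.tag + inner + (node.close_tag if node.closed_explicitly else "")
```

The design point is that structure and fidelity are kept separately. The tree is nested the way a reader would expect, and the `closed_explicitly` flag carries just enough information to regenerate the original markup.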

Depending on what you're doing with the tree, an incorrect structure
with some of the tags left inline may be fine.  But Seaside includes
a full macro system for its templates, which in theory could perform
arbitrary transformations on the tree.  Having either an incomplete tree
or one which doesn't match the developer's intuitions about the document's
structure (isn't it great when a syntax is so complex that you're never quite
certain how it'll be parsed?) restricts this power considerably.  Now,
this may be unnecessary power - I've only used the macro system for very
simple cases that would almost certainly work with the parser you describe
as well.  But I like having that power in reserve, and would be somewhat
loath to give it up without a very good reason.  (Quick example - I had a
discussion with Marcel a while ago about templates that included no
special identifiers or marking whatsoever, with totally external
definitions of which elements were to be treated specially.  For that, the
parser simply can't make any assumptions about what structural
information is useful and what isn't).
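Here is a hypothetical illustration of the external-definitions idea. The rules live outside the template, and each one pairs a path of (tag name, index) steps with a function that rewrites the matched element. The path format and every name below are invented for this example and are not part of Seaside's macro system. The rule only works because the tree nests "bar" inside the second <p>. With the flat token stream for "<p>foo<p>bar" there is no element to target.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Elements or text strings

def find_by_path(node, path):
    """Follow a hypothetical path such as [("body", 0), ("p", 1)]: at each
    step, take the n-th child element with that tag name, or return None."""
    for name, index in path:
        matches = [c for c in node.children
                   if isinstance(c, Element) and c.name == name]
        if index >= len(matches):
            return None
        node = matches[index]
    return node

def apply_rules(root, rules):
    """Apply externally defined (path, transform) rules in place;
    the template itself carries no special markers."""
    for path, transform in rules:
        target = find_by_path(root, path)
        if target is not None:
            transform(target)
    return root

def to_text(node):
    if isinstance(node, str):
        return node
    return "".join(to_text(c) for c in node.children)

# The tree a structure-aware parser might build for "<body><p>foo<p>bar</body>".
doc = Element("#root", [Element("body", [Element("p", ["foo"]), Element("p", ["bar"])])])

def fill_dynamic(el):
    el.children[:] = ["(generated)"]   # replace the element's contents

# External definition: "the second paragraph of the body is dynamic".
apply_rules(doc, [([("body", 0), ("p", 1)], fill_dynamic)])
print(to_text(doc))  # foo(generated)
```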