[Seaside] [BUG] in IAHtmlParser

Avi Bryant avi@beta4.com
Fri, 29 Mar 2002 13:05:32 -0800 (PST)


On Fri, 29 Mar 2002, Damon Anderson wrote:

> It was definitely designed with a different philosophy in mind. My goal
> was to try not to assume anything. If the user wants to manipulate a
> paragraph block, then they need to put in a closing </p>.

How many designers bother putting in the </p>?  Everybody knows where a
paragraph gets closed; I don't really see why the parser should have to be
babied by giving it extra information.

> The reason why
> I decided to leave dangling tags in the document is so that it would be
> possible to handle cases like "<b><i>foo</b></i>". There's obviously no
> tree structure which could possibly represent that without munging, so I
> punt.

Yes, that makes sense, although I'm skeptical that there are cases where
"<b><i>foo</i></b>", which is how the Seaside parser would treat that,
isn't as good as or better than the original.  There may be pathological
cases of broken browsers, of course - do you have any examples of these?
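For illustration, here is a minimal Python sketch (standard library `html.parser` only) of the repair described above. When an end tag arrives for an element that is open but not innermost, the elements opened inside it are closed first, and an end tag with no matching open element is dropped. `Node`, `NestingRepairParser` and `repair` are hypothetical names. This is not the Seaside parser's code, and attributes are ignored.

```python
from html.parser import HTMLParser


class Node:
    """Minimal tree node: an element (tag set) or a text node (tag None)."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []

    def to_html(self):
        if self.tag is None:
            return self.text
        inner = "".join(child.to_html() for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


class NestingRepairParser(HTMLParser):
    """Builds a strict tree, repairing mis-nested end tags (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Find the nearest open element with this name (never the root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                # Close it together with everything opened inside it.
                del self.stack[i:]
                return
        # No matching open element: the stray end tag is ignored.

    def handle_data(self, data):
        self.stack[-1].children.append(Node(None, data))


def repair(html):
    parser = NestingRepairParser()
    parser.feed(html)
    parser.close()
    return "".join(child.to_html() for child in parser.root.children)
```

With this, `repair("<b><i>foo</b></i>")` returns `<b><i>foo</i></b>`: the `</b>` closes the open `<i>` as well, and the trailing `</i>` no longer matches anything, so it is dropped.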

> I agree, that's a very useful thing to be able to do. And I'm not trying
> to persuade anybody to switch to my parser, but consider: if the HTML
> document is valid, you have that capability with my parsing philosophy
> as well

No, you don't, and that's the point: "<ul><li>1<li>2</ul>" is a *valid*
HTML document, and it is crucial for manipulating it properly that the 1
and 2 be children of the list items, not siblings.  The only way to
correctly parse valid HTML documents is to follow the HTML specification.
Believe me, we tried to come up with simpler heuristics, but it's not
worth it.  Having to "play catch up" with the spec is, IMO, a reasonable
price to pay, particularly since the next transition will presumably be to
XHTML, which will make this all moot.
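A rough sketch of what following the spec means for the two examples in this thread, again in Python with only the standard library. It encodes a small, illustrative subset of the HTML 4 optional end-tag rules: a new <li> closes an open <li> in the same list, a block-level start tag closes an open <p> (a P may not contain block-level elements), and an explicit end tag closes anything still open inside it. The tag sets and all names (`ImpliedEndTagParser`, `parse`, `render`) are assumptions for illustration, not Seaside's implementation; attributes and void elements such as <br> are not handled.

```python
from html.parser import HTMLParser

# Illustrative subset only: real HTML 4 has many more block-level elements.
BLOCK_TAGS = {"p", "ul", "ol", "li", "div", "h1", "h2", "h3", "table"}
LIST_TAGS = {"ul", "ol"}


class Node:
    """Minimal tree node: an element (tag set) or a text node (tag None)."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []

    def to_html(self):
        if self.tag is None:
            return self.text
        inner = "".join(child.to_html() for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


class ImpliedEndTagParser(HTMLParser):
    """Builds a tree, supplying end tags that HTML 4 lets authors omit."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # open elements, innermost last

    def _close(self, tag, stop_at=frozenset()):
        """Close the nearest open `tag` (and everything inside it),
        unless an element named in `stop_at` is reached first."""
        for i in range(len(self.stack) - 1, 0, -1):
            name = self.stack[i].tag
            if name == tag:
                del self.stack[i:]
                return
            if name in stop_at:
                return

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # A P may not contain block-level elements, so this closes it.
            self._close("p", stop_at=BLOCK_TAGS - {"p"})
        if tag == "li":
            # A new LI ends the previous one, but only within the same list.
            self._close("li", stop_at=LIST_TAGS)
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # An explicit end tag also closes anything left open inside it,
        # e.g. </ul> ends its last, unterminated <li>.
        self._close(tag)

    def handle_data(self, data):
        self.stack[-1].children.append(Node(None, data))


def parse(html):
    parser = ImpliedEndTagParser()
    parser.feed(html)
    parser.close()
    return parser.root


def render(root):
    return "".join(child.to_html() for child in root.children)
```

With these rules, `render(parse("<ul><li>1<li>2</ul>"))` gives `<ul><li>1</li><li>2</li></ul>`, so the 1 and the 2 end up as children of their list items rather than as siblings, and `<p>one<p>two` becomes two separate paragraphs.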

> I've thought about that as well. Unless you have sophisticated pattern
> matching (HTML "shape" detection, basically), doesn't that tie the
> external definition to the formatting of your HTML?

I've long wanted such pattern matching; it would come in handy in all
kinds of places.  Think about automating interactions with possibly
changing web applications - unit testing, for example, would become much
more feasible.  I'd love to do some work on this, if I have time.  If
anyone has pointers to useful papers...