[ENH]Html table (second version)

Randal L. Schwartz merlyn at stonehenge.com
Tue Aug 28 10:45:17 UTC 2001


>>>>> "Richard" == Richard A O'Keefe <ok at atlas.otago.ac.nz> writes:

Richard> 	John Hinsley's table example is very ill-formed.  Yes, there is
Richard> 	ill-formed HTML in the real world, but I suggest that you focus
Richard> 	first on getting Scamper fully functional with properly
Richard> 	formatted HTML containing balanced tags.

Richard> Why the insistence on balanced tags?  There's a lot of
Richard> well-written HTML out there (some of it even bearing the W3C's
Richard> validation stamp) that doesn't use space-bloating balanced tags.

Richard> For each element type, note whether it allows text or not,
Richard> and which other element types it allows.  Whenever you
Richard> encounter text or a tag that the current element does not
Richard> allow, close and pop elements whose end tags may be omitted
Richard> until you reach an element that allows the new item, or an
Richard> element whose end tag may not be omitted.  That won't let you
Richard> reconstruct omitted start tags (such as <HTML>, <HEAD>, and
Richard> <BODY>), but it _will_ let you reconstruct a well-bracketed
Richard> tree, so you can deal with valid HTML.

I support this approach.  The HTML DTD lists exactly which end tags
can be omitted and which elements may appear inside which other
elements.  A proper browser implements these rules so that valid
HTML (which may omit closing tags) can be parsed.
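A minimal Python sketch of the approach Richard describes, using the standard library's html.parser.HTMLParser only as a tokenizer.  The content model (CONTENT), the set of omissible end tags (OMIT_END), and the names TreeBuilder and parse are hypothetical.  They cover a small, simplified subset of the HTML DTD (for instance, TABLE accepts TR directly instead of going through TBODY), and none of it reflects how Scamper actually works.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified content model in the spirit of the
# HTML DTD: which children (element names or "#text") each element allows.
INLINE = {"#text", "b", "i", "em", "a", "br"}
BLOCK = {"p", "ul", "ol", "dl", "table"}
FLOW = INLINE | BLOCK
CONTENT = {
    "body": FLOW,
    "p": INLINE,
    "ul": {"li"}, "ol": {"li"}, "li": FLOW,
    "dl": {"dt", "dd"}, "dt": INLINE, "dd": FLOW,
    "table": {"tr"}, "tr": {"td", "th"}, "td": FLOW, "th": FLOW,
    "b": INLINE, "i": INLINE, "em": INLINE, "a": INLINE - {"a"},
}
# Elements whose end tag may be omitted (a subset of the DTD's list).
OMIT_END = {"p", "li", "dt", "dd", "tr", "td", "th"}
# Elements that never have content or an end tag.
EMPTY = {"br"}
KNOWN = set(CONTENT) | EMPTY | {"#text"}


def allows(parent, child):
    """True if `child` may appear directly inside `parent`."""
    if parent == "#root" or parent not in CONTENT or child not in KNOWN:
        return True  # be permissive about anything outside the model
    return child in CONTENT[parent]


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Node instances or text strings

    def dump(self):
        return [self.tag] + [c if isinstance(c, str) else c.dump()
                             for c in self.children]


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def _make_room(self, item):
        # Close elements whose end tag may be omitted until the current
        # element accepts `item`, or we reach one that must be closed
        # explicitly (the synthetic root accepts everything).
        while (not allows(self.stack[-1].tag, item)
               and self.stack[-1].tag in OMIT_END):
            self.stack.pop()

    def handle_starttag(self, tag, attrs):
        self._make_room(tag)
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in EMPTY:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, implicitly closing
        # anything still open inside it; ignore stray end tags.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                return

    def handle_data(self, data):
        if not data.strip():
            return  # ignore whitespace between tags
        self._make_room("#text")
        kids = self.stack[-1].children
        if kids and isinstance(kids[-1], str):
            kids[-1] += data  # merge text split across callbacks
        else:
            kids.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root.dump()[1:]  # drop the synthetic root's tag
```

For example, parse("<ul><li>one<li>two</ul><p>a<p>b") should give a UL holding two LI elements, followed by two sibling P elements, even though no LI or P end tag appears in the input.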

Note that HTML's "closing tags may be omitted" mess makes HTML harder,
though not impossible, to parse.  That's why XML got rid of it and
requires every closing tag to be present.
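To illustrate the contrast, a small Python sketch feeds the same list, with and without explicit end tags, to the standard library's XML parser (xml.etree.ElementTree).  The helper name is_well_formed_xml is hypothetical.  The first fragment is legal HTML, but an XML parser rejects it because XML has no omissible end tags.

```python
import xml.etree.ElementTree as ET

omitted = "<ul><li>one<li>two</ul>"            # legal HTML: LI end tags omitted
explicit = "<ul><li>one</li><li>two</li></ul>"  # every element closed explicitly


def is_well_formed_xml(text):
    """Return True if an XML parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:  # raised for mismatched or unclosed tags
        return False


print(is_well_formed_xml(omitted))   # False
print(is_well_formed_xml(explicit))  # True
```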

I don't think you need to build *error* correction into any early
release of Scamper.  But handling omitted end tags is not error
correction; it's parsing legal HTML!

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn at stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!




More information about the Squeak-dev mailing list