[ENH]Html table (second version)

Richard A. O'Keefe ok at atlas.otago.ac.nz
Mon Aug 27 23:46:29 UTC 2001


	It seems to me that it might be best if there were development
	of a DOM package for Squeak.  Then Scamper and other tag
	language processors, e.g., a JSP page container using Squeak as
	the scripting language, could use the DOM package.  The issue of
	a DOM package has been brought up before on the list.  A Squeak
	flavored re-implementation of the ideas in JDOM
	(http://www.jdom.org/) might be one way to go.
	
It's time to warn against the DOM again, I see.
In the context of (X)HTML, "*the* DOM" means a particular design by
the W3C.  It is a very bad design.  It is extremely memory-hungry (at best
a measured factor 2.5 worse than a simpler model), and if it had been
designed with the aim of making transformations difficult to express, they
couldn't have done a better job.  (I've tried.)

In support of my attack, I point to the existence of the JDOM.  The JDOM
is not without defects of its own.  In particular, the decision that
"lists" should be "live" strikes me as making implementation harder in
order to make use harder.

I do not think that a Squeak design should be influenced by *anything*
other than
 - the nature of the task
 - Squeak itself
 - "experience guided by intelligence".

	John Hinsley's table example is very ill-formed.  Yes, there is
	ill-formed HTML in the real world, but I suggest that you focus
	first on getting Scamper fully functional with properly
	formatted HTML containing balanced tags.

Why the insistence on balanced tags?  There's a lot of well written HTML
out there (with the W3C's validation stamps on it yet) that doesn't use
space-bloating balanced tags.

For each element type, note whether it allows text or not, and which
other element types it allows.  Whenever you encounter text or a tag that
is not allowed by the current element, close and pop elements that allow
their end-tags to be omitted until you find an element that does allow
the new item, or an element that doesn't allow its end-tag to be omitted.
That won't let you reconstruct omitted start tags (such as <HTML>,
<HEAD>, and <BODY>), but it _will_ let you reconstruct a well-bracketed
tree, so you can deal with valid HTML.
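The rule above can be sketched roughly as follows (in Python rather than Squeak; the shape would be the same).  Everything named here is an assumption for illustration: MODEL is a hypothetical toy content model, not the real HTML DTD (it lets TABLE hold TR directly instead of modelling the omitted <TBODY> start tag), and the token format and the names Element, build_tree and to_xml are made up.  Treating an explicit end tag as implicitly closing any open elements with omissible end tags is also an assumed extension of the rule as stated.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical toy content model, NOT the real HTML DTD:
#   tag -> (allows_text, allowed_child_tags, end_tag_may_be_omitted)
# Simplification: TABLE holds TR directly instead of an implied TBODY.
MODEL: Dict[str, Tuple[bool, FrozenSet[str], bool]] = {
    "body":  (True,  frozenset({"p", "table"}), False),
    "p":     (True,  frozenset(),               True),
    "table": (False, frozenset({"tr"}),         False),
    "tr":    (False, frozenset({"td"}),         True),
    "td":    (True,  frozenset({"p"}),          True),
}


@dataclass
class Element:
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def allows(tag: str, item: str) -> bool:
    """True if element `tag` may contain `item` ('#text' or a tag name)."""
    allows_text, kids, _ = MODEL[tag]
    return allows_text if item == "#text" else item in kids


def omissible(tag: str) -> bool:
    return MODEL[tag][2]


def build_tree(tokens, root_tag: str = "body") -> Element:
    """Tokens are (kind, value) pairs, kind in {'start', 'end', 'text'}."""
    root = Element(root_tag)
    stack = [root]
    for kind, value in tokens:
        if kind == "end":
            # Assumed extension: an explicit end tag implicitly closes any
            # open elements whose end tags may be omitted.
            while stack[-1].tag != value:
                if len(stack) == 1 or not omissible(stack[-1].tag):
                    raise ValueError(f"unexpected </{value}> in <{stack[-1].tag}>")
                stack.pop()
            if len(stack) > 1:  # the root stays open until input ends
                stack.pop()
            continue
        item = "#text" if kind == "text" else value
        # The rule from the text: pop elements with omissible end tags until
        # one allows the new item; an element with a required end tag stops us.
        while not allows(stack[-1].tag, item):
            if len(stack) == 1 or not omissible(stack[-1].tag):
                raise ValueError(f"<{stack[-1].tag}> cannot contain {item}")
            stack.pop()
        if kind == "text":
            stack[-1].children.append(value)
        else:
            child = Element(value)
            stack[-1].children.append(child)
            stack.append(child)
    return root


def to_xml(node: Union[Element, str]) -> str:
    """Serialize with every end tag written out explicitly."""
    if isinstance(node, str):
        return node
    inner = "".join(to_xml(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# <table><tr><td>a<td>b<tr><td>c</table><p>x<p>y
tokens = [
    ("start", "table"), ("start", "tr"), ("start", "td"), ("text", "a"),
    ("start", "td"), ("text", "b"), ("start", "tr"), ("start", "td"),
    ("text", "c"), ("end", "table"), ("start", "p"), ("text", "x"),
    ("start", "p"), ("text", "y"),
]
print(to_xml(build_tree(tokens)))
# <body><table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table><p>x</p><p>y</p></body>
```

In this sketch the </table> end tag closes the still-open TD and TR implicitly, while text placed directly inside TABLE is rejected, because TABLE's end tag is required in the toy model and so the popping stops there.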