[ENH]Html table (second version)

Bijan Parsia bparsia at email.unc.edu
Sun Aug 26 16:09:37 UTC 2001


--On Sunday, August 26, 2001 3:40 PM +0100 John Hinsley 
<jhinsley at telinco.co.uk> wrote:
[snip]
> That's a huge leap forward.
>
> I'm beginning to see just how very difficult this is! Any ideas about
> this one? Weblint shows it as perfect, but Scamper chokes (even with
> your new .cs).
>
[snip HTML]
> (amazing what old stuff you find lying about on your hard drive!)
[snip]

In general, for this sort of thing, I recommend using HTML-Tidy to preclean 
the input into XHTML. XHTML is *way* more regular. Also, HTML-Tidy does a 
pretty good job of handling random HTML pretty much the way the major 
browser(s? is there a second major browser? :)) do.
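
A rough sketch of that preclean step, in Python rather than Squeak, assuming 
the `tidy` command-line tool is installed and on the PATH. The function names 
`tidy_command` and `preclean_to_xhtml` are made up for illustration; 
`-asxhtml`, `-quiet` and `--show-warnings` are tidy's own options.

```python
import shutil
import subprocess


def tidy_command(executable="tidy"):
    """Build the argument list for an XHTML-producing tidy run.

    The helper name is hypothetical; the flags are tidy's own.
    """
    return [executable, "-asxhtml", "-quiet", "--show-warnings", "no"]


def preclean_to_xhtml(html, executable="tidy"):
    """Pipe messy HTML through tidy and return the cleaned-up XHTML text.

    Assumes a tidy binary is available; raises if it is not found or if
    tidy reports errors it could not repair.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    result = subprocess.run(
        tidy_command(executable),
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 when clean, 1 when there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise ValueError(f"tidy could not repair the input:\n{result.stderr}")
    return result.stdout
```

The cleaned output could then be handed to whatever parser is choking on the 
raw page, e.g. `xhtml = preclean_to_xhtml(open("old_page.html").read())`, 
where `old_page.html` is a placeholder file name.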

I've been poking at HTML-Tidy with a view to a Squeak port (there's a 
Java port), but haven't had any time at all.

Note that Georg Heeg's public wiki (which I found once and never again :)) 
has a T-Gen-based HTML parser which prefers HTML-Tidy-sanitized input. One 
nice thing is that they have a full array of classes for representing HTML 
4.0. It's for VisualWorks, but it can't be that hard to move over.

Cheers,
Bijan Parsia.




More information about the Squeak-dev mailing list