XML Parser choice (was Re: [ENH] ??? MD5 in Squeak.)

Bijan Parsia bparsia at email.unc.edu
Thu Nov 29 01:37:17 UTC 2001


On Wed, 28 Nov 2001, Duane Maxwell wrote:

[snip]
> I agree that a wellformedness parser is relatively easy, which is why there
> are so many of them - when I wrote it, however, there weren't any for
> Squeak.  On the other hand, I think you can count on one hand all of the
> fully validating parsers in *any* language.  It's very tough to implement
> everything correctly, and generally unnecessary.

Validation is a shifting goal of course. There's DTD validation, XML
Schema (run away!), RELAX NG, etc.

>  If we were to wait until
> such a parser existed under an appropriate Squeak-compatible license in
> Smalltalk, we'd never have anything.  By putting something in now that has
> the potential of being extended, we at least open the door to handling
> XML data even if we let stuff through that might not otherwise survive
> validation.

Er...but it seems these considerations support the arguments I've been
making. At least the VWXML stuff *attempts* DTD validation, the code
owners regard failures in this realm as bugs, and are committed to
extending it (unto XML Schema validation!!!).

If we're going to go for the brass ring, I want to be standing on top of a
tall horse. Or something. :)

I've not picked apart the problem Andreas had with the VWXML parser. It is
true that it had neither a terrific interface nor documentation.

Plus, more than the Squeak community is working on it. Not just Cincom,
but other folks. And not just the VisualWorks community.

The VWXML parser is partial. So if being "something incomplete but
useful" is a measure of worthiness, it's worthy :)

The licence issue has to be investigated, yes. So too does the new code
base (i.e., VW 5i.4). I'm willing to spearhead an effort to port all that
which we can pry loose from Cincom (which would result, at least for now,
in a (somewhat) validating parser, a partial XSLT engine, some XPath
stuff, maybe SOAP support; these are the things that already exist).

But I'm willing to believe that it's premature to standardize on the
parser/node set. It's not premature to get Unicode support, though :)

OTOH, I've mentioned that a very rational, super minimal core can rest
under quite a variety of superstructures, including those for validation,
etc. If such a core would sensibly replace the guts of the VWXML parser
(and a number of others) I'm for it. Or something :)

Cheers,
Bijan Parsia.




