XML Parser choice (was Re: [ENH] ??? MD5 in Squeak.)

Bijan Parsia bparsia at email.unc.edu
Tue Nov 27 17:05:26 UTC 2001


On Tue, 27 Nov 2001, Andreas Raab wrote:
[snip]
> > From an outsider's perspective, this seems like a really strange
> > strategy.  Code size isn't a terribly big deal -- the thing the Squeak
> > community is most constrained by is programmer time.
> 
> Code size isn't but complexity is. Usually these two go hand in hand and
> therefore it's no strange argument at all. In fact I'd argue that programmer
> time is (in this particular case) mostly dictated by the complexity involved
> in the parser itself - most people will want to do pretty simple stuff.

Well, there's complexity and there's complexity. And there are various
interfaces to manage that complexity. And dealing with missing
functionality can be more complex than dealing with unneeded
functionality.

It makes sense, in general, to have a SAX layer with a useful interface
for generating application objects. One kind of application object is a
DOM-like (in the sense of representing most of the Infoset) tree. As long
as you support all the infoset features, it shouldn't be that difficult to
support whatever interface the application programmer wants to see. In
other words, the parser isn't as interesting, generally speaking, as the
output *except* that you might want the parser to take care of a bunch of
standard tasks (validation against DTDs or Schemas is just one
example) *or* you need certain programming or performance characteristics
(e.g., the jabber needs mentioned earlier).
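
To make the layering concrete, here is a minimal sketch in Python (not
Squeak, and not the VW parser) using the standard library's xml.sax. The
Node class and the parse_to_tree helper are made up for illustration; the
point is only that the SAX layer emits events and a small handler decides
what application objects to build from them.

```python
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical application object: a minimal element node."""
    name: str
    attributes: dict
    children: list = field(default_factory=list)
    text: str = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Turns SAX events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # Character data may arrive in several chunks, so accumulate it.
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()


def parse_to_tree(xml_text):
    """Illustrative helper: parse a string and return the root Node."""
    builder = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root


tree = parse_to_tree('<msg to="squeak"><body>hi</body></msg>')
print(tree.name, tree.attributes, tree.children[0].text)
```

Swapping TreeBuilder for a handler that builds some other kind of object
leaves the parser untouched, which is the sense in which the output
matters more than the parser.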

So, what do you put in the base image? What *are* we standardizing? 

One reason to work with VWXML parse nodes, even given all their ugliness,
is that you can easily port your application to VisualWorks or any
Smalltalk that supports a parser that generates those nodes. And vice
versa.

This seems like *some* sort of win to me :)

OTOH, what would *really* be nice is Unicode support. Why don't we get
that first, and then argue about the other layers? ;)

Hmm. Looking at VW5i.4, most of the node names look reasonable. There are
a few with underscores still, but I bet I can get Steve to change them....

OYTOH, I don't see any problem having multiple XML parsers/node sets,
etc. Picking one to bless with bundling is a purely political matter at
this point: What do we want to "force" folks to use (at least by
default)? Anything that gets pulled in will be *very* hard to avoid if
you're doing XML stuff.

This goes even if it's modularized. The psycho-social impact remains the
same.

The VisualWorks parser/node set supports, overall, a larger community and
isn't technically horrible. (To be precise, it's rather featureful, though
not complete. It seems to be reasonably nippy. It's flexible. It's under
active development. The variety of interfaces seems sane if not wholly
exciting or beautiful.)

Cheers,
Bijan Parsia.




