[ANN] BhrXmlParser ported to Squeak

Helge Horch Helge.Horch at munich.netsurf.de
Fri May 19 02:56:31 UTC 2000


Dave, Folks,

As promised, I have completed the port of the Burry Holms Research XML 
Parser, written by Dave Harris, and it's now available at

http://home.munich.netsurf.de/Helge.Horch/SqueakSharesSoar.html#XML

It may not be the perfect "I grok your stoopid DTD" parser, but I'm happy 
with it. It's small, very well written (all kudos go to Dave -- and it 
wasn't as unportable as you thought!), and does all I need at the moment.

(Minnow seems down at the moment, or else I'd have updated a page or two.)

I can't post an example right now, but have a look at the code and the unit 
tests; the usage patterns are simple.  XmlBuilder is a good start, 
especially for using the SAX-inspired stuff.  Have a look at XmlBuilderTest 
and its #assertParse:yields: and #events methods.  Then follow the leads to 
XmlBuilderTestRoot (a node class), which understands #onBody: etc.

To quote from the aforementioned page:

I have ported Dave's lightweight (partial and nonvalidating) XML Parser 
from Dolphin to Squeak. I found its approach (3 classes + 1 exception 
class) to be very appealing and the code to be very transparent.

Dave originally published the package for Dolphin 3.06 under the LGPL (just 
before Camp Smalltalk 2000). He has remarked in private communication that 
he didn't really want to encourage its widespread use because the Camp 
Smalltalk crew was working on a more complete implementation. Still, it was 
useful and lightweight enough for me to carry out the port, and since it's 
LGPLed, I think I'm supposed to share. ;-)

This (21KB) is the zipped distribution. It requires Squeak 2.8a with at 
least update 2126 (Stefan Matthias Aust's Collection and assert: changes). 
You'll need to file in the contained change sets in this order:

1.) EOS.6.cs -- enhances Squeak's ReadStream>>next to signal an Exception 
at the stream end (Thanks to Bob Arning for guidance!)

2.) XmlParser.1.cs -- the XmlParser classes (all four of them)

3.) (optional, you need SUnit) XmlParserTests.1.cs -- a bunch of Unit Tests 
for the XmlParser classes

Here are some quotes from the README (included):
[---]
This is a partial XML parser with a SAX-like event-driven interface. It 
does not validate or handle entities (other than the standard ones) or 
Unicode; it is probably not very fast. However, it is relatively small and 
useful for setting up fixtures for unit tests etc.

[...] When I started, I couldn't find a reasonable standard lightweight 
parser for Dolphin. That will probably change with Camp Smalltalk (14th 
March 2000) and this code will probably not be supported after that date. [...]

I have included a class called XmlBuilder which wraps the SAX interface 
with something a bit more Smalltalk-friendly. You provide it with a 
dictionary which maps XML element names onto message selectors. The Builder 
keeps a stack of element-objects. When it sees a #startElement:attributes: 
event, it looks up the name, sends the corresponding message to the 
top-of-stack, and pushes its result onto the stack so that it will receive 
subsequent events. On #endElement the stack is popped. XML text is 
forwarded to the top-of-stack and other SAX messages are quietly ignored.

The idea is to have domain data structures that build themselves. Objects 
correspond to elements and know how to deal with the elements they contain: 
typically either by creating a new object for the contents and returning 
it; by returning self and dealing with its contents themselves; or 
returning a DeafObject to ignore part of the tree. For XML output you would 
have the objects write themselves with messages like #printXmlOn:.

This approach mixes XML knowledge into the domain objects. The alternative 
is to use a SAX-like "application" to build the objects from the outside, 
in which case you should also use the Visitor pattern to render them as XML 
from the outside. Sometimes it's simpler to just accept XML as a core 
format and read/write it directly, with the minimum of extra support 
classes. I find that the Builder's combination of context (in the form of 
the element stack) and the element->message dictionary provides a good mix 
of assistance and flexibility.
[---]
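
For readers without a Squeak image at hand, the Builder idea described in 
the README can be sketched roughly as follows. This is a transliteration to 
Python on top of the standard xml.sax module, not BhrXmlParser's actual API: 
the Order and Item classes, the on_* selectors and on_text are hypothetical 
names made up for illustration, and leaving an unmapped element with the 
current top-of-stack is an assumption (the README does not say what happens 
in that case).

```python
import xml.sax


class DeafObject:
    """Answers itself to any message, so a whole subtree is ignored."""

    def __getattr__(self, name):
        # Any selector (on_item, on_text, ...) becomes a no-op returning self.
        return lambda *args: self


class Builder(xml.sax.ContentHandler):
    """Maps element names to selectors and keeps a stack of element-objects."""

    def __init__(self, root, selectors):
        super().__init__()
        self.stack = [root]
        self.selectors = selectors

    def startElement(self, name, attrs):
        top = self.stack[-1]
        selector = self.selectors.get(name)
        # Assumption: an unmapped element stays with the current top-of-stack.
        child = getattr(top, selector)(dict(attrs)) if selector else top
        self.stack.append(child)  # the result receives subsequent events

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # XML text is forwarded to the top-of-stack.
        self.stack[-1].on_text(content)


class Item:
    """Hypothetical domain object for an <item> element."""

    def __init__(self, sku):
        self.sku = sku
        self.name = ""

    def on_text(self, text):
        self.name += text  # text may arrive in several chunks


class Order:
    """Hypothetical domain object that builds itself from <order>."""

    def __init__(self):
        self.items = []

    def on_order(self, attrs):
        return self  # deal with the contents ourselves

    def on_item(self, attrs):
        item = Item(attrs.get("sku", ""))
        self.items.append(item)
        return item  # a new object handles the contents

    def on_comment(self, attrs):
        return DeafObject()  # ignore this part of the tree

    def on_text(self, text):
        pass  # no interest in text directly inside <order>


selectors = {"order": "on_order", "item": "on_item", "comment": "on_comment"}
order = Order()
builder = Builder(order, selectors)
xml.sax.parseString(
    b'<order><item sku="A1">Widget</item>'
    b'<comment>skip <item sku="X">me</item></comment>'
    b'<item sku="B2">Gadget</item></order>',
    builder,
)
```

After the parse, order.items holds the two items outside the comment; the 
item nested inside the comment element went to the DeafObject and was 
dropped. The three handler styles (return a new object, return self, return 
a DeafObject) correspond to the three options the README lists.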





More information about the Squeak-dev mailing list