[squeak-dev] XMLParser weirdness

Tue Aug 10 19:33:33 UTC 2010

On 10.08.2010, at 21:21, Andreas Raab wrote:

> Hi -
> 
> I just spent about two hours staring at code because of an oddity in the XML parser's printing of nodes. Here's an example:
> 
> node := (XMLElement new) name: 'foo';
> 	addContent: (XMLStringNode string: 'Hello World');
> 	setAttributes: (Dictionary new);
> 	yourself.
> 
> This prints '<foo>Hello World</foo>', which is fine. However, the following construction, which adds just a single attribute:
> 
> node := (XMLElement new) name: 'foo';
> 	addContent: (XMLStringNode string: 'Hello World');
> 	setAttributes: (Dictionary newFromPairs: {#id. 1});
> 	yourself.
> 
> now prints as '<foo id="1"/>', i.e., it loses its content string. Looking at the code in XMLElement>>printXmlOn:, it does something weird if the writer is considered "non-canonical", namely:
> 
> 	"... snip ..."
> 	(writer canonical not
> 		and: [self isEmpty and: [self attributes isEmpty not]])
> 		ifTrue: [writer endEmptyTag: self name]
> 	"... snap ..."
> 
> Two questions about this: 1) What's the meaning of 'canonical' XML? Is this a well-defined (sub-)set of XML? If so, where can I read about it? 2) Is the above a bug or a feature? I'm wondering in particular about XMLElement>>isEmpty, which only considers the elements but not any contents the node may have.
> 
> Any help is greatly welcome.
> 
> Cheers,
>  - Andreas

Sounds like #isEmpty is buggy; it certainly should look at both contents and elements. And "canonical" probably refers to Canonical XML (a W3C recommendation), in which empty elements are never written with the "shorthand" empty tag but always as an opening and closing tag pair.
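
Something along these lines might do as a replacement for XMLElement>>isEmpty. This is an untested sketch: it assumes that #elements answers the child elements and #contents answers the string nodes, both as collections, which I have not checked against the actual class.

```smalltalk
isEmpty
	"Sketch, untested: answer true only if the receiver has neither
	child elements nor string contents. Assumes #elements and
	#contents both answer collections (an assumption, not verified
	against the XML-Parser package)."
	^self elements isEmpty and: [self contents isEmpty]
```

With a check like that, the element with the id attribute would no longer satisfy the `self isEmpty` test in printXmlOn:, so it should not take the endEmptyTag: branch any more.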

- Bert -