[squeak-dev] XMLParser weirdness
Bert Freudenberg
bert at freudenbergs.de
Tue Aug 10 19:33:33 UTC 2010
On 10.08.2010, at 21:21, Andreas Raab wrote:
> Hi -
>
> I just spent about two hours staring at code because of an oddity in the XML parser's printing of nodes. Here's an example:
>
> node := (XMLElement new) name: 'foo';
>     addContent: (XMLStringNode string: 'Hello World');
>     setAttributes: (Dictionary new);
>     yourself.
>
> This prints '<foo>Hello World</foo>' which is fine. However, the following construction, which adds just a single attribute:
>
> node := (XMLElement new) name: 'foo';
>     addContent: (XMLStringNode string: 'Hello World');
>     setAttributes: (Dictionary newFromPairs: {#id. 1});
>     yourself.
>
> now prints as '<foo id="1"/>' (i.e., it loses its content string). Looking at the code in XMLElement>>printXmlOn:, it does something weird if the writer is considered "non-canonical", i.e.,
>
> "... snip ..."
> (writer canonical not
>     and: [self isEmpty and: [self attributes isEmpty not]])
>         ifTrue: [writer endEmptyTag: self name]
> "... snap ..."
>
> Two questions about this: 1) What's the meaning of 'canonical' XML? Is this a well-defined (sub-)set of XML? If so, where can I read about it? 2) Is the above a bug or a feature? I'm wondering in particular about XMLElement>>isEmpty, which only considers the elements but not any contents.
>
> Any help is greatly welcome.
>
> Cheers,
> - Andreas
Sounds like #isEmpty is buggy; it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags, but always an opening and a closing tag.
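
A minimal sketch of what such a fix might look like, untested. It assumes XMLElement answers its child elements to #elements and its string nodes to #contents; both selectors are assumptions here and should be checked against the image first:

```smalltalk
isEmpty
	"Answer true only if the receiver has neither child elements nor
	string content. Hypothetical revision: assumes the #elements and
	#contents accessors exist on XMLElement."
	^self elements isEmpty and: [self contents isEmpty]
```

If those assumptions hold, sending #isEmpty to the second node from the example above should answer false, so printXmlOn: would take the regular branch and keep the 'Hello World' content.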
- Bert -