XML Parser, interleaving text and elements
Julian Fitzell
julian at beta4.com
Tue Sep 9 16:37:28 UTC 2003
Lex Spoon wrote:
> Avi Bryant <avi at beta4.com> wrote:
>
>>On Thu, 4 Sep 2003 sstnjpm02 at sneakemail.com wrote:
>>
>>
>>>Thanks. I see that my example works properly but I hope I am not trading
>>>one set of problems for another. So far I found one problem which prevents any
>>>of my html from rendering:
>>>
>>><br/> prints as <br//>
>>><input href="y"/> prints as <input href="y"/> />
>>>
>>>and various other problems with >.... being added
>>
>>The problem seems to lie with Scamper's HTMLTokenizer class, which the
>>HTML-Parser package reuses.
>>
>>It looks like some hacking of #nextName and #nextTag would be in order.
>>If I get a chance I'll look at that later tonight.
>>
>
>
> Well, HTML doesn't have self-closing tags like this. Are you thinking
> of hacking the tokenizer to return *two* tags when it sees a
> self-closing tag? I suppose that would be a reasonable way to go, since
> the main goal of these classes is to render.
HTML has tags like <br> which are neither self-closing nor closed by
anything else. This is incredibly difficult to parse because you
actually have to understand the behaviour of each individual tag (which
is why the HTML parser I wrote basically has to encode the entire HTML
spec into code).
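To illustrate what "understanding each individual tag" means, here is a rough sketch in Python (not the Squeak code, and not how my parser is actually written). It assumes a small, incomplete table of empty tags, and the names VOID_TAGS and nesting_depths are made up for the example. Without the table, a tag like <br> would look like an unclosed element and throw off the nesting of everything after it.

```python
import re

# Assumption: a small, incomplete subset of HTML's empty elements.
# A real parser needs the full list plus the optional-end-tag rules.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Matches an opening or closing tag and captures the optional "/" and the name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def nesting_depths(html):
    """Return (tag name, depth) for every opening tag found in html."""
    depth = 0
    result = []
    for match in TAG_RE.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            depth -= 1
        elif name in VOID_TAGS:
            # Empty tag: record it, but it never opens a new nesting level.
            result.append((name, depth))
        else:
            result.append((name, depth))
            depth += 1
    return result

print(nesting_depths("<p>a<br>b<b>c</b></p>"))
```

The <b> element comes out at the same depth as the <br> before it, which is only possible because the table says <br> has no content.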
XHTML was devised as a solution to this problem. It is backwards
compatible with existing browsers but parses as valid XML.
However, the XHTML spec says that, for backwards compatibility with
HTML, you should put a space before the closing /> of an empty tag. I'm
not sure why I didn't notice this when this message first showed up, but
is there any chance it just works if you use:
<br />
<input href="y" />
(not that input tags have href attributes... not sure where this came
from :) )
> What's up with self-closing tags, anyway? XML throws away all the
> niceties of SGML... and then adds this? What a nuisance.
Well, it's just a shortcut for an empty tag. You could remove it, but
it isn't hard to parse and it is cleaner to look at (and XML is
/supposed/ to be human-readable :) ).
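As a quick check of the "shortcut" claim, here is a small sketch using Python's standard xml.etree.ElementTree (again, not the Squeak parser). The helper name shape is made up for the example. It compares the short form, the spaced short form, and the long form of an empty element sitting between two runs of text.

```python
import xml.etree.ElementTree as ET

def shape(element):
    """Reduce an element to a comparable (tag, text, tail, children) tuple."""
    return (element.tag, element.text, element.tail,
            [shape(child) for child in element])

# Three spellings of the same empty element, interleaved with text.
short_form = ET.fromstring("<p>one<br/>two</p>")
spaced_form = ET.fromstring("<p>one<br />two</p>")
long_form = ET.fromstring("<p>one<br></br>two</p>")

print(shape(short_form) == shape(spaced_form) == shape(long_form))
```

An XML parser sees all three as the same tree, so the self-closing form costs it nothing extra; the space before the / only matters to older HTML browsers.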
Julian