<div dir="ltr">Hi,<div>Thanks for the info.</div><div>I guess I need an <span style="font-size:16px">HTMLTokenizer</span><span style="font-size:16px"> for what I'm doing. I also had issues with &nbsp; in the current XMLTokenizer.</span></div><div><span style="font-size:16px"><br></span></div><div><span style="font-size:16px">Karl</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 1, 2015 at 11:01 PM, Jakob Reschke <span dir="ltr"><<a href="mailto:jakob.reschke@student.hpi.de" target="_blank">jakob.reschke@student.hpi.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I guess this will not help you, but a standalone ampersand is not<br>
valid XML: an ampersand starts an entity or character reference, so a<br>
literal ampersand in text must be written as &amp;. Hence I<br>
would not expect any XML tokenizer or parser implementation to accept<br>
it.<br>
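For illustration, here is a minimal sketch in Python (not Squeak; the thread does not show the Squeak XMLTokenizer API) of the usual workaround when only a strict XML tokenizer is available: rewrite the text so it becomes well-formed XML before parsing. The function name make_xml_safe is hypothetical. It escapes bare ampersands, keeps XML's five predefined entities and numeric character references, and turns HTML-only named entities such as &nbsp; into numeric references, since XML does not predefine them. It assumes the input has no CDATA sections or comments, whose contents would be rewritten as well.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The only five named entities that XML itself predefines.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Every '&', optionally followed by a complete reference: "#123;", "#x1F;" or "name;".
# XML only allows a lowercase 'x' in hexadecimal character references.
_AMP = re.compile(r"&(#[0-9]+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def make_xml_safe(text: str) -> str:
    """Hypothetical helper: rewrite HTML-style text so a strict XML parser accepts it."""
    def fix(m: re.Match) -> str:
        ref = m.group(1)
        if ref is None:
            return "&amp;"                  # bare '&', e.g. "Tom & Jerry"
        if ref.startswith("#"):
            return m.group(0)               # numeric character reference, kept as is
        name = ref[:-1]                     # strip the trailing ';'
        if name in XML_PREDEFINED:
            return m.group(0)               # &amp; &lt; &gt; &quot; &apos;
        if name in name2codepoint:          # HTML-only entity such as &nbsp;
            return "&#%d;" % name2codepoint[name]
        return "&amp;" + ref                # unknown name: keep it as literal text

    return _AMP.sub(fix, text)


doc = "<p>Tom & Jerry&nbsp;&amp; friends</p>"
try:
    ET.fromstring(doc)                      # a strict XML parser rejects the bare '&'
except ET.ParseError as err:
    print("rejected:", err)

root = ET.fromstring(make_xml_safe(doc))
print(repr(root.text))                      # 'Tom & Jerry\xa0& friends'
```

Pre-escaping keeps the strict tokenizer unchanged; the alternative is a tokenizer that implements HTML's own, more lenient rules for ampersands.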
<br>
HTML is more relaxed about this, so a standalone ampersand is valid there,<br>
but you would need some kind of HTMLTokenizer, and I do not know<br>
whether such a thing exists for Squeak. Does anyone else know of one?<br>
<br>
Best regards<br>
<span class="HOEnZb"><font color="#888888">Jakob<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
2015-06-01 20:05 GMT+02:00 karl ramberg <<a href="mailto:karlramberg@gmail.com">karlramberg@gmail.com</a>>:<br>
> Hi,<br>
> I'm parsing some HTML docs, but the XMLTokenizer chokes on an '&' followed by<br>
> a space in a string.<br>
> I guess '&' is used for other things than 'and' in HTML, and it causes an error<br>
> when used in plain text.<br>
><br>
> Does anybody have a fix for this?<br>
><br>
> Karl<br>
<br>
</div></div></blockquote></div><br></div>