[Bug][Fix] Unicode characters in HtmlParser

Bert Freudenberg bert at isgnw.CS.Uni-Magdeburg.De
Thu Mar 2 16:32:20 UTC 2000

On Thu, 2 Mar 2000, Mark Guzdial wrote:

> My Sophomore Squeak-learners are developing personal newspapers drawn 
> from other news web sites, and they found that ESPN's site has 
> Unicode characters in it (via ampersand stuff) that break the
> HtmlParser and Scamper.

Well, there are a lot of specialEntities, like umlauts (ä), that are
not currently handled correctly. Also, ISO 8859-1 to Squeak charset
conversion is not done. I posted a changeset a while ago, but it didn't
make it into the image. It's the third attachment in

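To illustrate the two missing steps (resolving character entities and
converting from ISO 8859-1 to the image's charset), here is a rough
sketch in Python rather than Smalltalk. It is not the changeset
mentioned above. The function name decode_page is made up for this
example, and the choice of Mac Roman as the target charset is an
assumption about what the image uses internally.

```python
import html


def decode_page(raw: bytes, target: str = "mac_roman") -> bytes:
    """Hypothetical helper: turn an ISO 8859-1 page into target-charset bytes.

    The target charset defaults to Mac Roman, which is an assumption
    made for this sketch, not something stated in the thread.
    """
    # Step 1: interpret the raw bytes as ISO 8859-1 (every byte maps to a character).
    text = raw.decode("latin-1")
    # Step 2: resolve named and numeric entities such as &auml; or &#228;.
    text = html.unescape(text)
    # Step 3: re-encode for the target charset; unmappable characters become "?".
    return text.encode(target, errors="replace")


if __name__ == "__main__":
    print(decode_page(b"M&auml;rz, M&#228;rz, M\xe4rz"))
```

All three spellings of the umlaut end up as the same single byte in the
target charset, and a character the target cannot represent degrades to
"?" instead of breaking the parse.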
