[squeak-dev] Unicode Support

tim Rowledge tim at rowledge.org
Sat Dec 5 04:49:21 UTC 2015


> On 04-12-2015, at 6:46 AM, Levente Uzonyi <leves at caesar.elte.hu> wrote:
> 
> Why would you want to have strings with UTF-8 or UTF-16 encoding in the image?
> What's wrong with the current UTF-32 representation?

WideStrings are perfectly OK most of the time, as are plain old byte Strings. Where things get a bit awkward is when interfacing to code that requires UTF-8, such as Cairo/Pango and some OS interfaces.

Currently we can take a simple byte String, edit in or append a wide character, and it all works properly: a WideString is made and everything gets sorted out. Well, everything I’ve had to try out for the Pi Scratch project, anyway. The problem is having to convert too often; for example, every rendering operation requires a conversion from Squeak format to UTF-8, and some file reading operations require a conversion from UTF-8 to Squeak format.

One idea I had but haven’t done anything with yet is to make a class that keeps both formats around, effectively caching the UTF-8. It wouldn’t be needed for anywhere near all Strings. All editing/sorting would work on the Squeak-format part; after each edit either the conversion would be redone immediately, or the UTF-8 version would be flushed so that a new conversion happens only if it is ever requested.
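
A rough sketch of the flush-on-edit variant, written in Python only because it is quick to run; nothing like this exists in the image as far as I know. The class name CachedUtf8String, its methods, and the conversions counter are all hypothetical, and a Python str simply stands in for the Squeak String/WideString part.

```python
class CachedUtf8String:
    """Hypothetical wrapper: editable text plus a lazily cached UTF-8 copy."""

    def __init__(self, text=""):
        self._text = text      # stands in for the Squeak String/WideString
        self._utf8 = None      # cached encoding; None means "flushed"
        self.conversions = 0   # counts real conversions, for illustration only

    @classmethod
    def from_utf8(cls, data):
        """Build from UTF-8 bytes (e.g. file contents), keeping them as the cache."""
        obj = cls(data.decode("utf-8"))
        obj._utf8 = bytes(data)
        return obj

    def append(self, more):
        """Edit the text part and flush the cache instead of re-encoding now."""
        self._text += more
        self._utf8 = None

    def as_text(self):
        return self._text

    def as_utf8(self):
        """Return the UTF-8 bytes, converting only if the cache was flushed."""
        if self._utf8 is None:
            self._utf8 = self._text.encode("utf-8")
            self.conversions += 1
        return self._utf8


s = CachedUtf8String("Scratch")
s.as_utf8()           # first render: one conversion
s.as_utf8()           # second render: served from the cache
s.append(" \u00e9")   # edit in a non-ASCII character; cache is flushed
s.as_utf8()           # next render converts again
```

Flushing rather than converting after every edit means a run of edits with no rendering in between costs a single conversion, at the price of a nil check on each request for the UTF-8 form.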


tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful Latin Phrases:- Sentio aliquos togatos contra me conspirare = I think some people in togas are plotting against me.



