[squeak-dev] Unicode Support
tim Rowledge
tim at rowledge.org
Sat Dec 5 04:49:21 UTC 2015
> On 04-12-2015, at 6:46 AM, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>
> Why would you want to have strings with UTF-8 or UTF-16 encoding in the image?
> What's wrong with the current UTF-32 representation?
WideStrings are perfectly ok most of the time, as are plain old byte Strings. Where things get a bit awkward is when interfacing to code that requires UTF-8, such as Cairo/Pango and some OS interfaces.
Currently we can have a simple byte String and edit in or append a wide character, and it all works properly: a WideString is made and everything gets sorted out. Well, everything I’ve had to try out for the Pi Scratch project, at least. The problem is in having to convert too often; for example, every rendering operation requires a conversion from Squeak format to UTF-8, and some file reading operations require a conversion from UTF-8 to Squeak format.
One idea I had but haven’t done anything with yet is to make a class that keeps both formats around, effectively caching the UTF-8. It isn’t needed for anywhere near all Strings. All editing and sorting would work on the Squeak-format part; after each edit either the conversion would be redone immediately, or the UTF-8 version would be flushed so that a new conversion happens only if it is ever requested.
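A rough sketch of the flush-on-edit variant, written in Python rather than Squeak purely for illustration. Everything here is hypothetical: the class name `CachedUtf8String`, its methods, and the `conversions` counter are made up for the example, and a Python `str` stands in for the Squeak String/WideString part.

```python
class CachedUtf8String:
    """Hypothetical wrapper: native text plus a lazily built UTF-8 copy."""

    def __init__(self, text=""):
        self._text = text      # stand-in for the Squeak-format string
        self._utf8 = None      # cached UTF-8 bytes; None means "flushed"
        self.conversions = 0   # counts real encodes, for illustration only

    @classmethod
    def from_utf8(cls, data):
        # e.g. data read from a file: keep the bytes so no re-encode is needed
        obj = cls(data.decode("utf-8"))
        obj._utf8 = bytes(data)
        return obj

    @property
    def text(self):
        return self._text

    def append(self, more):
        # All editing works on the native part; the cache is just flushed.
        self._text += more
        self._utf8 = None

    def utf8(self):
        # Convert only when requested and only if the cache was flushed.
        if self._utf8 is None:
            self._utf8 = self._text.encode("utf-8")
            self.conversions += 1
        return self._utf8


s = CachedUtf8String("caf")
s.append("\u00e9")          # edit in a non-ASCII character
first = s.utf8()            # one real conversion happens here
second = s.utf8()           # served from the cache, e.g. for the next render
```

With this shape, repeated rendering of an unchanged string costs one conversion in total, and a string that is never rendered costs none. The eager variant (convert after every edit) would do the encode inside `append` instead.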
tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful Latin Phrases:- Sentio aliquos togatos contra me conspirare = I think some people in togas are plotting against me.