[squeak-dev] Unicode Support

Colin Putney colin at wiresong.com
Sun Dec 6 05:19:36 UTC 2015


On Sat, Dec 5, 2015 at 8:41 PM, Levente Uzonyi <leves at caesar.elte.hu> wrote:


> We do the same thing, but that doesn't mean it's a good idea to create a
> new String-like class having its content encoded in UTF-8, because
> UTF-8-encoded strings can't be modified like regular strings. While it
> would be possible to implement all operations, such implementation would
> become the next SortedCollection (bad performance due to misuse).
>

Well, UTF-8 strings would have different performance tradeoffs than our
existing string classes. Random access would be expensive, in-place
modification would sometimes be expensive, memory usage for non-English
strings would be lower, and encoding/decoding for IO would be eliminated. I
find that's a good fit for some of my uses of strings, and I don't mind
thinking about the tradeoffs. YMMV.
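
To make the first two tradeoffs concrete, here is a rough sketch in Python
(not Squeak, and not a proposal for an actual implementation). The function
names are made up for illustration. It assumes well-formed UTF-8 input and
shows why indexing by character needs a scan, and why replacing one
character can change the byte length.

```python
def lead_byte_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte: 0x%02x" % lead)


def byte_offset(data: bytes, index: int) -> int:
    """Byte position of the character at `index`. Linear in `index`."""
    if index < 0:
        raise IndexError(index)
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError(index)
        pos += lead_byte_width(data[pos])
    if pos >= len(data):
        raise IndexError(index)
    return pos


def char_at(data: bytes, index: int) -> str:
    """Random access: has to walk the bytes from the start."""
    pos = byte_offset(data, index)
    return data[pos:pos + lead_byte_width(data[pos])].decode("utf-8")


def replace_char(data: bytes, index: int, new_char: str) -> bytes:
    """Replace one character. The result may be shorter or longer in bytes,
    so a true in-place store only works when the widths happen to match."""
    pos = byte_offset(data, index)
    old_width = lead_byte_width(data[pos])
    return data[:pos] + new_char.encode("utf-8") + data[pos + old_width:]
```

For example, "naïve" is 5 characters but 6 bytes in UTF-8. Replacing the
"ï" with "i" gives a 5-byte result, so everything after it has to move.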

One idea I've wondered about in the past is having classes instead of
language tags: EnglishString, RomanianString, etc., with encodings that make
sense for the language. That would do a lot for m17n, without going for the
full complexity of Unicode. It could also co-exist well with Utf8String,
Utf16String, etc., since those could be considered
pseudo-languages/encodings. The downside would be that multi-lingual
strings would be more difficult - you'd need ropes or the like.
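
A very rough sketch of what I mean, again in Python rather than Squeak. All
the names here are hypothetical, the choice of ASCII for English and
ISO-8859-16 for Romanian is just an assumption for the example, and the flat
list of segments is only a stand-in for a real rope (which would be a
balanced tree).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LanguageString:
    """A string stored in an encoding chosen for its language."""
    language: str
    encoding: str
    data: bytes

    @classmethod
    def from_text(cls, language: str, encoding: str, text: str) -> "LanguageString":
        # Raises UnicodeEncodeError if the text does not fit the encoding.
        return cls(language, encoding, text.encode(encoding))

    def text(self) -> str:
        return self.data.decode(self.encoding)


class SegmentedString:
    """Multi-lingual string as a sequence of single-language segments."""

    def __init__(self, segments: List[LanguageString]):
        self.segments = list(segments)

    def text(self) -> str:
        return "".join(segment.text() for segment in self.segments)

    def char_at(self, index: int) -> str:
        if index < 0:
            raise IndexError(index)
        remaining = index
        for segment in self.segments:
            decoded = segment.text()
            if remaining < len(decoded):
                return decoded[remaining]
            remaining -= len(decoded)
        raise IndexError(index)


english = LanguageString.from_text("en", "ascii", "Hello, ")
romanian = LanguageString.from_text("ro", "iso8859_16", "țară")
mixed = SegmentedString([english, romanian])
```

Each segment stays one byte per character here, and the mixed string only
pays for the extra structure where two languages actually meet.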

Colin