<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Dec 5, 2015 at 8:41 PM, Levente Uzonyi <span dir="ltr">&lt;<a href="mailto:leves@caesar.elte.hu" target="_blank">leves@caesar.elte.hu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

We do the same thing, but that doesn&#39;t mean it&#39;s a good idea to create a new String-like class whose content is encoded in UTF-8, because UTF-8-encoded strings can&#39;t be modified like regular strings. While it would be possible to implement all operations, such an implementation would become the next SortedCollection (bad performance due to misuse).<br></blockquote><div><br></div><div>Well, UTF-8 strings would have different performance tradeoffs than our existing string classes. Random access by character index would be expensive (a linear scan rather than constant time), in-place modification would sometimes be expensive, memory usage for non-English strings would be lower, and encoding/decoding for UTF-8 I/O would be eliminated. I find that&#39;s a good fit for some of my uses of strings, and I don&#39;t mind thinking about the tradeoffs. YMMV.</div><div><br></div><div>One idea I&#39;ve wondered about in the past is having classes instead of language tags: EnglishString, RomanianString, etc., each with an encoding that makes sense for its language. That would do a lot for m17n without taking on the full complexity of Unicode. It could also coexist well with Utf8String, Utf16String, etc., since those could be considered pseudo-languages/encodings. The downside would be that multilingual strings would be more difficult: you&#39;d need ropes or the like.</div><div><br></div><div>Colin</div></div></div></div>
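<div>A minimal Python sketch of the random-access cost discussed above, assuming a string is stored as raw UTF-8 bytes. The helper names <code>byte_offset_of</code> and <code>char_at</code> are hypothetical and do not correspond to any Squeak or Pharo class. Because UTF-8 uses 1 to 4 bytes per code point, finding the n-th character means scanning from the start and counting lead bytes (any byte not of the form 10xxxxxx):</div>

```python
def byte_offset_of(data: bytes, n: int) -> int:
    """Return the byte offset where the n-th code point (0-based) starts.

    UTF-8 is variable-width, so the offset cannot be computed as n * width.
    We scan from the beginning and count lead bytes; continuation bytes
    (bit pattern 10xxxxxx) belong to the preceding code point.
    This makes indexing O(n) instead of O(1).
    """
    count = -1
    for offset, b in enumerate(data):
        if (b >> 6) != 0b10:  # lead byte: a new code point starts here
            count += 1
            if count == n:
                return offset
    raise IndexError("code point index out of range")


def char_at(data: bytes, n: int) -> str:
    """Decode the n-th code point of a UTF-8 byte string (hypothetical helper)."""
    start = byte_offset_of(data, n)
    end = start + 1
    # Extend past this code point's continuation bytes, if any.
    while end != len(data) and (data[end] >> 6) == 0b10:
        end += 1
    return data[start:end].decode("utf-8")


if __name__ == "__main__":
    word = "Bra\u0219ov"  # "Brașov": one character outside ASCII
    data = word.encode("utf-8")
    print(len(word), len(data))  # 6 characters, 7 bytes
    print(char_at(data, 4))      # found only after walking past the 2-byte ș
```

<div>For the Romanian word "Brașov", the UTF-8 form is 7 bytes (ș takes two), versus 24 bytes if each of its 6 characters were stored as a 32-bit word, yet <code>char_at(data, 4)</code> has to walk past the two-byte ș to locate "o". Replacing a one-byte character with a multi-byte one would likewise shift every following byte, which is why in-place modification is only sometimes cheap.</div>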