UTF8 Squeak

Alan Lovejoy squeak-dev.sourcery at forum-mail.net
Fri Jun 8 03:24:47 UTC 2007


<Alan L>UTF-8 should be the default</Alan L>

<J J (Jason)>Wouldn't that be a pretty big speed impact given how much
strings are used?</J J (Jason)>

Now that I think about it, that could very well be the case.  There might be
clever ways to make the impact much less than one might otherwise expect
(for example, RunArrays were a clever way to make Text objects reasonably
efficient), but I haven't actually implemented it, so there's no guarantee.
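
To make the speed concern concrete, here is a minimal sketch in Python (not
Squeak code; the function name utf8_char_at is made up for illustration, and
the input is assumed to be valid UTF-8).  Because each character occupies one
to four bytes, finding the Nth character means walking the bytes from the
start rather than jumping straight to an offset:

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at code-point position `index` (0-based).

    Hypothetical helper: assumes `data` is well-formed UTF-8.
    """
    if index < 0:
        raise IndexError(index)
    pos = 0
    remaining = index
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if remaining == 0:
            return data[pos:pos + width].decode("utf-8")
        remaining -= 1
        pos += width
    raise IndexError(index)
```

Each lookup costs time proportional to the index, whereas a fixed-width
encoding can compute the offset directly.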

So, perhaps the default internal String encoding should be UTF-32, instead
of UTF-8 or UTF-16, in order to avoid the performance issue.  But that
raises a memory usage issue--which is the primary reason I don't think a
"one size fits all" approach is sufficient.
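
The memory side of the trade-off can be sketched the same way, again in
Python rather than Squeak, with a made-up helper name (encoded_sizes).  It
only counts the bytes each encoding needs for the same text, without a byte
order mark:

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte count of `text` in each encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


ascii_sizes = encoded_sizes("hello")   # 5, 10 and 20 bytes respectively
cjk_sizes = encoded_sizes("日本語")     # 9, 6 and 12 bytes respectively
```

For ASCII-heavy text UTF-32 takes four times the space of UTF-8, while for
other scripts the gap narrows or UTF-16 comes out smallest, which is why no
single choice wins on memory everywhere.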

--Alan





