[squeak-dev] how to create an UTF-8 character
Yoshiki Ohshima
yoshiki at vpri.org
Fri Sep 26 16:56:43 UTC 2008
At Wed, 24 Sep 2008 07:45:38 -0700,
Colin Putney wrote:
>
> A UTF8String would be really handy for web applications, where strings
> come in from the net as UTF-8, live in the image for a while, then get
> sent out as UTF-8. O(1) random access isn't very useful, because
> strings are (mostly) uninterpreted, but converting to Squeak's
> internal representation is expensive.
>
> The thing is, as long as the "sequence of characters" abstraction is
> maintained, it doesn't matter (for purposes of correct behavior) what
> the internal representation is. So it's perfectly reasonable to have
> multiple encodings with different performance profiles. UTF8String
> when you need it, WideString when that makes sense.
The thing is, though, that even on the net UTF-8 is not as dominant
as that. There are a bunch of other encodings in use.

Also, having both UTF8String and WideString makes comparison etc. more
complicated than it should be. Having a single internal representation
is cleaner.

Keeping the encoded data in a ByteArray is the sensible thing to do.
That would have been a much bigger redesign of Squeak, though.
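
A rough sketch of that idea, written in Python rather than Squeak and meant only as an illustration: `bytes` stands in for ByteArray and `str` for the single internal string representation. The helper name `to_internal` is hypothetical, and the choice of Shift_JIS as the "other encoding" is just an assumption for the example.

```python
# Illustration only (Python, not Squeak): bytes plays the role of ByteArray,
# str plays the role of the one internal string representation.

def to_internal(data: bytes, encoding: str) -> str:
    """Decode encoded bytes from the net into the single internal string type."""
    return data.decode(encoding)

# The same text arriving from the net in two different encodings.
utf8_bytes = "日本語".encode("utf-8")
sjis_bytes = "日本語".encode("shift_jis")

# As raw encoded data the two byte sequences differ, so comparing them
# directly says nothing about whether the text is the same.
raw_equal = utf8_bytes == sjis_bytes

# Once both are decoded at the boundary, comparison is plain string equality
# and never has to know which encoding the data originally came in.
decoded_equal = to_internal(utf8_bytes, "utf-8") == to_internal(sjis_bytes, "shift_jis")
```

With encoded data kept in a byte container and decoded once on the way in, the comparison code only ever sees one string type.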
-- Yoshiki