At Wed, 24 Sep 2008 07:45:38 -0700, Colin Putney wrote:
A UTF8String would be really handy for web applications, where strings come in from the net as UTF-8, live in the image for a while, then get sent out as UTF-8. O(1) random access isn't very useful, because strings are (mostly) uninterpreted, but converting to Squeak's internal representation is expensive.
The thing is, as long as the "sequence of characters" abstraction is maintained, it doesn't matter (for purposes of correct behavior) what the internal representation is. So it's perfectly reasonable to have multiple encodings with different performance profiles. UTF8String when you need it, WideString when that makes sense.
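To illustrate the point about one abstraction with several representations, here is a minimal Python sketch. The class names mirror the ones in the discussion, but the methods (`char_at`, `to_utf8`, `same_characters`) are hypothetical and are not Squeak's API. It assumes the UTF-8 variant keeps the raw bytes and pays O(n) for indexing, while the wide variant stores one code point per slot.

```python
class WideString:
    """Hypothetical wide representation: one code point per element, O(1) indexing."""

    def __init__(self, text):
        self._code_points = [ord(ch) for ch in text]

    def __len__(self):
        return len(self._code_points)

    def char_at(self, index):
        return chr(self._code_points[index])


class Utf8String:
    """Hypothetical UTF-8 representation: keeps the bytes exactly as received."""

    def __init__(self, data):
        self._data = bytes(data)

    def _chars(self):
        # Decoding on demand makes indexing O(n), the trade-off discussed above.
        return self._data.decode("utf-8")

    def __len__(self):
        return len(self._chars())

    def char_at(self, index):
        return self._chars()[index]

    def to_utf8(self):
        # Sending the string back out needs no conversion at all.
        return self._data


def same_characters(a, b):
    """Compare two strings through the shared sequence-of-characters protocol."""
    return len(a) == len(b) and all(
        a.char_at(i) == b.char_at(i) for i in range(len(a))
    )
```

Both classes answer the same questions about length and the character at an index, so code that only uses that protocol does not need to know which representation it holds.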
The thing is, though, that even on the net UTF-8 is not as dominant as that suggests. There are a bunch of other encodings in use.
And having both UTF8String and WideString makes comparison and similar operations more complicated than they should be. Having a single internal representation is cleaner.
Keeping the encoded data in a ByteArray is a sensible thing to do. That would have been a much bigger redesign of Squeak, though.
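As a rough illustration of that design (in Python rather than Squeak, with the hypothetical helper names `from_wire` and `to_wire`), encoded data stays in a byte array and is decoded into one internal string type at the boundary. Python's `bytes` stands in for ByteArray and `str` for the single internal representation; the choice of Shift_JIS and EUC-JP as examples of other encodings is an assumption for the sake of the demonstration.

```python
def from_wire(data: bytes, encoding: str) -> str:
    """Decode bytes received from the net into the single internal string type."""
    return data.decode(encoding)


def to_wire(text: str, encoding: str) -> bytes:
    """Encode an internal string back into bytes for sending."""
    return text.encode(encoding)


# The same text arriving in three different encodings (illustrative data).
text = "\u65e5\u672c\u8a9e"
as_utf8 = to_wire(text, "utf-8")
as_sjis = to_wire(text, "shift_jis")
as_eucjp = to_wire(text, "euc_jp")

# The byte sequences differ, so comparing encoded data directly is not meaningful.
bytes_differ = len({as_utf8, as_sjis, as_eucjp}) == 3

# After decoding, every copy is the same internal string and compares trivially.
decoded_equal = (
    from_wire(as_utf8, "utf-8")
    == from_wire(as_sjis, "shift_jis")
    == from_wire(as_eucjp, "euc_jp")
)
```

With only one internal representation, comparison never has to consider which encoding a string came from; the encoding matters only at the point where bytes enter or leave.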
-- Yoshiki