[squeak-dev] how to create an UTF-8 character

Yoshiki Ohshima yoshiki at vpri.org
Fri Sep 26 16:56:43 UTC 2008


At Wed, 24 Sep 2008 07:45:38 -0700,
Colin Putney wrote:
> 
> A UTF8String would be really handy for web applications, where strings  
> come in from the net as UTF-8, live in the image for a while, then get  
> sent out as UTF-8. O(1) random access isn't very useful, because  
> strings are (mostly) uninterpreted, but converting to Squeak's  
> internal representation is expensive.
> 
> The thing is, as long as the "sequence of characters" abstraction is  
> maintained, it doesn't matter (for purposes of correct behavior) what  
> the internal representation is. So it's perfectly reasonable to have  
> multiple encodings with different performance profiles. UTF8String  
> when you need it, WideString when that makes sense.

  The thing is, though, that even on the net UTF-8 is not as dominant
as that.  There are a bunch of other encodings in use.

  And having both UTF8String and WideString makes comparison and
similar operations more complicated than they should be.  Having a
single internal representation is cleaner.

  Keeping the encoded data in a ByteArray is a sensible thing to do.
That would have required a much bigger redesign of Squeak, though.
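  A minimal sketch of the "one internal representation, encoded data kept as bytes" point.  This is a Python analogy, not Squeak code: Python's `bytes` stands in for a ByteArray holding encoded data, `str` stands in for the single internal string representation, and `decode_incoming` / `encode_outgoing` are hypothetical helper names.  It shows that the same text arriving in two encodings does not compare equal as raw bytes, but does once both are decoded into one representation.

```python
# Hypothetical illustration in Python (not Squeak): bytes plays the role of a
# ByteArray of encoded data; str plays the role of the one internal string type.

def decode_incoming(payload: bytes, charset: str) -> str:
    """Decode raw network bytes into the single internal string type."""
    return payload.decode(charset)

def encode_outgoing(text: str, charset: str = "utf-8") -> bytes:
    """Encode the internal string back to bytes only when sending it out."""
    return text.encode(charset)

# The same word received from two peers that use different encodings.
utf8_bytes = "Größe".encode("utf-8")
latin1_bytes = "Größe".encode("latin-1")

# Compared as encoded bytes, the two differ even though the text is the same.
print(utf8_bytes == latin1_bytes)  # False

# After decoding into one internal representation, plain equality just works.
a = decode_incoming(utf8_bytes, "utf-8")
b = decode_incoming(latin1_bytes, "latin-1")
print(a == b)  # True

# Encoded data stays as bytes until it leaves the program again.
print(encode_outgoing(a) == utf8_bytes)  # True
```

  With a second internal string type (e.g. one that stores UTF-8 directly), every comparison would first have to reconcile the two representations, which is the extra complexity described above.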

-- Yoshiki
