[squeak-dev] String >> utf8Encoded, ByteArray >> utf8Decoded

tim Rowledge tim at rowledge.org
Sun Jan 28 18:06:10 UTC 2018


What would be so much better is a proper UTF8String class.

One of the problems, of course, is that doing almost anything to a UTF-encoded pseudostring requires complex faffing around to decode some or all of it. This makes them pretty much useless for anything other than passing to external libraries, at least so far as I have found. However, that turns out to be quite an important thing, and right now we have a horrible mess.
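
To make the "faffing around" concrete, here is a rough sketch in Python rather than Smalltalk, purely for illustration. The function name char_at is made up for this example. It shows that fetching the Nth character from raw UTF-8 bytes means scanning from the start, because characters occupy a variable number of bytes.

```python
def char_at(utf8_bytes: bytes, index: int) -> str:
    """Return the character at a 0-based character index.

    Hypothetical helper: it has to walk the bytes from the beginning,
    since a byte offset cannot be computed from a character index.
    """
    seen = -1      # index of the most recent character start we passed
    start = None   # byte offset where the wanted character begins
    for offset, byte in enumerate(utf8_bytes):
        # Bytes of the form 10xxxxxx continue a character; anything else starts one.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = offset
            elif seen == index + 1:
                return utf8_bytes[start:offset].decode("utf-8")
    if start is None:
        raise IndexError(index)
    # The wanted character was the last one in the buffer.
    return utf8_bytes[start:].decode("utf-8")


encoded = "na\u00efve".encode("utf-8")  # 5 characters, 6 bytes
third = char_at(encoded, 2)             # the two-byte character U+00EF
```

Plain byte indexing (encoded[2]) would hand back half of that character, which is the sort of thing that makes these objects awkward to treat as strings.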

We do, however, have a sorta-kinda model of a way to handle it in FilePath, which actually has nothing much to do with file paths at all. FilePath is a bit more general than a UTF-* string and maybe that is still a valuable option.

I see two basic options for making an improvement:

a) a simplistic UTF8String that is a byte array of the UTF-8 bytes and does nothing much except exist as an encoded string to pass to primitives. #size returns the number of bytes, #at: and #at:put: are not for general consumption, and to do any sort of editing you have to convert it to a real String.

b) something like FilePath, with both the original and encoded version kept, automagic conversions and some interesting hand-waving to deal with #size (is it the number of characters, or the number of bytes?) etc.
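
A rough sketch of how the two options differ, again in Python purely for illustration. The class and method names (Utf8String, DualString, char_size, byte_size and so on) are all hypothetical, and nothing here is meant to describe how FilePath actually works. Splitting #size into two explicitly named methods is just one possible way of dodging the hand-waving, not a proposal.

```python
class Utf8String:
    """Option (a): an opaque holder for UTF-8 bytes, meant only for handing to primitives."""

    def __init__(self, text: str):
        self._bytes = text.encode("utf-8")

    def size(self) -> int:
        # Number of bytes, not characters.
        return len(self._bytes)

    def as_bytes(self) -> bytes:
        return self._bytes

    def as_string(self) -> str:
        # Any editing has to go through a real string first.
        return self._bytes.decode("utf-8")


class DualString:
    """Option (b): keeps the original string and caches the encoded form."""

    def __init__(self, text: str):
        self._text = text
        self._encoded = None  # filled in on first request

    def encoded(self) -> bytes:
        # "Automagic" conversion: encode lazily, then reuse the result.
        if self._encoded is None:
            self._encoded = self._text.encode("utf-8")
        return self._encoded

    def char_size(self) -> int:
        # Python counts code points here; a Squeak String would count Characters.
        return len(self._text)

    def byte_size(self) -> int:
        return len(self.encoded())

    def replaced(self, old: str, new: str) -> "DualString":
        # Editing works on the decoded side; the new object re-encodes on demand.
        return DualString(self._text.replace(old, new))


a = Utf8String("caf\u00e9")
b = DualString("caf\u00e9")
```

With (a), a.size() answers the byte count and that is the only answer available. With (b), b.char_size() and b.byte_size() disagree for any non-ASCII content, which is exactly the ambiguity a single #size would have to paper over.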

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful random insult:- His future is behind schedule.



