[squeak-dev] String >> utf8Encoded, ByteArray >> utf8Decoded

tim Rowledge tim at rowledge.org
Sun Jan 28 19:13:31 UTC 2018



> On 28-01-2018, at 10:40 AM, Tobias Pape <Das.Linux at gmx.de> wrote:
> 
>> On 28.01.2018, at 19:06, tim Rowledge <tim at rowledge.org> wrote:
>> 
>> What would be so much better is a proper UTF8String class.
>> 
>> One of the problems of course is that doing almost anything to a utf encoded pseudostring requires complex faffing around to decode some or all of it. This makes them pretty much useless for anything outside passing to external libraries, at least so far as I have found. However, that turns out to be a quite important thing, and right now we have a horrible mess.
> 
> I think our ByteString/WideString is already pretty good

I agree; the mess I was pointing at is that there is no way to know that the ByteArray you just got passed holds a UTF-* encoded string. I’ve been bitten by that quite a few times, in the NuScratch and MQTT packages for example.

Having a class that tells us the content is a string rendered as UTF-8 encoded bytes would be useful, not least because it would give us a nice simple way to know that #asString has to decode it. Leaving everything as just a ByteArray means we know too little about it to be helpful. Maybe rather than calling such a class ‘UTF8String’, which implies the whole String-ness thing, we should have a UTF8EncodedBytes class to be really clear. One issue with doing anything like this is having to make the VM return the new class instead of a ByteArray, or perhaps making the prim-related code use #adoptInstance:, a bit like some of the FileStream and SocketStream methods do.
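
To make that a bit more concrete, here is a rough, untested sketch of the sort of thing I mean. The category name is made up, and it assumes the #utf8Encoded / #utf8Decoded selectors from the subject line are present in the image. The three pieces (class definition, method, workspace snippet) are shown together for reading, not as a single file-in.

```smalltalk
"1. Class definition: same byte format as ByteArray and no extra inst vars,
    so that #adoptInstance: is able to retag an existing ByteArray."
ByteArray variableByteSubclass: #UTF8EncodedBytes
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Sketch-UTF8'.

"2. Method on UTF8EncodedBytes (browser style, not chunk format)."
UTF8EncodedBytes >> asString
	"The bytes are known to be UTF-8, so decode them
	 instead of mapping byte-for-byte to characters."
	^ self utf8Decoded

"3. Workspace usage."
| bytes |
bytes := 'Grüße' utf8Encoded.            "a plain ByteArray, as a prim might hand back"
UTF8EncodedBytes adoptInstance: bytes.   "retag in place, no copy"
bytes asString.                          "now goes through the UTF-8 decode above"
```

The point of going through #adoptInstance: is that the VM and prims can keep answering plain ByteArrays; only the image-side code that knows the bytes are UTF-8 needs to retag them.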

I like your idea of a really well done string system that does all that; I don’t like the amount of work it feels like it would need. I certainly can’t imagine having time to do it myself. Pretty sure I have less than 200 years to go...

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Oxymorons: "Now, then ..."



