[squeak-dev] Recent change in byte array at:put:

Sun Jul 30 01:20:22 UTC 2017

> On 29-07-2017, at 12:48 PM, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> Absolutely,
> to me a String is a sequence of characters.
> squeakToUtf8 is a hack that makes us consider a String as a sequence of codePoints whose encoding is in the eye of the beholder (or implicitly in the Context - the Smalltalk one).
> It's not very object oriented and quite fragile.
> We started to clean Multilingual but never finished the job…

Yes, that’s pretty much how I see it. Currently the utf8 ‘string’ is just kept as a byte string and the user is expected to understand that it is in a rather dangerous state.
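
To make the danger concrete, here is a small sketch in Python rather than Squeak, purely as a stand-in: the variable names are illustrative and nothing here is the actual squeakToUtf8 behaviour. It only shows what goes wrong when UTF-8 bytes are read back as if each byte were a character.

```python
# Python stand-in for the situation described above; this is not Squeak code.
text = "n\u00e9e"                 # three characters: n, e-acute, e
utf8 = text.encode("utf-8")       # four bytes: b'n\xc3\xa9e'

# Treat every byte as one character, which is effectively what happens when
# UTF-8 data is kept in a plain byte string.
as_byte_string = utf8.decode("latin-1")

char_count = len(text)            # counts characters
byte_count = len(as_byte_string)  # counts bytes, so sizes and indices drift

# The damage is reversible only if the reader knows the bytes were UTF-8.
recovered = as_byte_string.encode("latin-1").decode("utf-8")
```

The byte-string view is one element longer than the real text, and indexing into it can land in the middle of a multi-byte character.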

> 
> It's difficult to finish it, because we value backward compatibility.
> So maybe the ByteArray change was a bit radical with this respect.

Backward compatibility can sometimes drive you to loud swearing!

Maybe a new message to return the byte array of the utf8 data could be added, leaving the old one alone. We should probably consider making an actual UTF8String class, though I did try to work out the best thing to do for that several years ago for NuScratch and got lost in the tangles. Editing the damn things is a pain, to say the least, so you get to thinking about having the canonical string as an instvar alongside a byte array, with edits working on the String, which gets converted at the end of the edit to update the byte array. Or the other way around… or… aaargh!
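
The first arrangement (canonical string held next to a byte array, edits applied to the string, bytes refreshed once the edit is done) might look roughly like the sketch below. It is Python rather than Squeak, and Utf8Text, from_utf8 and replace_range are hypothetical names made up for illustration, not anything in the image or in NuScratch.

```python
class Utf8Text:
    """Hypothetical wrapper: a canonical string plus a cached UTF-8 byte form."""

    def __init__(self, text=""):
        self._text = text                  # canonical form, one element per character
        self._utf8 = text.encode("utf-8")  # derived form, kept in sync after edits

    @classmethod
    def from_utf8(cls, data):
        # Build from raw UTF-8 bytes; raises UnicodeDecodeError on invalid input.
        return cls(bytes(data).decode("utf-8"))

    def replace_range(self, start, stop, replacement):
        # The edit is expressed in character positions, never byte offsets.
        self._text = self._text[:start] + replacement + self._text[stop:]
        # Convert once, at the end of the edit, to update the byte form.
        self._utf8 = self._text.encode("utf-8")

    @property
    def text(self):
        return self._text

    @property
    def utf8(self):
        return self._utf8

    def __len__(self):
        return len(self._text)  # size in characters


# Usage: swap the final "e" of "cafe" for an e-acute.
t = Utf8Text("cafe")
t.replace_range(3, 4, "\u00e9")
```

Re-encoding the whole string on every edit is the simple choice, not the cheap one; the "other way around" (bytes canonical, string derived) trades that cost for harder editing.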

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful random insult:- Immune from any serious head injury.