[squeak-dev] Recent change in byte array at:put:

tim Rowledge tim at rowledge.org
Sun Jul 30 17:19:03 UTC 2017


> On 30-07-2017, at 6:34 AM, Tobias Pape <Das.Linux at gmx.de> wrote:
> 
> 
>> On 30.07.2017, at 11:02, Jakob Reschke <jakob.reschke at student.hpi.de> wrote:
>> 
>> If licensing permits it one could also have a look at how the OpenJDK deals with UTF-16 in StringBuilder.
>> 
> 
> While I agree in principle, don't come near me with utf16 ;)

I was about to say something similar :-)

I think it’s reasonably clear that nobody wants UTF-X as the main representation of text within their system if any sort of editing might be involved. It’s just too painful: characters take a variable number of bytes, so finding the nth one means scanning from the start, and replacing one can shift everything after it. However, there seem to be quite a lot of places where UTF-8 has been chosen as a sort of interface encoding, I imagine for space-saving reasons in general. It does seem like a bit of an early-’90s “oh my gosh, all the furrin letters take up so much space, what can we do, we can’t ask people to install an entire megabyte of memory on their PCs!” thing.
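To make “too painful” concrete, here is a minimal sketch in Python rather than Smalltalk, with hypothetical names (`byte_offset`, `utf8_at_put`). It shows what an at:put: on a raw UTF-8 buffer has to do: walk the bytes to find where the nth character starts, then splice in a replacement that may be a different number of bytes long.

```python
def byte_offset(buf, char_index: int) -> int:
    """Byte offset where code point number char_index starts in UTF-8 buf.

    Linear scan: continuation bytes look like 10xxxxxx, and every other
    byte (ASCII or a lead byte) starts a new code point.
    """
    count = 0
    for offset, b in enumerate(buf):
        if b & 0xC0 != 0x80:  # start of a code point
            if count == char_index:
                return offset
            count += 1
    if count == char_index:  # one past the last character
        return len(buf)
    raise IndexError(char_index)


def utf8_at_put(buf: bytearray, char_index: int, ch: str) -> None:
    """Hypothetical at:put: that replaces one character of a UTF-8 buffer in place."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    start = byte_offset(buf, char_index)
    end = byte_offset(buf, char_index + 1)  # raises if char_index is out of range
    buf[start:end] = ch.encode("utf-8")  # may grow or shrink the whole buffer
```

Replacing the ï in "naïve" with a plain i shrinks the buffer from 6 bytes to 5, so every byte after the edit point moves. A fixed-width representation avoids both the scan and the shift.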

For the NuScratch stuff I used Cairo/Pango to render text nicely and thus had to convert everything to UTF-8 in order to pass it to the renderer. No editing was done to any of that, so no backward conversions or complex parsing were required. To my surprise, the general performance on the Pis was not noticeably impacted; when I did my first experiments I thought I would have to render the full fonts out to make my own glyph bitmaps and so on, but in fact it worked nicely. That meant the languages with complex layout and kerning rules could be dealt with by somebody else’s code, which I like.

Jakob mentioned pairing encoded bytes with converters of some kind, and that made me think of Text, where we pretty much do that already. I wonder if using a RunArray paired with the ByteArray of UTF-8 (or even, dog help us, UTF-16) to call out where multi-byte characters lurk would work? Think of it as behaving as if the text attribute were ‘this one needs 3 bytes’ rather than ‘this one is in flashing red sparkles with rotating underlines and winking quotes’. Given that we are able to handle editing Text pretty well, maybe, just maybe, that would make editing UTF-X work decently? Sounds like a good student project to me ;-)
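A rough sketch of that RunArray idea, again in Python rather than Smalltalk and with hypothetical names (`RunIndexedUTF8`, `at`, `at_put`). It is not Squeak’s RunArray or Text. The UTF-8 bytes sit alongside a run-length list of [count, bytes-per-character] pairs, much as Text pairs its string with runs of attributes. Finding a character then costs a walk over the runs rather than over every byte, which stays cheap for mostly-ASCII text with the odd accented letter.

```python
from dataclasses import dataclass, field


def utf8_width(ch: str) -> int:
    """Number of bytes the single character ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


@dataclass
class RunIndexedUTF8:
    """Hypothetical sketch: UTF-8 bytes plus runs of [count, bytes_per_char]."""
    data: bytearray = field(default_factory=bytearray)
    runs: list = field(default_factory=list)

    @classmethod
    def from_str(cls, s: str) -> "RunIndexedUTF8":
        obj = cls()
        for ch in s:
            w = utf8_width(ch)
            obj.data += ch.encode("utf-8")
            if obj.runs and obj.runs[-1][1] == w:
                obj.runs[-1][0] += 1  # same width: extend the current run
            else:
                obj.runs.append([1, w])  # width changed: start a new run
        return obj

    def __len__(self) -> int:
        return sum(n for n, _ in self.runs)  # length in characters, not bytes

    def _locate(self, index: int) -> tuple:
        """Byte offset and width of character `index`, found by walking runs."""
        if index < 0:
            raise IndexError(index)
        offset, remaining = 0, index
        for n, w in self.runs:
            if remaining < n:
                return offset + remaining * w, w
            remaining -= n
            offset += n * w
        raise IndexError(index)

    def at(self, index: int) -> str:
        offset, w = self._locate(index)
        return self.data[offset:offset + w].decode("utf-8")

    def at_put(self, index: int, ch: str) -> None:
        if len(ch) != 1:
            raise ValueError("expected a single character")
        offset, old_w = self._locate(index)
        new = ch.encode("utf-8")
        self.data[offset:offset + old_w] = new  # splice; may resize the buffer
        if len(new) != old_w:
            self._set_width(index, len(new))

    def _set_width(self, index: int, w: int) -> None:
        """Split the run holding `index` so that one character gets width w."""
        out = []
        remaining = index
        for n, rw in self.runs:
            if 0 <= remaining < n:
                if remaining:
                    out.append([remaining, rw])  # characters before the edit
                out.append([1, w])  # the edited character
                if n - remaining - 1:
                    out.append([n - remaining - 1, rw])  # characters after it
            else:
                out.append([n, rw])
            remaining -= n
        merged = []  # coalesce neighbouring runs that now share a width
        for n, rw in out:
            if merged and merged[-1][1] == rw:
                merged[-1][0] += n
            else:
                merged.append([n, rw])
        self.runs = merged
```

An edit that changes a character’s width still splices the byte array, but the run bookkeeping is just splitting one run and merging neighbours, roughly what happens when an attribute is applied to a single character of a Text.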

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful random insult:- Calls people to ask them their phone number.