Re: [Vm-dev] mantis http://bugs.squeak.org/view.php?id=7349

7 May 2009

At Thu, 7 May 2009 11:37:10 -0700, Eliot Miranda wrote:
...
Yes, among these choices, my vote would be for UTF-32 (for 21-bit
space). But variable-length-ness doesn't really go away even when
using UTF-32, as there are composition characters.

Alternatively, we could go for all UTF-8 as the in-image representation
for Strings (as a data buffer) and, when you need a Character, create an
instance, or return the one in a table, that is in UTF-32. And on the
image side, a displayable "String" should (almost) always be accompanied
by attributes, as in Text.

I'm a bit out of my depth here. I would have thought that you would want the basic string types to be fixed width for
fast access, simply because variable length doesn't scale to e.g. indexing 1 megabyte strings, but that for the
platform interface one would want efficient conversion to/from fixed and variable length encodings. But that's just my
gut. I expect I'll implement whatever y'all say makes sense.

Basically, I think UTF-32 is OK for the time being and requires very
little change to the code.

With composition characters present, the situation where you randomly
access an element and expect it to be a meaningful value by itself is
rarer.
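
To make the composition-character point concrete, here is a minimal sketch in Python rather than Squeak (the sample string and variable names are assumptions chosen for illustration). It shows that one displayed character can still occupy two fixed-width UTF-32 elements, so indexing a single element does not necessarily yield a meaningful unit by itself.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one character on screen,
# but two code points in the data.
decomposed = "e\u0301"

# UTF-32 (little-endian, no byte order mark) uses exactly 4 bytes per code point.
utf32 = decomposed.encode("utf-32-le")
code_point_count = len(utf32) // 4

# Element 1 on its own is only the combining accent, not a full character.
second_element = decomposed[1]

# NFC normalization happens to fold this particular pair into one code point,
# but not every base-plus-combining sequence has a precomposed form.
composed = unicodedata.normalize("NFC", decomposed)

print(code_point_count, unicodedata.name(second_element), len(composed))
```
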
My proposition is that for a String (as data), we would rather avoid
random access anyway and always access it via a Stream.  Then, the
actual representation can be different.
-- Yoshiki
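
The stream proposal above could be sketched roughly as follows. This is an illustrative Python sketch, not Squeak code and not anything from the VM; the class `Utf8ReadStream` and its methods are hypothetical names. The buffer is stored as UTF-8, and each read hands back a full code point (the value a UTF-32 Character would hold), so callers never index into the bytes directly.

```python
class Utf8ReadStream:
    """Hypothetical sequential reader over a UTF-8 byte buffer."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def at_end(self) -> bool:
        return self._pos >= len(self._data)

    def next(self) -> int:
        """Return the next code point and advance past its bytes."""
        first = self._data[self._pos]
        # The lead byte tells us how many bytes the sequence occupies.
        if first < 0x80:
            length = 1
        elif first >> 5 == 0b110:
            length = 2
        elif first >> 4 == 0b1110:
            length = 3
        elif first >> 3 == 0b11110:
            length = 4
        else:
            raise ValueError("invalid UTF-8 lead byte at %d" % self._pos)
        chunk = self._data[self._pos:self._pos + length]
        self._pos += length
        # Let the standard decoder validate the continuation bytes.
        return ord(chunk.decode("utf-8"))


# Sample text with 1-, 2-, 3- and 4-byte sequences (an assumption for the demo).
buffer = "a\u00e9\u20ac\U0001F600".encode("utf-8")
stream = Utf8ReadStream(buffer)
code_points = []
while not stream.at_end():
    code_points.append(stream.next())
```

Sequential reads cost time proportional to the bytes consumed, while reaching the n-th character from the start would require scanning everything before it, which is the trade-off raised earlier in the thread about indexing large strings.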
