UTF8 Squeak
Andreas Raab
andreas.raab at gmx.de
Wed Jun 13 02:32:17 UTC 2007
Colin Putney wrote:
> If a String were a flat array of Unicode code points, it would be
> implemented in Smalltalk as an array of Characters wouldn't it? The fact
> that you've chosen to hide the internal representation of the string and
> use a "variable byte" or "variable word" subclass to store bytes, rather
> than objects, is an indication that the strings *are* encoded. In fact,
> the encodings have names: ISO 8859-1 and UCS-4. Janko is proposing to
> add a string class that internally stores strings encoded in UCS-2 to
> the mix.
>
> So what's so holy about these particular encodings, besides the fact
> that they're especially efficient on the VisualWorks VM?
Indeed. That is effectively the point I was trying to make in taking a
more "encoding-centered" perspective on the problem. In which case there
is nothing holy about particular encodings (and nothing confusing about
the choice of names); some people use one encoding, some people use
another, and at the end of the day there is no need to be religious about
what exactly a string must contain (EBCDIC anyone? :-)
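
To make the "strings *are* encoded" point concrete, here is a rough sketch in Python (not Squeak or VisualWorks code; the sample text and variable names are made up for illustration). It stores the same sequence of code points in the three fixed-width layouts named above. One assumption to flag: Python has no UCS-2 codec, so "utf-16-be" stands in for it, which only matches UCS-2 for characters inside the Basic Multilingual Plane.

```python
# Hypothetical sample text; every code point here is below U+0100,
# so it fits in all three encodings discussed in the thread.
text = "Squeak \u00e9"

# The abstract view: a flat sequence of Unicode code points.
code_points = [ord(c) for c in text]

# Three concrete storage layouts for that same sequence.
latin1 = text.encode("latin-1")    # ISO 8859-1: 1 byte per code point
ucs2 = text.encode("utf-16-be")    # stand-in for UCS-2: 2 bytes per BMP code point
ucs4 = text.encode("utf-32-be")    # UCS-4: 4 bytes per code point

# Different byte counts, same string once decoded.
sizes = (len(latin1), len(ucs2), len(ucs4))
round_trips = (
    latin1.decode("latin-1"),
    ucs2.decode("utf-16-be"),
    ucs4.decode("utf-32-be"),
)
```

All three byte sequences decode back to the same code points; only the storage width differs, which is the sense in which none of them is more "the string" than the others.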
Cheers,
- Andreas