[squeak-dev] Re: [Pharo-dev] Unicode Support

EuanM euanmee at gmail.com
Fri Dec 11 20:19:13 UTC 2015


"If it hasn't already been said, please do not conflate Unicode and
UTF-8. I think that would be a recipe for
a high P.I.T.A. factor."  --Richard Sargent

I agree. :-)
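
As a rough illustration of why the two should not be conflated, here is
a minimal Python sketch (Python is used only for illustration; the
function name is hypothetical and nothing here is Pharo or Squeak
code).  It shows one sequence of Unicode code points and three
different byte-level encodings of it.

```python
def encodings_of(text):
    """Return the code points of `text` and three byte encodings of them."""
    return {
        # Unicode proper: the abstract code points, independent of any encoding.
        "code_points": [ord(ch) for ch in text],
        # ISO-8859-1: one byte per character, byte value == code point.
        "latin1": text.encode("latin-1"),
        # UTF-8: one to four bytes per code point.
        "utf8": text.encode("utf-8"),
        # UTF-16 (little-endian, no BOM): two or four bytes per code point.
        "utf16le": text.encode("utf-16-le"),
    }


result = encodings_of("caf\u00e9")  # U+00E9 is LATIN SMALL LETTER E WITH ACUTE
```

For this input the ISO-8859-1 bytes are numerically equal to the code
points, but the UTF-8 bytes are not: U+00E9 becomes the two bytes
C3 A9.  So ISO-8859-1 maps 1:1 onto the first 256 Unicode code points,
not onto UTF-8 byte sequences.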

Regarding UTF-16, I just want to be able to export to, and receive
from, Windows (and any other platforms using UTF-16 as their native
character representation).

Windows will always be able to accept UTF-16.  All Windows apps *might
well* export UTF-16.  There may be other platforms which use UTF-16 as
their native format.  I'd just like to be able to cope with those
situations.  Nothing more.

All this requires is a Utf16String class that has an asUtf8String
method (and any other required conversion methods), and for the other
string classes to have asUtf16String methods.  Once we have the other
classes and methods, this should be a trivial extension.  Export will
just be transformations of existing formats of valid strings.  Import
just needs to transform to (one of) our preferred format(s), and have a
validity check performed after the transform is complete.
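
The import/export transformations described above can be sketched
roughly as follows.  This is a Python illustration under stated
assumptions, not a proposed Smalltalk implementation: the function
names are hypothetical stand-ins for asUtf8String / asUtf16String, the
byte order is passed in explicitly, and BOM handling is left out.
Note also that Python's strict decoder validates during the transform
rather than in a separate pass afterwards.

```python
def _utf16_codec(byte_order):
    """Map a byte-order name to a Python codec name (no BOM handling)."""
    if byte_order not in ("little", "big"):
        raise ValueError("byte_order must be 'little' or 'big'")
    return "utf-16-le" if byte_order == "little" else "utf-16-be"


def utf16_to_utf8(data, byte_order="little"):
    """Import: UTF-16 bytes -> UTF-8 bytes.

    Strict decoding raises UnicodeDecodeError on invalid input, such as
    an unpaired surrogate or a truncated code unit.
    """
    text = data.decode(_utf16_codec(byte_order), errors="strict")
    return text.encode("utf-8")


def utf8_to_utf16(data, byte_order="little"):
    """Export: UTF-8 bytes -> UTF-16 bytes, rejecting malformed UTF-8."""
    text = data.decode("utf-8", errors="strict")
    return text.encode(_utf16_codec(byte_order))
```

A round trip such as utf8_to_utf16(utf16_to_utf8(data)) should return
the original bytes for any valid UTF-16 input, which makes a simple
sanity test for both directions.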


On 11 December 2015 at 15:37, Richard Sargent
<richard.sargent at gemtalksystems.com> wrote:
> EuanM wrote
>> ...
>>         all ISO-8859-1 maps 1:1 to Unicode UTF-8
>> ...
>
> I am late coming in to this conversation. If it hasn't already been said,
> please do not conflate Unicode and UTF-8. I think that would be a recipe for
> a high P.I.T.A. factor.
>
> Unicode defines the meaning of the code points.
> UTF-8 (and -16) define an interchange mechanism.
>
> In other words, when you write the code points to an external medium
> (socket, file, whatever), encode them via UTF-whatever. Read UTF-whatever
> from an external medium and re-instantiate the code points.
> (Personally, I see no use for UTF-16 as an interchange mechanism. Others may
> have justification for it. I don't.)
>
> Having characters be a consistent size in their object representation makes
> everything easier. #at:, #indexOf:, #includes: ... no one wants to be
> scanning through bytes representing variable sized characters.
>
> Model Unicode strings using classes such as e.g. Unicode7, Unicode16, and
> Unicode32, with automatic coercion to the larger character width.
>
>
>
>
> --
> View this message in context: http://forum.world.st/Unicode-Support-tp4865139p4866610.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>
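
The fixed-width model with automatic coercion that Richard describes
might look roughly like the Python sketch below.  Everything in it is
an assumption made for illustration: the class name CodePointString,
the at/at_put methods (mimicking 1-based #at: and #at:put:), and the
use of Python's array module for storage.  The three widths loosely
correspond to his Unicode7, Unicode16 and Unicode32.

```python
from array import array

# (largest code point that fits, array typecode), narrowest first.
# "B" is 1 byte, "H" is 2 bytes, "L" is at least 4 bytes.
_WIDTHS = [(0x7F, "B"), (0xFFFF, "H"), (0x10FFFF, "L")]
_RANK = {typecode: rank for rank, (_, typecode) in enumerate(_WIDTHS)}


def _typecode_for(code_point):
    """Return the narrowest typecode able to hold `code_point`."""
    for limit, typecode in _WIDTHS:
        if code_point <= limit:
            return typecode
    raise ValueError("not a Unicode code point")


class CodePointString:
    """Fixed-width code point storage that widens itself when needed."""

    def __init__(self, text=""):
        points = [ord(ch) for ch in text]
        self._cells = array(_typecode_for(max(points, default=0)), points)

    @property
    def typecode(self):
        return self._cells.typecode

    def _check(self, index):
        if not 1 <= index <= len(self._cells):
            raise IndexError("index out of bounds")

    def at(self, index):
        """Constant-time, 1-based character access."""
        self._check(index)
        return chr(self._cells[index - 1])

    def at_put(self, index, char):
        """Store `char`, coercing the whole string to a larger width first."""
        self._check(index)
        needed = _typecode_for(ord(char))
        if _RANK[needed] > _RANK[self._cells.typecode]:
            self._cells = array(needed, self._cells.tolist())
        self._cells[index - 1] = ord(char)

    def __str__(self):
        return "".join(chr(point) for point in self._cells)
```

Because every element has the same width, at is a plain index
operation, with no scanning of variable-length sequences.  The cost is
that storing a single wide character copies the whole string into the
wider representation.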

