[squeak-dev] how to create an UTF-8 character
Norbert Hartl
norbert at hartl.name
Tue Sep 23 14:07:20 UTC 2008
On Tue, 2008-09-23 at 06:48 -0700, Bert Freudenberg wrote:
> Am 23.09.2008 um 01:46 schrieb stephane ducasse:
>
> > Hi all
> >
> > I would like to know how I can create an UTF-* character composed
> > for example of two bytes
> >
> > 16rC3 and 16rBC
> >
> > I tried
> >
> > WideString fromByteArray: { 16rC3 . 16rBC }
> >
> > Stef
>
> There is no such thing as a "UTF-*" character. There are Unicode
> characters and Unicode strings, and there are UTF-encoded strings (UTF
> means Unicode Transformation Format).
>
> All characters in Squeak use Unicode now. For example, the Cyrillic Б
> is
>
> char := Character value: 16r0411.
>
> this can be made into a String:
>
> wideString := String with: char.
>
> which of course has the same Unicode code points:
>
> wideString asArray collect: [:each | each hex]
>
> gives
>
> #('16r411')
>
> The string can be encoded as UTF-8:
>
> utf8String := wideString squeakToUtf8.
>
> and to see the values there
>
> utf8String asArray collect: [:each | each hex]
>
> yields
>
> #('16rD0' '16r91')
>
> which is the UTF-8 representation of the character we began with (but
> if you try to print utf8String directly you get nonsense, because
> Squeak does not know it is UTF-8 encoded).
>
> The decoding of UTF-8 to a String is similar:
>
> #(16rC3 16rBC) asByteArray asString utf8ToSqueak
>
Hmmm, I knew it :) That is the same thing I did, just readable and in one
line (and with more of this "strange method stuff"[tm]).
> which returns the String 'ü' and probably is what you wanted in the
> first place - but please try to understand and use the Unicode terms
> correctly to minimize confusion.
>
> Anyway, to convert between a String in UTF-8 and a regular Squeak
> String, it's simplest to use utf8ToSqueak and squeakToUtf8.
>
> - Bert -
>
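For reference, the round trip described above can be sketched in a
workspace roughly like this. It is only a sketch: it assumes squeakToUtf8
and utf8ToSqueak behave as Bert describes, and the variable names are made
up for illustration.

```smalltalk
| original encoded bytes decoded |
"Start from a regular Squeak String holding one Unicode character, U+00FC (ü)."
original := String with: (Character value: 16rFC).

"Encode it as UTF-8; the result is a String whose characters are the UTF-8 bytes."
encoded := original squeakToUtf8.

"Look at the raw byte values; for ü these should be 16rC3 and 16rBC."
bytes := encoded asByteArray.

"Decode the bytes back into a regular Squeak String."
decoded := bytes asString utf8ToSqueak.

"If the round trip worked, this answers true."
decoded = original
```

Printing encoded directly would still show nonsense, since nothing marks
it as UTF-8; only decoded is meant to be displayed.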
Norbert
P.S.: My only hope is that, with my knowledge getting bigger and Pharo
getting smaller, we meet somewhere in between!!!