2008/9/23 Bert Freudenberg bert@freudenbergs.de:
On 23.09.2008 at 01:46, stephane ducasse wrote:
Hi all
I would like to know how I can create a UTF-* character composed, for example, of the two bytes 16rC3 and 16rBC.
I tried
WideString fromByteArray: { 16rC3 . 16rBC }
Stef
There is no such thing as a "UTF-*" character. There are Unicode Characters, and Unicode Strings, and there are UTF-encoded strings (UTF means Unicode Transformation Format).
All characters in Squeak use Unicode now. For example, the Cyrillic Б is
char := Character value: 16r0411.
This can be made into a String:
wideString := String with: char.
which of course has the same Unicode code points:
wideString asArray collect: [:each | each hex]
gives
#('16r411')
The string can be encoded as UTF-8:
utf8String := wideString squeakToUtf8.
and to see the values there
utf8String asArray collect: [:each | each hex]
yields
#('16rD0' '16r91')
which is the UTF-8 representation of the character we began with (but if you try to print utf8String directly you get nonsense, because Squeak does not know it is UTF-8 encoded).
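To see where 16rD0 and 16r91 come from, here is a minimal workspace sketch that hand-encodes the code point using the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx). It is only an illustration for code points between 16r80 and 16r7FF, not how squeakToUtf8 is implemented; the variable names are made up for this example.

```smalltalk
"Hand-encode one code point in the range 16r80..16r7FF as two UTF-8 bytes."
| codePoint lead continuation |
codePoint := 16r0411.  "Cyrillic Б"

"Lead byte: marker bits 110 followed by the upper 5 bits of the code point."
lead := 2r11000000 bitOr: (codePoint bitShift: -6).

"Continuation byte: marker bits 10 followed by the lower 6 bits."
continuation := 2r10000000 bitOr: (codePoint bitAnd: 2r00111111).

"The Array holds 16rD0 and 16r91 (208 and 145 in decimal)."
{ lead . continuation }
```

The two integers match the values shown for utf8String above.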
The decoding of UTF-8 to a String is similar:
#(16rC3 16rBC) asByteArray asString utf8ToSqueak
which returns the String 'ü' and is probably what you wanted in the first place - but please try to understand and use the Unicode terms correctly to minimize confusion.
Anyway, to convert between a String in UTF-8 and a regular Squeak String, it's simplest to use utf8ToSqueak and squeakToUtf8.
Am I the only one using the generic en/decoding functionality in Squeak in the form of #convertTo/FromEncoding?
Convert from "Squeak" to UTF-8:
aString convertToEncoding: 'utf-8'
Convert from UTF-8 to "Squeak":
aString convertFromEncoding: 'utf-8'
For checking out all the encodings your image supports:
TextConverter allEncodingNames
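As a rough illustration, a round trip with the generic messages might look like the workspace sketch below. It reuses the two bytes from the original question; the variable names are invented for the example, and the expected results in the comments assume these messages behave like utf8ToSqueak and squeakToUtf8 for 'utf-8'.

```smalltalk
"Round trip between UTF-8 bytes and a Squeak String using the generic converters."
| utf8Bytes decoded reencoded |
utf8Bytes := #(16rC3 16rBC) asByteArray.

"Decode: treat the bytes as a UTF-8 encoded String and convert to a Squeak String."
decoded := utf8Bytes asString convertFromEncoding: 'utf-8'.  "should be 'ü'"

"Encode again and compare with the bytes we started from."
reencoded := decoded convertToEncoding: 'utf-8'.
reencoded asByteArray = utf8Bytes  "should answer true"
```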
Cheers
Philippe