[squeak-dev] how to create a UTF-8 character

Norbert Hartl norbert at hartl.name
Tue Sep 23 14:07:20 UTC 2008


On Tue, 2008-09-23 at 06:48 -0700, Bert Freudenberg wrote:
> On 23.09.2008 at 01:46, stephane ducasse wrote:
> 
> > Hi all
> >
> > I would like to know how I can create a UTF-* character composed  
> > for example of two bytes
> >
> > 16rC3 and 16rBC
> >
> > I tried
> >
> > 	WideString fromByteArray: { 16rC3 . 16rBC }
> >
> > Stef
> 
> There is no such thing as a "UTF-*" character. There are Unicode  
> Characters and Unicode Strings, and there are UTF-encoded strings (UTF  
> means Unicode Transformation Format).
> 
> All characters in Squeak use Unicode now. For example, the Cyrillic Б  
> is
> 
> 	char := Character value: 16r0411.
> 
> this can be made into a String:
> 
> 	wideString := String with: char.
> 
> which of course has the same Unicode code points:
> 
> 	wideString asArray collect: [:each | each hex]
> 
> gives
> 
> 	 #('16r411')
> 
> The string can be encoded as UTF-8:
> 
> 	utf8String := wideString squeakToUtf8.
> 
> and to see the values there
> 
> 	utf8String asArray collect: [:each | each hex]
> 
> yields
> 
> 	 #('16rD0' '16r91')
> 
> which is the UTF-8 representation of the character we began with (but  
> if you try to print utf8String directly you get nonsense, because  
> Squeak does not know it is UTF-8 encoded).
> 
> The decoding of UTF-8 to a String is similar:
> 
> 	#(16rC3 16rBC) asByteArray asString utf8ToSqueak
> 
Hmmm, I knew it :) That is the same as what I did, just readable and in one line
(and with more of this "strange method stuff"[tm]).

> which returns the String 'ü' and probably is what you wanted in the  
> first place - but please try to understand and use the Unicode terms  
> correctly to minimize confusion.
> 
> Anyway, to convert between a String in UTF-8 and a regular Squeak  
> String, it's simplest to use utf8ToSqueak and squeakToUtf8.
> 
> - Bert -
> 
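For anyone wondering where 16rD0 16r91 and the 'ü' actually come from, here is a minimal sketch of the two-byte UTF-8 bit layout (110xxxxx 10xxxxxx). It is written in Python rather than Squeak purely as an illustration of the arithmetic. It only handles code points U+0080..U+07FF, and the function names utf8_encode_2byte and utf8_decode_2byte are hypothetical; they are not part of Squeak or of any library. Python's built-in "utf-8" codec is used only as a cross-check.

```python
def utf8_encode_2byte(code_point: int) -> bytes:
    """Encode one code point in U+0080..U+07FF as two UTF-8 bytes.

    Bit layout: 110xxxxx 10xxxxxx, where the 11 x bits are the code point.
    (Hypothetical helper for illustration only.)
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    lead = 0xC0 | (code_point >> 6)     # marker 110 + top 5 bits
    trail = 0x80 | (code_point & 0x3F)  # marker 10 + low 6 bits
    return bytes([lead, trail])


def utf8_decode_2byte(data: bytes) -> int:
    """Decode exactly one two-byte UTF-8 sequence back to its code point.

    (Hypothetical helper for illustration only.)
    """
    if len(data) != 2 or (data[0] & 0xE0) != 0xC0 or (data[1] & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    code_point = ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)
    if code_point < 0x80:
        # Lead bytes 0xC0/0xC1 would give an overlong (invalid) encoding.
        raise ValueError("overlong encoding")
    return code_point


if __name__ == "__main__":
    # Cyrillic Б (U+0411) -> bytes D0 91, cross-checked with Python's codec.
    encoded = utf8_encode_2byte(0x0411)
    print(encoded.hex())  # d091
    assert encoded == "\u0411".encode("utf-8")

    # Bytes C3 BC -> code point U+00FC, i.e. 'ü'.
    decoded = utf8_decode_2byte(bytes([0xC3, 0xBC]))
    print(hex(decoded))  # 0xfc
    assert chr(decoded) == "\u00fc"
```

In Squeak itself none of this is needed: squeakToUtf8 and utf8ToSqueak do the conversion, and they are not limited to the two-byte range shown here.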

Norbert

P.S.: My only hope is that, with my knowledge getting bigger and Pharo's
getting smaller, we meet somewhere in between!!!



