Unicode support

Bert Freudenberg bert at isgnw.CS.Uni-Magdeburg.De
Tue Sep 14 18:40:50 UTC 1999


On Tue, 14 Sep 1999, Todd Blanchard wrote:

> > On Mon, 13 Sep 1999, Todd Blanchard wrote:
> >
> > > I want to implement some Unicode support. Who can tell me:
> > > how big is a word?
> > > Is it two bytes?
> >
> > No, it's four bytes. There is no two-byte primitive supported array in 
> > Squeak (yet).
> 
> So what's it going to take to get one? Is this something that could
> be put together by an experienced C programmer with some high-level
> Smalltalk experience by cloning the variableByteArray class and
> adjusting the data sizes?

Currently there are only 1-byte arrays (ByteArray) and 4-byte arrays
(object pointers and words). You would have to find all the places that
access the class format and change them to recognize the new 2-byte
format. There are a lot of them. Look, for example, at
Interpreter>>primitiveStringReplace, which you certainly would want to use
for fast Unicode string manipulations.

But basically you could just start with the byte-wise stuff and adjust
all sizes by a factor of 2. In #at: you would construct a Unicode
character from 2 bytes, etc. I'd think this would not even be that slow,
and you could still switch to primitives later.
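
To make the byte-wise idea concrete, here is a minimal sketch, written in
Python rather than Smalltalk purely for illustration. The class name
TwoByteString, the at/at_put method names, and the big-endian byte order
are all assumptions made for the example, not anything that exists in
Squeak. It stores each 16-bit value in two consecutive bytes and returns
plain integer code values rather than character objects:

```python
class TwoByteString:
    """Hypothetical 16-bit string kept in a plain byte array.

    Assumptions for illustration only: big-endian byte order and
    1-based indexing (to mirror Smalltalk's #at:).
    """

    def __init__(self, size):
        # Two bytes of storage per 16-bit element.
        self._bytes = bytearray(2 * size)

    def __len__(self):
        # Element count is half the byte count.
        return len(self._bytes) // 2

    def _offset(self, index):
        # Map a 1-based element index to a 0-based byte offset.
        if not 1 <= index <= len(self):
            raise IndexError(index)
        return 2 * (index - 1)

    def at(self, index):
        # Combine the high and low bytes into one 16-bit value.
        i = self._offset(index)
        return (self._bytes[i] << 8) | self._bytes[i + 1]

    def at_put(self, index, code):
        # Split a 16-bit value into its high and low bytes.
        if not 0 <= code <= 0xFFFF:
            raise ValueError(code)
        i = self._offset(index)
        self._bytes[i] = code >> 8
        self._bytes[i + 1] = code & 0xFF
        return code


s = TwoByteString(3)
s.at_put(1, 0x0041)
s.at_put(2, 0x4E2D)
s.at_put(3, 0x20AC)
print([hex(s.at(i)) for i in range(1, 4)])  # ['0x41', '0x4e2d', '0x20ac']
```

The same arithmetic (byte offset 2 * (index - 1), high byte shifted left
by 8) would presumably carry over to a ByteArray-backed class in
Smalltalk, with a primitive added later if it turns out to be too slow.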

> Can you point me to info on low-level data formats in squeak?

No ... except that it's all in the image ;-)

I'll copy this back to the list, maybe someone else knows better.

  /bert




