Unicode support

Duane Maxwell dmaxwell at entrypoint.com
Tue Sep 14 19:24:38 UTC 1999


I would suggest instead implementing one of the Unicode transformation
formats, such as UTF-8.  It's a variable-length encoding that could keep
the current ByteArray-based string representation, can still encode the
entire Unicode space if necessary, and stays efficient for the extremely
common 7-bit ASCII case, where each character takes a single byte.  The
Unicode specification describes algorithms for converting between and
manipulating the various transformation formats, as well as mappings to
platform-specific extended character sets.

Both XML and BeOS use UTF-8 as their default encoding.
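
To illustrate the variable-length property described above, here is a
minimal sketch of UTF-8 encoding.  It is written in Python rather than
Smalltalk purely for illustration, and the function names utf8_encode and
utf8_encode_string are hypothetical.  It assumes the present-day
definition of UTF-8 (at most four bytes, code points up to U+10FFFF),
which is narrower than the definition in use when this message was
written.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hand-rolled for illustration)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        # 7-bit ASCII: a single byte, identical to the ASCII value.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf8_encode_string(text: str) -> bytes:
    """Encode a whole string by concatenating the per-character encodings."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)
```

Pure ASCII text comes out byte-for-byte unchanged, which is why an
existing byte-oriented string representation can hold it without
modification.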

Bert Freudenberg writes:
>On Tue, 14 Sep 1999, Todd Blanchard wrote:
>
>> > On Mon, 13 Sep 1999, Todd Blanchard wrote:
>> >
>> > > I want to implement some Unicode support.  Who can tell me -
>> > > how big is a word?
>> > > Is it two bytes?
>> >
>> > No, it's four bytes. There is no two-byte primitive supported array in
>> > Squeak (yet).
>>
>> So what's it going to take to get one? Is this something that could
>> be put together by an experienced C programmer with some high-level
>> Smalltalk experience by cloning the variableByteArray class and
>> adjusting the data sizes?
>
>Currently there are only 1-byte arrays (ByteArray) and 4-byte arrays
>(object pointers and words). You would have to find all the places that
>access the class format and change them to recognize the new 2-byte
>format. There are a lot of them. Look, for example, at
>Interpreter>>primitiveStringReplace, which you would certainly want to use
>for fast Unicode string manipulation.
>
>But basically you could just start with the byte-wise stuff and scale
>all sizes by a factor of 2. In #at: you would construct a Unicode
>character from 2 bytes, and so on. I'd think this would not even be that
>slow, and you could still switch to primitives later.
>
>> Can you point me to info on low-level data formats in squeak?
>
>No ... except that it's all in the image ;-)
>
>I'll copy this back to the list, maybe someone else knows better.
>
>  /bert
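
The byte-wise approach Bert describes (two bytes per character stored in
an ordinary byte array, with all sizes scaled by 2) might look roughly
like the sketch below.  It is in Python rather than Smalltalk for
illustration only.  The class name TwoByteString, the method names, and
the big-endian byte order are all assumptions, not anything taken from
Squeak.

```python
class TwoByteString:
    """Hypothetical fixed-width string: two bytes per character, big-endian."""

    def __init__(self, size: int):
        # Underlying storage is a plain byte array, twice the character count.
        self._bytes = bytearray(2 * size)

    def __len__(self) -> int:
        # Character count is the byte count divided by 2.
        return len(self._bytes) // 2

    def _offset(self, index: int) -> int:
        # Indices are 1-based, mirroring Smalltalk's #at:.
        if not 1 <= index <= len(self):
            raise IndexError(index)
        return 2 * (index - 1)

    def at(self, index: int) -> str:
        """Build the character at `index` from its two bytes."""
        offset = self._offset(index)
        return chr((self._bytes[offset] << 8) | self._bytes[offset + 1])

    def at_put(self, index: int, char: str) -> None:
        """Store `char` at `index` as a high byte followed by a low byte."""
        code = ord(char)
        if code > 0xFFFF:
            raise ValueError("character does not fit in two bytes")
        offset = self._offset(index)
        self._bytes[offset] = code >> 8
        self._bytes[offset + 1] = code & 0xFF
```

Every access costs a little arithmetic on top of two byte reads or
writes, which fits the expectation that this would not be very slow. The
same interface could later be backed by primitives.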




