[squeak-dev] Suitable immediate Character class comment?

Eliot Miranda eliot.miranda at gmail.com
Sun Jun 22 04:05:24 UTC 2014


Hi All,

    would anyone who really understands Unicode care to write a reasonable
class comment for Spur's Character, given that it is now an immediate?

This is the existing one:
"I represent a character by storing its associated Unicode. The first 256
characters are created uniquely, so that all instances of latin1 characters
($R, for example) are identical.

The code point is based on Unicode.  Since Unicode is 21-bit wide character
set, we have several bits available for other information.  As the Unicode
Standard  states, a Unicode code point doesn't carry the language
information.  This is going to be a problem with the languages so called
CJK (Chinese, Japanese, Korean.  Or often CJKV including Vietnamese).
 Since the characters of those languages are unified and given the same
code point, it is impossible to display a bare Unicode code point in an
inspector or such tools.  To utilize the extra available bits, we use them
for identifying the languages.  Since the old implementation uses the bits
to identify the character encoding, the bits are sometimes called "encoding
tag" or neutrally "leading char", but the bits rigidly denotes the concept
of languages.

The other languages can have the language tag if you like.  This will help
to break the large default font (font set) into separately loadable chunk
of fonts.  However, it is open to the each native speakers and writers to
decide how to define the character equality, since the same Unicode code
point may have different language tag thus simple #= comparison may return
false.

I represent a character by storing its associated ASCII code (extended to
256 codes). My instances are created uniquely, so that all instances of a
character ($R, for example) are identical."
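
As a rough illustration of the tagging scheme the existing comment
describes, here is a minimal Python sketch. It assumes the code point sits
in the low 22 bits of the character value and the language tag ("leading
char") in the bits above it. The 22-bit split, the helper names, and the
two tag numbers are assumptions made for illustration, not taken from the
comment itself.

```python
# Sketch only: the 22-bit split and the tag numbers below are assumptions.
CODE_BITS = 22                      # assumed width of the code point field
CODE_MASK = (1 << CODE_BITS) - 1    # mask selecting the code point bits
MAX_CODE_POINT = 0x10FFFF           # highest Unicode code point (fits in 21 bits)


def make_value(leading_char: int, code_point: int) -> int:
    """Pack a language tag and a Unicode code point into one integer."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("language tag out of range")
    return (leading_char << CODE_BITS) | code_point


def char_code(value: int) -> int:
    """Return the bare Unicode code point, without the language tag."""
    return value & CODE_MASK


def leading_char(value: int) -> int:
    """Return the language tag stored above the code point."""
    return value >> CODE_BITS


# Hypothetical tag numbers for two languages sharing a unified Han code point.
TAG_A = 5
TAG_B = 6
a = make_value(TAG_A, 0x76F4)
b = make_value(TAG_B, 0x76F4)

same_code_point = char_code(a) == char_code(b)   # the code points agree
equal_values = a == b                            # the tagged values do not
```

With this layout the two values carry the same code point but compare
unequal as integers, which is the #= problem the comment warns about.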

I can go as far as
"I represent characters, encoding a Unicode code point as an immediate
object (an object whose value is encoded in the pointer itself).  There can
be up to 2^30 characters.  All are unique."
perhaps someone else can do better...

cf. SmallInteger's comment:
"My instances are 31-bit numbers, stored in twos complement form. The
allowable range is approximately +- 1 billion (see SmallInteger minVal,
maxVal)."
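
To make the two figures above concrete (2^30 characters, 31-bit integers),
here is a minimal Python sketch of immediates in a 32-bit word. It assumes
a 2-bit tag of binary 10 for characters and a 1-bit tag of binary 1 for
small integers. Those tag values and all function names are assumptions
for illustration, not a statement of the actual VM layout.

```python
# Sketch only: tag patterns and names are assumptions for illustration.
WORD_MASK = 0xFFFFFFFF          # 32-bit word
CHAR_TAG = 0b10                 # assumed 2-bit tag, leaving 30 value bits
INT_TAG = 0b1                   # assumed 1-bit tag, leaving 31 value bits

SMALL_INT_MIN = -(1 << 30)      # 31-bit two's complement lower bound
SMALL_INT_MAX = (1 << 30) - 1   # 31-bit two's complement upper bound


def encode_character(code: int) -> int:
    """Store a character code directly in the tagged word."""
    if not 0 <= code < (1 << 30):
        raise ValueError("character code does not fit in 30 bits")
    return (code << 2) | CHAR_TAG


def is_character(word: int) -> bool:
    """A word is a character when its low two bits equal the tag."""
    return (word & 0b11) == CHAR_TAG


def decode_character(word: int) -> int:
    return word >> 2


def encode_small_integer(n: int) -> int:
    """Store a signed 31-bit integer with a low tag bit of 1."""
    if not SMALL_INT_MIN <= n <= SMALL_INT_MAX:
        raise ValueError("integer does not fit in 31 bits")
    return ((n << 1) | INT_TAG) & WORD_MASK


def decode_small_integer(word: int) -> int:
    # Reinterpret the 32-bit word as signed, then drop the tag bit.
    if word & 0x80000000:
        word -= 1 << 32
    return word >> 1


# Equal character codes always give the same word, so identity and
# equality coincide: this is the sense in which "all are unique".
r1 = encode_character(ord("R"))
r2 = encode_character(ord("R"))
```

Under these assumptions no table of the first 256 characters is needed,
since two characters with the same code are the same word by construction.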
-- 
best,
Eliot