<div dir="ltr">Hi All,<div><br></div><div>    Would anyone who really understands Unicode care to write a reasonable comment for Spur Character, given that it is immediate?<br clear="all"><div><br></div><div>This is the existing one:</div>

<div>&quot;I represent a character by storing its associated Unicode. The first 256 characters are created uniquely, so that all instances of latin1 characters ($R, for example) are identical.</div><div><br></div><div><span class="" style="white-space:pre">        </span>The code point is based on Unicode.  Since Unicode is 21-bit wide character set, we have several bits available for other information.  As the Unicode Standard  states, a Unicode code point doesn&#39;t carry the language information.  This is going to be a problem with the languages so called CJK (Chinese, Japanese, Korean.  Or often CJKV including Vietnamese).  Since the characters of those languages are unified and given the same code point, it is impossible to display a bare Unicode code point in an inspector or such tools.  To utilize the extra available bits, we use them for identifying the languages.  Since the old implementation uses the bits to identify the character encoding, the bits are sometimes called &quot;encoding tag&quot; or neutrally &quot;leading char&quot;, but the bits rigidly denotes the concept of languages.</div>

<div><br></div><div><span class="" style="white-space:pre">        </span>The other languages can have the language tag if you like.  This will help to break the large default font (font set) into separately loadable chunk of fonts.  However, it is open to the each native speakers and writers to decide how to define the character equality, since the same Unicode code point may have different language tag thus simple #= comparison may return false.</div>

<div><br></div><div>I represent a character by storing its associated ASCII code (extended to 256 codes). My instances are created uniquely, so that all instances of a character ($R, for example) are identical.&quot;</div>

<div><br></div><div>I can go as far as:</div><div>&quot;I represent characters, encoding a Unicode character code as an immediate object (an object whose value is encoded in a pointer).  There can be up to 2^30 characters.  All are unique.&quot;</div>

<div>Perhaps someone else can do better...</div><div><br></div><div>cf. SmallInteger&#39;s comment:</div><div>&quot;My instances are 31-bit numbers, stored in twos complement form. The allowable range is approximately +- 1 billion (see SmallInteger minVal, maxVal).&quot;<br>
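<div><br></div><div>To make &quot;immediate&quot; concrete, here is a rough sketch, in Python rather than Smalltalk, of packing a character value into a tagged 32-bit word. The 2-bit tag width follows from the 2^30 figure above; the tag pattern and the names TAG_BITS, CHARACTER_TAG, encode_character, is_character and decode_character are assumptions made up for illustration, not taken from the Spur sources.</div>

```python
# Sketch only: the tag pattern and helper names are illustrative assumptions,
# not the actual Spur implementation.
TAG_BITS = 2                 # low bits of a 32-bit word reserved for the tag
CHARACTER_TAG = 0b10         # hypothetical tag pattern marking a Character
VALUE_BITS = 32 - TAG_BITS   # 30 bits remain for the character value


def encode_character(value: int) -> int:
    """Pack a character value into a tagged word; no heap object is created."""
    if value not in range(2 ** VALUE_BITS):
        raise ValueError("character value does not fit in 30 bits")
    return (value << TAG_BITS) | CHARACTER_TAG


def is_character(word: int) -> bool:
    """True if the word's tag bits mark it as an immediate character."""
    tag_mask = (1 << TAG_BITS) - 1
    return (word & tag_mask) == CHARACTER_TAG


def decode_character(word: int) -> int:
    """Recover the character value from a tagged word."""
    if not is_character(word):
        raise ValueError("not an immediate character")
    return word >> TAG_BITS
```

<div>Under these assumptions, equal character values always produce the same word, so two characters with the same value are necessarily identical, which is all that &quot;All are unique&quot; amounts to.</div>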

</div>-- <br>best,<div>Eliot</div>

</div></div>