Adding Accufonts to the update stream (was Re: LicencesQuestion : Squeak-L Art 6.)

Ian Piumarta ian.piumarta at inria.fr
Wed Feb 26 05:39:09 UTC 2003


On Wed, 26 Feb 2003, Richard A. O'Keefe wrote:
> 
> UTF-8 decodes into *21-bit* characters.

I may be missing some relevant context in this debate (or if not then
maybe misunderstanding how Unicode works and/or its relationship to the
UCS) but I understood that UTF-8 was defined as an 8-bit transport for the
31-bit UCS (universal character set, ISO-10646) and as such decodes into
31-bit characters.  (The current correspondence between UCS and Unicode is
by design of the respective standards bodies, not because they're the same
thing -- which they aren't.)

  UCS (or Unicode)  UTF-8
  00000000-0000007F 0xxxxxxx
  00000080-000007FF 110xxxxx 10xxxxxx
  00000800-0000FFFF 1110xxxx 10xxxxxx 10xxxxxx
  00010000-001FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  00200000-03FFFFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  04000000-7FFFFFFF 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
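
The upper limit of each row follows from counting the "x" positions in its bit pattern. As a rough sketch of that arithmetic (the function name payload_bits is made up for illustration and is not part of any standard or library):

```python
def payload_bits(n_bytes):
    """Count the 'x' (payload) bits in an n-byte sequence from the table above."""
    if n_bytes == 1:
        return 7                          # 0xxxxxxx
    lead_bits = 7 - n_bytes               # lead byte: n ones, one zero, then the x bits
    return lead_bits + 6 * (n_bytes - 1)  # each 10xxxxxx byte carries 6 bits

for n in range(1, 7):
    bits = payload_bits(n)
    # Sequence length, payload bits, largest value that fits in that many bits.
    print(n, bits, format((1 << bits) - 1, "08X"))
```

The six-byte form carries 1 + 5*6 = 31 bits, which is where the 7FFFFFFF limit in the last row comes from; the four-byte form carries the 21 bits mentioned in the quoted message.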

Regardless of whether there will ever be any _Unicode_ characters assigned
outside the currently-planned 21-bit Unicode limit, UTF-8 does nonetheless
provide for up to 31 bits of charcode since this is the limit for the UCS.  
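
To make the 31-bit point concrete, here is a minimal sketch of an encoder that follows the six-row table above literally. The name encode_ucs_utf8 is hypothetical, and the function makes no attempt to reject surrogates or values beyond the Unicode range; it is only meant to illustrate the table, not any particular library's behaviour.

```python
def encode_ucs_utf8(code):
    """Encode an integer in 0..0x7FFFFFFF using the six-row table above."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS range")
    if code <= 0x7F:
        return bytes([code])              # single byte: 0xxxxxxx
    # (upper limit of the row, sequence length, lead-byte prefix bits)
    rows = [
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ]
    for limit, n, prefix in rows:
        if code <= limit:
            break
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (code & 0x3F))  # continuation byte: 10xxxxxx
        code >>= 6
    out.append(prefix | code)             # remaining high bits go in the lead byte
    return bytes(reversed(out))

print(encode_ucs_utf8(0x20AC).hex())
print(encode_ucs_utf8(0x7FFFFFFF).hex())
```

For values up to 10FFFF (outside the surrogate range) this should agree with Python's built-in "utf-8" codec; above that the built-in codec refuses, whereas the sketch keeps going into the five- and six-byte rows.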

Pedantically,
Ian


