Adding Accufonts to the update stream (was Re: Licences Question : Squeak-L Art 6.)
Ian Piumarta
ian.piumarta at inria.fr
Wed Feb 26 05:39:09 UTC 2003
On Wed, 26 Feb 2003, Richard A. O'Keefe wrote:
>
> UTF-8 decodes into *21-bit* characters.
I may be missing some relevant context in this debate (or if not then
maybe misunderstanding how Unicode works and/or its relationship to the
UCS) but I understood that UTF-8 was defined as an 8-bit transport for the
31-bit UCS (universal character set, ISO-10646) and as such decodes into
31-bit characters. (The current correspondence between UCS and Unicode is
by design of the respective standards bodies, not because they're the same
thing -- which they aren't.)
UCS (or Unicode)     UTF-8
00000000-0000007F    0xxxxxxx
00000080-000007FF    110xxxxx 10xxxxxx
00000800-0000FFFF    1110xxxx 10xxxxxx 10xxxxxx
00010000-001FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
00200000-03FFFFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
04000000-7FFFFFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
Regardless of whether there will ever be any _Unicode_ characters assigned
outside the currently-planned 21-bit Unicode limit, UTF-8 does nonetheless
provide for up to 31 bits of charcode since this is the limit for the UCS.
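
To make the table concrete, here is a minimal Python sketch of the six-row layout as given above. The names encode_ucs and decode_ucs are made up for illustration and are not from any library. The sketch works on plain integers and bytes rather than Python strings, since Python's built-in UTF-8 codec stops at 10FFFF. It does not reject overlong encodings, so treat it as an illustration of the bit layout, not a conforming decoder.

```python
# Sketch of the 1- to 6-byte UTF-8 layout from the table above.
# Function names are illustrative assumptions, not a library API.

# (upper bound of range, lead-byte prefix bits, number of continuation bytes)
_RANGES = [
    (0x0000007F, 0x00, 0),  # 0xxxxxxx            ->  7 payload bits
    (0x000007FF, 0xC0, 1),  # 110xxxxx + 1 cont.  -> 11 payload bits
    (0x0000FFFF, 0xE0, 2),  # 1110xxxx + 2 cont.  -> 16 payload bits
    (0x001FFFFF, 0xF0, 3),  # 11110xxx + 3 cont.  -> 21 payload bits
    (0x03FFFFFF, 0xF8, 4),  # 111110xx + 4 cont.  -> 26 payload bits
    (0x7FFFFFFF, 0xFC, 5),  # 1111110x + 5 cont.  -> 31 payload bits
]


def encode_ucs(code):
    """Encode one UCS code value (0..0x7FFFFFFF) as bytes."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("code value outside the 31-bit UCS range")
    for limit, prefix, ncont in _RANGES:
        if code <= limit:
            break
    out = []
    for _ in range(ncont):
        out.append(0x80 | (code & 0x3F))  # 10xxxxxx, low six bits first
        code >>= 6
    out.append(prefix | code)             # remaining bits go in the lead byte
    return bytes(reversed(out))


def decode_ucs(data):
    """Decode exactly one encoded sequence back to its code value."""
    lead = data[0]
    if lead < 0x80:
        ncont, code = 0, lead
    elif lead < 0xC0:
        raise ValueError("sequence starts with a continuation byte")
    elif lead < 0xE0:
        ncont, code = 1, lead & 0x1F
    elif lead < 0xF0:
        ncont, code = 2, lead & 0x0F
    elif lead < 0xF8:
        ncont, code = 3, lead & 0x07
    elif lead < 0xFC:
        ncont, code = 4, lead & 0x03
    elif lead < 0xFE:
        ncont, code = 5, lead & 0x01
    else:
        raise ValueError("0xFE and 0xFF never appear as lead bytes")
    if len(data) != ncont + 1:
        raise ValueError("wrong number of continuation bytes")
    for b in data[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("expected a 10xxxxxx continuation byte")
        code = (code << 6) | (b & 0x3F)
    return code
```

With these definitions, encode_ucs(0x1FFFFF) is the largest value that fits in four bytes (the 21-bit row), while encode_ucs(0x7FFFFFFF) needs all six bytes and round-trips through decode_ucs unchanged.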
Pedantically,
Ian