UTC-8 (was Re: Celeste encoding (was: Duplicate messages in Celeste))
Marcel Weiher
marcel at metaobject.com
Thu Mar 16 21:44:23 UTC 2000
> From: AGREE at CarltonFields.com
>
> In a sense, isn't a pure ASCII string just a subset of UTF-8?
Yes, and the beauty of it is that (a) all the characters relevant to
understanding XML structure fall within ASCII and (b) no plain ASCII
character codes are used in UTF-8 multi-byte escapes. So, you can
simply ignore any UTF-8 issues for the parser itself, but the content
it delivers won't be normalized.
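
To make point (b) concrete, here is a rough sketch in Python (not Squeak), assuming nothing beyond the standard library. The names find_tags and multibyte_avoids_ascii are made up for illustration. It scans raw bytes for '<' and '>' without decoding anything, which is safe only because every byte of a multi-byte UTF-8 sequence has its high bit set.

```python
from typing import List


def multibyte_avoids_ascii(text: str) -> bool:
    """Check that no byte of any multi-byte UTF-8 sequence is below 0x80.

    Hypothetical helper, for illustration only.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


def find_tags(data: bytes) -> List[bytes]:
    """Naive byte-level scan for '<...>' spans that never decodes the input.

    This is not an XML parser; it only shows that the structural
    characters can be located without looking at the encoding.
    """
    tags = []
    start = None
    for i, b in enumerate(data):
        if b == ord("<"):
            start = i
        elif b == ord(">") and start is not None:
            tags.append(data[start:i + 1])
            start = None
    return tags


raw = "<p>h\u00e9llo \u2713</p>".encode("utf-8")
tags = find_tags(raw)
```

The text between the tags comes back as undecoded bytes, which is the "won't be normalized" part: the scanner finds the structure but leaves the content as it arrived.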
> Can't a hierarchy with built-in coercion be used to preserve ALL of
> the efficiencies of the status quo, while still permitting (or at
> least paving the way toward) the full generality of UTF-8 and Unicode?
Yes.
> Why can't the ASCII string be the SmallInteger of a new STRINGTHING
> hierarchy, where operations within the string world would be seamless?
Yup.
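
A minimal sketch of what such a hierarchy could look like, again in Python rather than Squeak. All class and function names here (StringThing, AsciiString, WideString, make_string) are hypothetical, and this is not how any existing Squeak or NSString code is written. The idea mirrors SmallInteger: the compact representation is used whenever the value fits, and operations coerce to the wider one only when they have to.

```python
class StringThing:
    """Abstract parent (hypothetical): subclasses differ only in storage."""

    def to_text(self) -> str:
        raise NotImplementedError

    def __len__(self) -> int:
        raise NotImplementedError

    def concat(self, other: "StringThing") -> "StringThing":
        # The result picks its own representation, so ASCII + ASCII stays
        # compact and anything else is coerced to the wide form.
        return make_string(self.to_text() + other.to_text())


class AsciiString(StringThing):
    """Compact form: one byte per character."""

    def __init__(self, text: str):
        # Raises UnicodeEncodeError if text contains a non-ASCII character.
        self.data = text.encode("ascii")

    def to_text(self) -> str:
        return self.data.decode("ascii")

    def __len__(self) -> int:
        return len(self.data)


class WideString(StringThing):
    """General form: one integer code point per character."""

    def __init__(self, text: str):
        self.data = [ord(c) for c in text]

    def to_text(self) -> str:
        return "".join(chr(cp) for cp in self.data)

    def __len__(self) -> int:
        return len(self.data)


def make_string(text: str) -> StringThing:
    """Factory: use the compact class when possible, else the wide one."""
    try:
        return AsciiString(text)
    except UnicodeEncodeError:
        return WideString(text)


plain = make_string("abc").concat(make_string("def"))
mixed = make_string("abc").concat(make_string("d\u00e9f"))
```

Callers only ever see the StringThing protocol, so pure-ASCII code pays nothing extra and mixed text still works.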
> Every time I raise this point, there were countless objections about
> things Squeak so configured could not do (the biggest deal was
> auto-reversing Hebrew/Anglo-Numeric text),
Except that these problems are at a higher level, when dealing with
words. Strings really just deal with characters and have no idea
about languages.
> but it seems that we could still accommodate many of the advantages of
> Unicode, integrate the whole into Squeak, while preserving ALL of the
> efficiencies of the present ASCII world for unmixed ASCII and
> Character stuff.
Exactly.
> Or at least we should try real hard to think (or hack) through the
> question before doing nothing because of an apparent lack of purity.
If it helps, I can probably provide class-documentation of the
NeXT/Apple NSString class-cluster, which does exactly that,
successfully.
Marcel