Unicode support
Todd Blanchard
tblanchard at etranslate.com
Tue Sep 21 18:21:19 UTC 1999
> From: Marcel Weiher <marcel at metaobject.com>
>
> > From: "Peter William Lount" <peter at smalltalk.org>
> >
> > I agree with you that we shouldn't be concerned with how strings store
> > their characters if that's all that is to be stored in a string.
>
> I don't see why the restriction. How a string stores whatever it
> stores is never anybody's business, as with any other object. Whether
> it stores character objects, LZW-compressed variable strings, UTF-8,
> whatever shouldn't matter to its clients.
I agree. Insisting that we have only String, implemented as a
sequence of Character objects (where Character presumably has
multiple implementations, similar to the way SmallInteger vs
LargeInteger is handled), is a naive implementation that we are not
likely to be able to afford. The system should be able to take
advantage of space/time optimizations where appropriate, for
instance by using a single-byte representation when all characters
in the string are in ISO-8859-1.
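
A minimal sketch of that optimization, in Python only for brevity. The
class name CompactString and its methods are hypothetical, not an
existing Squeak or Python API. It relies on the fact that ISO-8859-1
coincides with Unicode code points 0 to 255, so such strings fit in one
byte per character; everything else falls back to a wide array.

```python
from array import array


class CompactString:
    """Hypothetical string that picks the narrowest storage that fits."""

    def __init__(self, text: str):
        code_points = [ord(ch) for ch in text]
        if all(cp <= 0xFF for cp in code_points):
            # Every character is in ISO-8859-1: one byte per character.
            self._units = bytes(code_points)
        else:
            # Fallback: unsigned longs (at least 4 bytes each), wide
            # enough for any Unicode code point.
            self._units = array("L", code_points)

    @property
    def bytes_per_char(self) -> int:
        """Storage width of one character in the chosen representation."""
        if isinstance(self._units, bytes):
            return 1
        return self._units.itemsize

    def __len__(self) -> int:
        return len(self._units)

    def __getitem__(self, index: int) -> str:
        # Clients always get a character back, whatever the storage is.
        return chr(self._units[index])

    def __str__(self) -> str:
        return "".join(chr(cp) for cp in self._units)
```

Clients only ever use len(), indexing and str(), so the choice of
storage stays invisible to them, which is the point being argued.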
>
> > It does
> > mean that strings are based on "byte/double byte encodings" and not on
> > general "object oriented" concepts. So we end up with many "encoding types"
> > of strings. This is probably necessary given the reality of different
> > encoding systems. However, it's not very general. Having a GeneralString
> > that is entirely independent of ANY encoding system while being able to
> > convert to any encoding system is a very powerful idea.
>
> Yes, having 'GeneralString' as an additional 'encoding' any string
> is required to be able to convert itself to seems useful. Once
> again, how this is actually stored is simply none of anybody's
> business. Adding a class that uses this as its native encoding is
> also good. Making this the *only* implementation would be suicide
> for many applications.
Agreed.
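
A rough sketch of how that could look, again in Python. All names here
(Latin1Backed, Utf8Backed, code_points, convert) are hypothetical, and
the "general" form is assumed to be a plain sequence of Unicode code
points. Each implementation keeps its own native storage but must be
able to produce the general form, and conversion to any other encoding
goes through it.

```python
class Latin1Backed:
    """Hypothetical string stored natively as ISO-8859-1 bytes."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def code_points(self) -> list:
        # ISO-8859-1 byte values are already Unicode code points.
        return list(self._raw)


class Utf8Backed:
    """Hypothetical string stored natively as UTF-8 bytes."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def code_points(self) -> list:
        return [ord(ch) for ch in self._raw.decode("utf-8")]


def convert(string, encoding: str) -> bytes:
    """Convert any implementation to any encoding via the general form."""
    text = "".join(chr(cp) for cp in string.code_points())
    return text.encode(encoding)


a = Latin1Backed(b"na\xefve")
b = Utf8Backed("naïve".encode("utf-8"))
# Different storage, same general form, same result in any encoding.
print(a.code_points() == b.code_points())
print(convert(a, "utf-16-be") == convert(b, "utf-16-be"))
```

Neither class is the only implementation, and neither exposes its
storage; they only agree on the general form.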
> > Also the GeneralString could hold more than just "characters" if characters
> > are actual objects instead of bytes. Any object, like an icon or graphic,
> > could be put into the string as long as it responds to the "character
> > protocol". For example, an HTMLink object might respond with the
> > "characters" that make up the link info. An icon would display itself. An
> > accounting total object would show the "total" as numbers. Any of these
> > "character objects" would be able to be linked back to their original
> > object - a plain character or an htmlink or an accounting total object - so
> > you can easily create "hyper links" in text.
>
> These shouldn't actually be character objects, but simply formatting
> objects (more like words than characters, even better would be lists
> of words). I recently did some experiments with the NSText systems,
> and found that for many cases the implementation of embedded objects
> as special characters is not good enough. One problem is that single
> objects may represent multiple words in the output, which would have
> to be line-wrapped etc. While it is possible to fake this with
> NSText, it is a lot more convoluted than it should be.
>
> Equating "Text" with a series of characters is the fundamental
> problem. It is a series of objects, some of which may be represented
> words which may actually consist of characters (rough
> approximation). Introducing "SuperCharacters" doesn't solve the
> fundamental problem of treating text as a sequence of characters.
> That doesn't mean that it isn't appropriate in many situations.
>
> > NeXTStep/OpenStep (now Apple) has an amazing Text and Character system.
> > There is no doubt that they have done their homework very well. They have
> > an Attributed String object that performs some of the above functions. Any
> > professional text system should have at least the capabilities of the
> > OpenStep text system.
>
> Yes, that is definitely a minimum standard. However, there are many
> points where it needs to be improved. Another example where Apple's
> text system is poor is the handling of very large texts. For these
> sorts of situations, it should provide a much more simplified and
> less resource intensive configuration.
Apple's or NeXT's? NeXT's seems pretty good, but it only handles
some European encodings and Japanese. No Chinese, Korean, Vietnamese
or any other Asian language.
I want to handle all languages equally well. (Yeah, I know, tricky).
--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
eTranslate, Inc. The Power of Language
Todd Blanchard main +1.415.487.7850
Chief Technology Architect fax +1.415.371.0010
http://www.etranslate.com/
520 Third Street, Suite 505, San Francisco, California 94107, U.S.A.