Unicode support

Todd Blanchard tblanchard at etranslate.com
Tue Sep 21 18:21:19 UTC 1999


> From: Marcel Weiher <marcel at metaobject.com>
>
> > From: "Peter William Lount" <peter at smalltalk.org>
> >
> > I agree with you that we shouldn't be concerned with how strings store
> > their characters if that's all that is to be stored in a string.
>
> I don't see why the restriction.  How a string stores whatever it
> stores is never anybody's business, as with any other object.  Whether
> it stores character objects, LZW-compressed variable strings, UTF-8, or
> whatever shouldn't matter to its clients.

I agree.  Insisting that we have just one String, implemented as
a sequence of Character objects (where Character presumably has
multiple implementations, similar to the way SmallInteger vs
LargeInteger is handled), is a naive implementation that we are not
likely to be able to afford.  The system should be able to take
advantage of space/time optimizations where appropriate: for
instance, using a single-byte representation when all characters in
the string are in ISO-8859-1.
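
A minimal sketch of that optimization, in Python rather than Smalltalk so
it can be run anywhere.  The class name CompactString and the attribute
bytes_per_char are hypothetical, made up for illustration; nothing here is
an existing Squeak class.  The string picks one byte per character when
everything fits in ISO-8859-1 and falls back to a wider array otherwise,
while clients see the same protocol either way.

```python
from array import array


class CompactString:
    """Hypothetical string that hides its storage choice from clients."""

    def __init__(self, text):
        if all(ord(ch) <= 0xFF for ch in text):
            # Every character is in ISO-8859-1: store one byte each.
            self._units = text.encode("latin-1")
            self.bytes_per_char = 1
        else:
            # Otherwise store one full code point per element.
            # Type code "L" is an unsigned long, at least 4 bytes wide.
            self._units = array("L", (ord(ch) for ch in text))
            self.bytes_per_char = self._units.itemsize

    def __len__(self):
        return len(self._units)

    def __getitem__(self, index):
        # Integer indexing only; both storages yield an int code point.
        return chr(self._units[index])

    def __str__(self):
        return "".join(chr(unit) for unit in self._units)


narrow = CompactString("café")
wide = CompactString("日本語")
print(len(narrow), narrow[3], narrow.bytes_per_char)
print(len(wide), wide[0], wide.bytes_per_char)
```

Callers index and measure both strings identically; only bytes_per_char
reveals which representation was chosen.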

>
> > It does
> > mean that strings are based on "byte/double byte encodings" and not on
> > general "object oriented" concepts. So we end up with many "encoding types"
> > of strings. This is probably necessary given the reality of different
> > encoding systems. However, it's not very general. Having a GeneralString
> > that is entirely independent of ANY encoding system while being able to
> > convert to any encoding system is a very powerful idea.
>
> Yes, having 'GeneralString' as an additional 'encoding' any string   
> is required to be able to convert itself to seems useful.  Once
> again, how this is actually stored is simply none of anybody's
> business.  Adding a class that uses this as its native encoding is   
> also good.  Making this the *only* implementation would be suicide   
> for many applications.

Agreed.


> > Also the GeneralString could hold more than just "characters" if characters
> > are actual objects instead of bytes. Any object, like an icon or graphic,
> > could be put into the string as long as they respond to the "character
> > protocol". For example, a HTMLink object might respond with the
> > "characters" that make up the link info. An icon would display itself. An
> > accounting total object would show the "total" as numbers. Any of these
> > "character objects" would be able to be linked back to their original
> > object - a plain character or a htmlink or an accounting total object - so
> > you can easily create "hyper links" in text.
>
> These shouldn't actually be character objects, but simply formatting   
> objects (more like words than characters, even better would be lists   
> of words). I recently did some experiments with the NSText systems,   
> and found that for many cases the implementation of embedded objects   
> as special characters is not good enough.  One problem is that single   
> objects may represent multiple words in the output, which would have   
> to be line-wrapped etc.  While it is possible to fake this with
> NSText, it is a lot more convoluted than it should be.
>
> Equating "Text" with a series of characters is the fundamental
> problem.  It is a series of objects, some of which may be represented as
> words, which may actually consist of characters (rough
> approximation).  Introducing "SuperCharacters" doesn't solve the
> fundamental problem of treating text as a sequence of characters.
> That doesn't mean that it isn't appropriate in many situations.
>
> > NeXTStep/OpenStep (now Apple) has an amazing Text and Character system.
> > There is no doubt that they have done their homework very well. They have
> > an Attributed String object that performs some of the above functions. Any
> > professional text system should have at least the capabilities of the
> > OpenStep text system.
>
> Yes, that is definitely a minimum standard.  However, there are many   
> points where it needs to be improved.  Another example where Apple's   
> text system is poor is the handling of very large texts.  For these   
> sorts of situations, it should provide a much more simplified and   
> less resource intensive configuration.

Apple's or NeXT's?  NeXT's seems pretty good, but it only handles some
European encodings and Japanese.  No Chinese, Korean, Vietnamese, or any
other Asian language.

I want to handle all languages equally well. (Yeah, I know, tricky).



--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
eTranslate, Inc.                                    The Power of Language 
Todd Blanchard                                  main +1.415.487.7850
Chief Technology Architect                      fax +1.415.371.0010
http://www.etranslate.com/
520 Third Street, Suite 505,      San Francisco, California 94107, U.S.A.




