Unicode support

Todd Blanchard tblanchard at etranslate.com
Thu Sep 23 17:20:54 UTC 1999


> Many of the comments made about e.g. the bi-directional nature of
> Hebrew (which is, I think, more an issue of display rather than
> storage), the differing word-break conventions in other languages,
> etc indicate to me that String isn't as badly broken as some may
> want to see it, but that subclasses are needed to handle these
> special cases.

This I agree with.  String ought to become an abstract class with a
cluster of special-purpose strings below it.  The current String
ought to be rechristened SingleByteUnidirectionalString or something.
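
A rough sketch of what such a cluster might look like, written in Python
rather than Smalltalk purely for illustration.  Only the name
SingleByteUnidirectionalString comes from the text above; AbstractString,
SingleByteHebrewString, to_text and needs_bidi_layout are hypothetical
names, and the choice of ISO-8859-1 and ISO-8859-8 is an assumption.

```python
from abc import ABC, abstractmethod


class AbstractString(ABC):
    """Hypothetical abstract root: callers never touch raw bytes directly."""

    def __init__(self, data: bytes):
        self._data = data

    @abstractmethod
    def to_text(self) -> str:
        """Decode the stored bytes into characters."""

    @abstractmethod
    def needs_bidi_layout(self) -> bool:
        """Tell the display code whether right-to-left layout is required."""

    def __len__(self) -> int:
        # Length in characters, not in stored bytes.
        return len(self.to_text())


class SingleByteUnidirectionalString(AbstractString):
    """Stands in for the current String: one byte per character, left to right."""

    def to_text(self) -> str:
        return self._data.decode("iso-8859-1")

    def needs_bidi_layout(self) -> bool:
        return False


class SingleByteHebrewString(AbstractString):
    """Assumed sibling class: one byte per character, stored as ISO-8859-8."""

    def to_text(self) -> str:
        return self._data.decode("iso-8859-8")

    def needs_bidi_layout(self) -> bool:
        return True
```

The point of the sketch is that display code can ask the string itself
how it wants to be laid out instead of guessing from the bytes.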

But I disagree with the statement that bi-directional string
handling is more an issue of display than storage.  How can you tell
what display techniques to use on any given String?  Right now it's
easy - there's only one.  But that's not going to be true much longer.
You'll want to pick a font that can handle what's in the string -
how do you do that if all strings are user-interpretable byte
streams?
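
To make the ambiguity concrete, here is a small Python sketch (Python is
used only for illustration; the byte values and the helper name
contains_rtl are my own choices).  The same four bytes decode to
left-to-right Latin letters under ISO-8859-1 and to right-to-left Hebrew
letters under ISO-8859-8, so the bytes alone cannot tell you how to
display them.

```python
import unicodedata

# Four bytes with no metadata attached.
raw = bytes([0xF9, 0xEC, 0xE5, 0xED])

# Two equally "valid" readings of the same storage.
as_latin1 = raw.decode("iso-8859-1")
as_hebrew = raw.decode("iso-8859-8")


def contains_rtl(text: str) -> bool:
    """Return True if any character has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


print(contains_rtl(as_latin1))  # False: accented Latin letters
print(contains_rtl(as_hebrew))  # True: Hebrew letters
```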

This isn't just a Squeak problem either - it's worldwide.  Everybody
stores byte streams in files or in buffers with no meta-information
at all about how to interpret them.  Everybody just assumes that it's
ISO-8859-1.  That assumption is going to be wrong very soon.  At the
very least, we should consider adding some kind of encoding flag to
String so you have a clue what the format of the stored data is (this
could just as easily be a polymorphic thing).  Best is a set of
subclasses.
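
The minimal "encoding flag" variant could look something like the
following Python sketch.  TaggedString, text and reencoded are
hypothetical names, and defaulting the flag to ISO-8859-1 is an
assumption that mirrors what everybody assumes today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus a flag saying how to interpret them (hypothetical)."""

    data: bytes
    encoding: str = "iso-8859-1"  # assumed default, matching current practice

    def text(self) -> str:
        # Decode using the stored flag rather than a global assumption.
        return self.data.decode(self.encoding)

    def reencoded(self, new_encoding: str) -> "TaggedString":
        # Convert the storage while keeping the characters the same.
        return TaggedString(self.text().encode(new_encoding), new_encoding)


hebrew = TaggedString(bytes([0xF9, 0xEC, 0xE5, 0xED]), "iso-8859-8")
as_utf8 = hebrew.reencoded("utf-8")
```

The flag travels with the data, so converting the storage format does
not change what the string means.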

Does anybody know how SmalltalkAgents stored their strings?  They
maintained style information such as font color and italics/bold
stuff.

--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
eTranslate, Inc.                                    The Power of Language 
Todd Blanchard                                  main +1.415.487.7850
Chief Technology Architect                      fax +1.415.371.0010
http://www.etranslate.com/
520 Third Street, Suite 505,      San Francisco, California 94107, U.S.A.





More information about the Squeak-dev mailing list