Unicode support
Todd Blanchard
tblanchard at etranslate.com
Thu Sep 23 17:20:54 UTC 1999
> Many of the comments
> made about e.g. the bi-directional nature of Hebrew (which is, I think, more
> an issue of display rather than storage), the differing word-break
> conventions in other languages, etc indicate to me that String isn't as
> badly broken as some may want to see it, but that subclasses are needed to
> handle these special cases.
This I agree with. String ought to become an abstract class with a
cluster of special purpose strings below it. The current String
ought to be rechristened SingleByteUnidirectionalString or something.
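As a rough illustration of that shape (in Python rather than Squeak, and with every class and method name below being hypothetical rather than anything from an actual image), an abstract string could define the shared protocol in terms of code points and leave storage to its subclasses:

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class AbstractString(ABC):
    """Hypothetical abstract root: defines behaviour, not storage."""

    @abstractmethod
    def code_points(self) -> List[int]:
        """Return the characters as a list of Unicode code points."""

    def __len__(self) -> int:
        return len(self.code_points())

    def __eq__(self, other: object) -> bool:
        # Strings compare by content, regardless of how they are stored.
        if not isinstance(other, AbstractString):
            return NotImplemented
        return self.code_points() == other.code_points()


class SingleByteString(AbstractString):
    """One byte per character, interpreted as ISO-8859-1."""

    def __init__(self, data: bytes) -> None:
        self._data = bytes(data)

    def code_points(self) -> List[int]:
        # ISO-8859-1 byte values map one-to-one onto U+0000..U+00FF.
        return list(self._data)


class WideString(AbstractString):
    """One integer per character, for text outside the single-byte range."""

    def __init__(self, points: Iterable[int]) -> None:
        self._points = list(points)

    def code_points(self) -> List[int]:
        return list(self._points)
```

Under this sketch, SingleByteString(b"caf\xe9") and WideString([0x63, 0x61, 0x66, 0xE9]) compare equal, because clients only ever see code points and never the underlying representation.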
But I disagree with the statement that bi-directional string
handling is more an issue of display than storage. How can you tell
what display techniques to use on any given String? Right now it's
easy - there's only one. But that's not going to be true much longer.
You'll want to pick a font that can handle what's in the string -
how do you do that if all strings are user-interpretable byte
streams?
This isn't just a Squeak problem either - it's worldwide. Everybody
stores bytestreams in files or bytestreams in buffers with no meta
information at all about how to interpret the bytestreams. Everybody
just assumes that it's ISO-8859-1. That assumption is going to be
wrong very soon. At the very least, we should consider adding some
kind of encoding flag to String so you have a clue what the format of
the stored data is (this could just as easily be a polymorphic
thing). Best would be a set of subclasses.
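To make the ambiguity concrete, here is a minimal sketch (Python again, not Squeak; the TaggedString class and its fields are hypothetical) of a byte string that carries an encoding flag. The same four bytes are valid in both ISO-8859-1 and ISO-8859-8, and only the flag says which reading is intended:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical byte string that records how its bytes should be read."""

    data: bytes
    encoding: str  # the proposed "encoding flag", e.g. "iso-8859-1"

    def text(self) -> str:
        # Decode using the recorded encoding instead of a global assumption.
        return self.data.decode(self.encoding)


# Identical stored bytes, two different interpretations.
raw = b"\xf9\xec\xe5\xed"
as_latin1 = TaggedString(raw, "iso-8859-1")  # four accented Latin letters
as_hebrew = TaggedString(raw, "iso-8859-8")  # four Hebrew letters (right-to-left)
```

Without the flag, a reader of the raw bytes has nothing to go on when choosing a font or a text direction; with it, as_hebrew.text() yields Hebrew code points that a display layer could recognise as right-to-left.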
Does anybody know how SmalltalkAgents stored their strings? They
maintained style information such as font color and italics/bold
stuff.
--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
eTranslate, Inc. The Power of Language
Todd Blanchard main +1.415.487.7850
Chief Technology Architect fax +1.415.371.0010
http://www.etranslate.com/
520 Third Street, Suite 505, San Francisco, California 94107, U.S.A.