Unicode support
Andrew C. Greenberg
werdna at gate.net
Wed Sep 22 00:15:29 UTC 1999
>I don't understand how an Array is useful as a general String. They are both
>Collections, and that is about it.
You wanted something that could maintain and manipulate a sequence of
randomly accessed, but generalized objects. An Array seems the
broadest non-abstract class in the hierarchy that does this.
My question, of course, is this: what is a GeneralizedString other
than an Array of objects? Perhaps it is that the elements are all of
one broadly homogeneous class, say, instances of a subclass of
GeneralizedCharacter? Perhaps we will require ALL characters of the
array to be instances of one particular class? What is it about the
GeneralizedCharacter that distinguishes it from, say, Object, clearly
the most general version? Perhaps it is:
(1) All instances share exactly the same protocol, which
includes a particularized minimum protocol?
(2) Perhaps there is an abstract mechanism for a property list
(isUpper, isLower) or a conversion list (asUpper)?
(3) The class is flyweight, or perhaps that = is the same as ==?
I really don't know -- I'm just trying to get the ball rolling.
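To make (2) and (3) a little more concrete, here is a small workspace
sketch of how the existing Character class already behaves. It uses only
Character and its standard selectors (isLowercase, asUppercase, value:);
GeneralizedCharacter itself remains hypothetical, and I am only assuming
it would be expected to offer at least this much.

```smalltalk
"Workspace sketch: select each expression and 'print it'."

"(2) a property query and a conversion on a single character"
$a isLowercase.
$a asUppercase.

"(3) two routes to the same character answer the very same object,
 so = and == agree for Characters"
(Character value: 97) = $a.
(Character value: 97) == $a.
```

Whether a generalized character can keep the identity property in (3)
once the code space is much larger is exactly the sort of thing that
needs deciding.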
>Or is this your point - what protocol differentiates Strings from Arrays?
Yes.
>Maybe the only responsibility of the general StringClass should be as a
>Convertor:
>String>>
>as: aCharacterSet
>asUnicode
>asAscii
>etc.....
>
>The reason I say this is that a common protocol for all Strings is a tough
>ask.
Why? If it is, then maybe we don't yet understand strings well
enough to explain why they should be generalized? You say that Array
isn't enough, but String is too much. Where, then, does this belong?
Or, is it not in fact the String that is the crux of the matter, but
the Character that must be generalized? Or is there some interaction
between a String and a probably lightweight (perhaps Flyweight?)
class that defines our notion of a String?
Given the beautifully parameterized structure of the collection
classes' filtering enumerations (select:, collect:, detect: and
friends), it seems to me that a powerful GeneralizedString class can
be built, so that it can leverage the protocol of the underlying
character objects. When certain expressions become common enough to
deserve special-cased methods, but not general enough to justify
putting them in the GeneralizedString class itself, we have just
identified a reason to extend the hierarchy with a new subclass or
subclasses.
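As a rough illustration of what I mean, here is a plain Array of
Characters driven entirely through the standard enumeration protocol.
Nothing here is new: Array, collect:, select:, detect:ifNone: and the
Character selectors are all stock, and the variable name chars is just
for the example. A GeneralizedString would, I am assuming, inherit this
behavior in the same way.

```smalltalk
"Workspace sketch: evaluate the whole chunk with 'do it' and watch the Transcript."
| chars |
chars := Array with: $s with: $Q with: $u with: $E.

Transcript
    "conversion reached through the element protocol"
    show: (chars collect: [:each | each asUppercase]) printString; cr;
    "filtering on a character property"
    show: (chars select: [:each | each isUppercase]) printString; cr;
    "first element satisfying a property, or nil if there is none"
    show: (chars detect: [:each | each isVowel] ifNone: [nil]) printString; cr.
```

If the collect: line shows up everywhere, it earns a method of its own;
the open question is only which class that method belongs in.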
>As people point out, even asUppercase won't hold for many languages.
>You may end up with a restricted and fairly useless protocol.
>With the 'asUnicode' approach, the client determines which protocol will
>be used to communicate with Strings. So a String has many possible
>interfaces (protocols), and the client chooses an appropriate interface
>by which to manipulate the String.
Actually, I think this is precisely the point. Either you have a
general class or you don't (or Array really is it). Let's start with
the most fundamental objects, and try to find a way to capture unique
features in particular encodings and protocols for the
GeneralizedCharacter, and ways to reach those features through the
GeneralizedString protocol.
We cannot at once say that it is important to create a generalized
string and that it is impossible. Let's either do it or drop it.