Unicode support

Andrew C. Greenberg werdna at gate.net
Wed Sep 22 00:15:29 UTC 1999


>I don't understand how an Array is useful as a general String. They are both
>Collections, and that is about it.

You wanted something that could maintain and manipulate a randomly 
accessible sequence of generalized objects.  Array seems to be the 
broadest non-abstract class in the hierarchy that does this.

My question, of course, is this: what is a GeneralizedString other 
than an Array of objects?  Perhaps it is that the elements of the 
collection all belong to one generally homogeneous class, say, 
instances of a subclass of GeneralizedCharacter?  Perhaps we will 
require ALL characters of the array to be instances of one particular 
class?  What is it about the GeneralizedCharacter that distinguishes 
it from, say, Object, clearly the most general version?  Perhaps it is:

	(1) All instances share exactly the same protocol, which 
includes a particularized minimum protocol?
	(2) Perhaps there is an abstract mechanism for a property list 
(isUpper, isLower) or a conversion list (asUpper)?
	(3) The class is flyweight, or perhaps = is the same as ==?

I really don't know -- I'm just trying to get the ball rolling.
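
To make the three candidates above a little more concrete, here is a 
rough sketch of what they might look like together.  It is written in 
Python rather than Smalltalk purely for illustration, and every name 
in it (GeneralizedCharacter, AsciiCharacter, the _pool cache, the 
is_upper/is_lower/as_upper selectors) is my own assumption, not an 
existing class or an agreed design.

```python
from abc import ABC, abstractmethod


class GeneralizedCharacter(ABC):
    """Hypothetical minimum protocol that every character class shares (1)."""

    # (3) Flyweight cache: one shared instance per (class, code) pair,
    # so equality and identity coincide ("= is the same as ==").
    _pool = {}

    def __new__(cls, code):
        key = (cls, code)
        if key not in GeneralizedCharacter._pool:
            instance = super().__new__(cls)
            instance.code = code
            GeneralizedCharacter._pool[key] = instance
        return GeneralizedCharacter._pool[key]

    # (2) Property list and conversion list; each encoding supplies its own.
    @abstractmethod
    def is_upper(self): ...

    @abstractmethod
    def is_lower(self): ...

    @abstractmethod
    def as_upper(self): ...


class AsciiCharacter(GeneralizedCharacter):
    """Hypothetical concrete character class for 7-bit ASCII."""

    def is_upper(self):
        return ord("A") <= self.code <= ord("Z")

    def is_lower(self):
        return ord("a") <= self.code <= ord("z")

    def as_upper(self):
        # Upper- and lowercase ASCII letters differ by 32.
        return AsciiCharacter(self.code - 32) if self.is_lower() else self
```

With this, AsciiCharacter(97) always answers the very same object, and 
AsciiCharacter(97).as_upper() answers the shared instance for code 65.  
Whether a real GeneralizedCharacter should be abstract in this way is 
exactly the open question.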

>Or is this your point - what protocol differentiates Strings from Arrays?

Yes.

>Maybe the only responsibility of the general StringClass should be as a
>Convertor:
>String>>
>as: aCharacterSet
>asUnicode
>asAscii
>etc.....
>
>The reason I say this is that a common protocol for all Strings is a tough
>ask.

Why?  If it is, then maybe we don't yet understand strings well 
enough to explain why they should be generalized?  You say that Array 
isn't enough, but String is too much.  Where, then, does this belong?

Or, is it not in fact the String that is the crux of the matter, but 
the Character that must be generalized?  Or is there some interaction 
between a String and a probably lightweight (perhaps Flyweight?) 
class that defines our notion of a String?

Given the beautifully parameterized structure of the collection 
classes' filtering enumerations, it seems to me that a powerful 
GeneralizedString class can be built so that it leverages the 
protocol of the underlying character objects.  When certain 
expressions become common enough to deserve special-cased methods, 
but not so common as to justify putting those methods in the 
GeneralizedString class itself, we have just identified a reason to 
extend the hierarchy with a new subclass or subclasses.
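
As a rough illustration of that idea (again in Python, and again with 
hypothetical names throughout: SimpleCharacter, GeneralizedString, 
CasedString, select, collect, codes), the general class below knows 
only the generic enumerations, while the case-specific convenience 
method lives in a subclass.  This is a sketch under those assumptions, 
not a proposed implementation.

```python
class SimpleCharacter:
    """Hypothetical stand-in for some GeneralizedCharacter subclass."""

    def __init__(self, code):
        self.code = code

    def is_lower(self):
        return ord("a") <= self.code <= ord("z")

    def as_upper(self):
        return SimpleCharacter(self.code - 32) if self.is_lower() else self


class GeneralizedString:
    """Hypothetical character sequence that knows nothing about encodings."""

    def __init__(self, characters):
        self.characters = list(characters)

    # Generic enumerations, in the spirit of select: and collect:.
    # The blocks passed in talk to the characters' own protocol.
    def select(self, predicate):
        return type(self)(c for c in self.characters if predicate(c))

    def collect(self, transform):
        return type(self)(transform(c) for c in self.characters)

    def codes(self):
        return [c.code for c in self.characters]


class CasedString(GeneralizedString):
    """Hypothetical subclass for character sets that actually have case."""

    def as_uppercase(self):
        # A common expression promoted to a special-cased method.
        return self.collect(lambda c: c.as_upper())


s = CasedString(SimpleCharacter(ord(ch)) for ch in "Squeak")
lowercase_only = s.select(lambda c: c.is_lower())
shouted = s.as_uppercase()
```

A client of a caseless character set would simply never see 
as_uppercase, yet could still use select and collect with whatever 
protocol its own characters understand.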

>As people point out, even asUppercase won't hold for many languages.
>You may end up with a restricted and fairly useless protocol.
>With the 'asUnicode' approach, the client determines which protocol will
>be used to communicate with Strings. So a String has many possible
>interfaces (protocols), and the client chooses an appropriate interface
>by which to manipulate the String.

Actually, I think this is precisely the point.  Either you have a 
general class or you don't (or Array really is it).  Let's start with 
the most fundamental objects, and try to find a way to capture unique 
features in particular encodings and protocols for the 
GeneralizedCharacter, and ways to reach those features through the 
GeneralizedString protocol.

We cannot at once say that it is important to create a generalized 
string and that it is impossible.  Let's either do it or drop it.





More information about the Squeak-dev mailing list