Unicode support

Peter Smet peter.smet at flinders.edu.au
Wed Sep 22 05:29:56 UTC 1999


It's interesting how difficult it is to really capture the essence of
a String. I agree that the elements of a string must all share some
uniformity. Otherwise you could mix hieroglyphics with ASCII.
I think the real problem is that the protocol of a String is completely
determined by its components.

For example, a String of music notes might understand
    findInterval: #fifth
while a String of DNA base pairs might understand
    findPromoterSequence

A string at its most general is a stream or a hierarchy of symbols
(not Smalltalk #symbols, just symbols).
The type of the symbols determines the protocol of the string.

So maybe a "Collection of uniform objects" is about as
good as a general String can get. Perhaps all Strings
share the idea of the next 'atom' vs the next meaningful
component 'token'? And all implementations appear to
use the Flyweight pattern.
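
To make the atom/token distinction concrete, here is a small workspace
sketch using an ordinary character String and a ReadStream. The sample
text and the choice of a space as the token delimiter are my own
assumptions; other element types would need their own idea of a token.

```smalltalk
| stream atom token |
stream := ReadStream on: 'find the fifth'.

"The next atom is a single element of the string."
atom := stream next.                      "$f"

"The next token is a meaningful run of atoms, here one word."
stream reset.
token := stream upTo: Character space.    "'find'"
```

Reading atoms is the same for any element type, but what counts as a
token depends entirely on what the elements are.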

The only other constraint I can find is that the components
of a string are visible - or can be made visible. Not very
useful.

Maybe the String should delegate all specific protocols
to its components, passing itself as a parameter?
For example
<Crappy Code Alert>

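String>>findInterval: anInterval
    "Delegate to the first element, passing the receiver as the container."
    ^self first findInterval: anInterval inString: self

MusicNote>>findInterval: anInterval inString: myContainerString
    "Answer the first note whose fifthHigher equals the note before it,
     or nil if there is none. Only #fifth is handled in this sketch."
    | last |
    last := self.
    myContainerString do: [:each |
        each fifthHigher = last ifTrue: [^each].
        last := each].
    ^nil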

Completely untested rambling code - off the top of my head.
Just trying to give an example of how the idea would work.

</Crappy Code Alert>

doesNotUnderstand: would be a good way of generically forwarding the
relevant messages to the components.
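
A minimal sketch of that forwarding, equally untested. ElementString is
a hypothetical class name, and the convention of appending inString: to
the selector is just the one from my example above, not anything that
exists in the image. It only works for keyword messages; anything else
falls through to the normal doesNotUnderstand: behaviour.

```smalltalk
"Hypothetical container class, invented for this sketch."
OrderedCollection subclass: #ElementString
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketches-Strings'.

ElementString>>doesNotUnderstand: aMessage
    "Forward an unknown keyword message such as #findInterval: to the
     first element as #findInterval:inString:, passing the receiver
     as the extra last argument."
    | selector |
    self isEmpty ifTrue: [^super doesNotUnderstand: aMessage].
    selector := (aMessage selector , 'inString:') asSymbol.
    (self first respondsTo: selector)
        ifFalse: [^super doesNotUnderstand: aMessage].
    ^self first
        perform: selector
        withArguments: aMessage arguments , (Array with: self)
```

With this in place, sending findInterval: #fifth to an ElementString of
notes would reach the first note as findInterval: #fifth inString: the
container, without the container knowing anything about intervals.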

It seems to me that the knowledge of what to do lies in the components,
not in their container....

(I'm not even sure if I like this idea, since it puts collection-like
responsibilities in components. Anyway, feel free to rip it apart.)

Peter Smet




