Unicode support

Jarvis, Robert P. Jarvisb at timken.com
Wed Sep 22 12:11:33 UTC 1999


It seems that what's being attempted here is to create a monster String
class which can do anything.  I don't think that's what String is intended
to be.  Let's review the class comment for String:

	A String is an indexed collection of Characters, compactly encoded
as 8-bit bytes.

	String support a vast array of useful methods, which can best be
learned by browsing and trying out examples as you find them in the code.

	Here are a few useful methods to look at...
		String match:
		String contractTo:

	String also inherits many useful methods from its hierarchy, such as
		SequenceableCollection ,
		SequenceableCollection copyReplaceAll:with:

String is not intended to be a collection of DNA base pairs (DnaSequence?),
or a collection of musical notes (Score?), or a collection of other
arbitrary objects (OrderedCollection?  Array?  Dictionary?).  If you need a
collection of DNA base pairs with specific new behavior, bite the bullet and
subclass the appropriate Collection class, add your specific behavior, and
move on.  Ditto for musical notes.  Arguably ditto for collections of
Unicode/hieroglyphic/whatever characters.  Just my opinion.
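
As a rough illustration of the subclassing route, here is a minimal sketch.
The class name DnaSequence, the category 'Sketch-DNA', and the selector
indexOfMotif: are all made up for this example; none of them ship with
Squeak. The method is shown in the usual Class>>selector display form
rather than file-in syntax, and it simply leans on the search that
SequenceableCollection already provides.

```smalltalk
"Hypothetical sketch: DnaSequence and indexOfMotif: are illustrative names."
OrderedCollection subclass: #DnaSequence
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Sketch-DNA'

DnaSequence>>indexOfMotif: aMotif
	"Answer the index at which aMotif first occurs in the receiver,
	 or 0 if it does not occur. Uses the inherited
	 SequenceableCollection search rather than anything DNA-specific."
	^self indexOfSubCollection: aMotif startingAt: 1

"Usage, evaluated in a workspace:"
| dna |
dna := DnaSequence new.
dna addAll: #(#A #T #G #T #A #T #A #A).
dna indexOfMotif: #(#T #A #T #A)
```

The domain-specific protocol lives in the subclass, and String is left
alone.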

Bob Jarvis
The Timken Company

> -----Original Message-----
> From:	Peter Smet [SMTP:peter.smet at flinders.edu.au]
> Sent:	Wednesday, September 22, 1999 1:30 AM
> To:	squeak at cs.uiuc.edu
> Subject:	Re: Unicode support
> 
> It's interesting how difficult it is to really capture the essence of
> a String. I agree that the elements of a string must all share some
> uniformity. Otherwise you could mix hieroglyphics with ASCII.
> I think the real problem is that the protocol of a String is completely
> determined by its components.
> 
> For example, a String of music notes
> findInterval: #fifth
> a String of DNA base pairs:
> findPromoterSequence
> 
> A string at its most general is a stream or a hierarchy of symbols
> (not Smalltalk #symbols, just symbols).
> The type of the symbols determines the protocol of the string.
> 
> So maybe a "Collection of uniform objects" is about as
> good as a general String can get. Perhaps all Strings
> share the idea of the next 'atom' vs the next meaningful
> component 'token'? And all implementations appear to
> use the Flyweight pattern.
> 
> The only other constraint I can find is that the components
> of a string are visible - or can be made visible. Not very
> useful.
> 
> Maybe the String should delegate all specific protocols
> to its components, passing itself as a parameter?
> For example
> <Crappy Code Alert>
> 
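> String>> findInterval: anInterval
>     "Delegate to the first element, passing the receiver along."
>     ^self first findInterval: anInterval inString: self
> 
> MusicNote>> findInterval: anInterval inString: myContainerString
>     "Sketch only: handles #fifth and ignores anInterval."
>     | last |
>     last := self.
>     myContainerString do: [:each |
>         each fifthHigher == last ifTrue: [^each].
>         last := each].
>     ^nil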
> 
> Completely untested rambling code - off the top of my head.
> Just trying to give an example of how the idea would work.
> 
> </Crappy Code Alert>
> 
> Does Not Understand would be a good way of generically forwarding the
> relevant messages to the components.
> 
> It seems to me that the knowledge of what to do lies in the components,
> not in their container....
> 
> (I'm not even sure if I like this idea, since it puts collection-like
> responsibilities in components. Anyway, feel free to rip it apart.)
> 
> Peter Smet




