Unicode support

Wed Sep 22 16:37:30 UTC 1999

> -----Original Message-----
> From: rowledge at interval.com
> Sent: Wednesday, September 22, 1999 12:01 PM
> To: squeak at cs.uiuc.edu
> Subject: Re: Unicode support
> 
> > On Tue 21 Sep, Andrew C. Greenberg wrote:
> > > A newbie recently asked how to compute the equivalent of:
> > > 	word 4 of line 7
> > > and
> > > 	set word 4 of line 7 to "foobar"
> I haven't been tracking this discussion too thoroughly, but the above point
> filtered through even though I haven't yet been caffeinated this morning.
> My claim is that the above sorts of action have nothing to do with String.
> Strings do not have lines, nor even words. Paragraphs (or just maybe
> FormattedSentences) have lines and words. A String is just a long list of
> characters. Linebreaks, words etc only have meaning once the string is
> formatted as part of a larger document-like concept.

This makes sense to me. However, some of the underlying operations that facilitate such computations may well be string-like. I raised this example to investigate what those underlying operations are or should be (beyond the obvious single-character and substring reads and writes). Should indexing be part of the protocol? Searching, beyond the general collection facilities? How about tokenizing with respect to certain delimiters (or predicates), and related operations?

While I agree that "words", per se, are a semantic or syntactic notion not inherent in a mere linear aggregation of characters, might less structure-imposing operations, such as tokenizing, still be appropriate at the String level?
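A minimal sketch, in Python rather than Smalltalk, of how "word 4 of line 7" and "set word 4 of line 7 to "foobar"" might be composed from two delimiter/predicate-based tokenizing primitives that know nothing about paragraphs or formatting. All names here (`token_spans`, `field_spans`, `get_word_of_line`, `put_word_of_line`) are hypothetical illustrations, not part of Squeak's String protocol, and line/word numbers are 1-based as in the HyperTalk-style example.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)


def token_spans(s: str, is_delim: Callable[[str], bool]) -> List[Span]:
    """Spans of maximal runs of non-delimiter characters (suits words:
    repeated delimiters never produce empty tokens)."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, ch in enumerate(s):
        if is_delim(ch):
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(s)))
    return spans


def field_spans(s: str, is_delim: Callable[[str], bool]) -> List[Span]:
    """Every delimiter ends a field, so empty fields are kept (suits lines:
    a blank line still counts as a line)."""
    spans: List[Span] = []
    start = 0
    for i, ch in enumerate(s):
        if is_delim(ch):
            spans.append((start, i))
            start = i + 1
    spans.append((start, len(s)))
    return spans


def _word_of_line_span(s: str, word_no: int, line_no: int) -> Span:
    """Absolute span of word `word_no` within line `line_no` (both 1-based).
    Raises ValueError for numbers < 1 and IndexError if the chunk is missing."""
    if word_no < 1 or line_no < 1:
        raise ValueError("chunk numbers are 1-based")
    line_start, line_end = field_spans(s, lambda c: c == "\n")[line_no - 1]
    w_start, w_end = token_spans(s[line_start:line_end], str.isspace)[word_no - 1]
    return line_start + w_start, line_start + w_end


def get_word_of_line(s: str, word_no: int, line_no: int) -> str:
    start, end = _word_of_line_span(s, word_no, line_no)
    return s[start:end]


def put_word_of_line(s: str, word_no: int, line_no: int, new: str) -> str:
    """Python strings are immutable, so this returns a new string."""
    start, end = _word_of_line_span(s, word_no, line_no)
    return s[:start] + new + s[end:]
```

For example, with `text = "\n".join(["l1", "l2", "l3", "l4", "l5", "l6", "the quick brown fox jumps"])`, `get_word_of_line(text, 4, 7)` gives `"fox"`, and `put_word_of_line(text, 4, 7, "foobar")` changes the last line to `"the quick brown foobar jumps"`. Returning spans rather than substrings is the design choice that makes the "set" operation fall out of the same primitive as the "get": neither primitive needs to know what a line or a word is, only which characters are delimiters.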