Unicode support

Jerome Garcia jgarcia at svg.com
Wed Sep 22 17:56:24 UTC 1999



>>> <agree at carltonfields.com> 09/22 10:37 AM >>>

> -----Original Message-----
> From: MIME :rowledge at interval.com
> Sent: Wednesday, September 22, 1999 12:01 PM
> To: squeak at cs.uiuc.edu
> Subject: Re: Unicode support
> > > On Tue 21 Sep, Andrew C. Greenberg wrote:
> > > A newbie recently asked how to compute the equivalent of:
> > > 	word 4 of line 7
> > > and
> > > 	set word 4 of line 7 to "foobar"
> I haven't been tracking this discussion too thoroughly, but the above point
> filtered through even though I haven't yet been caffeinated this morning.
> My claim is that the above sorts of action have nothing to do with String.
> Strings do not have lines, nor even words. Paragraphs (or just maybe
> FormattedSentences) have lines and words. A String is just a long list of
> characters. Linebreaks, words etc only have meaning once the string is
> formatted as part of a larger document-like concept.

>
This makes sense to me.  However, there exist certain underlying
operations that may be performed on strings to facilitate such
computations that may well be string-like.  I raised this example to
investigate what those underlying operations are or should be (beyond
the obvious single-character and substring reads and writes).  Should
indexing be a part of the protocol?  Searching?  (that is, beyond the
general collection facilities)?  How about tokenizing with respect to
certain delimiters (or predicates) and related operations?

While I agree that "words," per se, are a semantic or syntactic notion
not inherent in the mere linear aggregation of characters, perhaps
less structure-imposing operations, such as the tokenizing operations,
are appropriate?
>
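
As a rough illustration of tokenizing with respect to a predicate, here
is a minimal sketch in Python (not Squeak, and not anyone's proposed
String protocol). The names tokenize and word_of_line are hypothetical,
chosen only for this example. The point is that "word 4 of line 7" can
be built on top of a generic tokenizing operation without the string
itself knowing anything about words.

```python
from typing import Callable, List


def tokenize(text: str, is_delimiter: Callable[[str], bool]) -> List[str]:
    """Split text into maximal runs of characters that are not delimiters."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_delimiter(ch):
            # A delimiter ends the current token, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def word_of_line(text: str, word: int, line: int) -> str:
    """Return the 1-based word of the 1-based line (hypothetical helper)."""
    # Lines are split on "\n" directly so that empty lines still count.
    lines = text.split("\n")
    return tokenize(lines[line - 1], str.isspace)[word - 1]
```

The same tokenize call works with any single-character predicate, for
example lambda ch: ch in ",;" for comma- or semicolon-separated fields.
What counts as a "word" is decided by the caller, not by the string.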

I would like to agree with both of the points made here. A string is
just a long list of characters, but there should be powerful yet
non-structure-imposing operations available, such as tokenization. A
couple of examples of this power can be found in the 1985 Standard
COBOL (a ghost from my past), which provided the powerful and quite
flexible STRING and UNSTRING operations. These made no assumptions
about the structure of the string, yet made it quite easy to put
together substrings and tokenize strings.
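
To give a flavor of that style of operation, here is a small sketch in
Python. It is only loosely modeled on the idea behind STRING and
UNSTRING and omits most of the options of the actual COBOL statements.
The function names unstring and string_together are hypothetical, and
delimiters are assumed to be non-empty strings.

```python
import re
from typing import List, Optional, Sequence, Tuple


def unstring(source: str, delimiters: Sequence[str]) -> List[str]:
    """Split source at any of the given delimiters, keeping empty fields."""
    # Try longer delimiters first so that a delimiter that is a prefix of
    # another one does not shadow it.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, source)


def string_together(parts: Sequence[Tuple[str, Optional[str]]]) -> str:
    """Concatenate parts; a part is cut at its delimiter when one is given."""
    pieces = []
    for text, delimiter in parts:
        if delimiter is not None:
            # Keep only what precedes the first occurrence of the delimiter.
            text = text.split(delimiter, 1)[0]
        pieces.append(text)
    return "".join(pieces)
```

For example, unstring("1999-09-22 17:56", ["-", " "]) yields the four
fields "1999", "09", "22" and "17:56", and neither function assumes
anything about what the characters between delimiters mean.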

Jerome E. Garcia




