Unicode support

Andrew C. Greenberg werdna at gate.net
Wed Sep 22 06:09:06 UTC 1999


I agree with Peter's rough cuts:

	(1) Strings are somewhat homogeneous;
	(2) Strings are composed of elements of a Flyweight pattern class;
	(3) Many of the "usual" things we do with a string are
		determined by the underlying character class.
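
As a rough illustration of points (2) and (3), here is a minimal Python
sketch. It assumes a hypothetical Char flyweight class that pools one
shared instance per code point, and a hypothetical CharString whose
block-based queries hand each element to the predicate, so the
character class, not the string, decides questions such as "is this a
letter?"  None of these names come from Squeak.

```python
class Char:
    """Hypothetical flyweight character: one shared instance per code point."""

    _pool = {}  # code point -> the single shared Char for it

    def __new__(cls, code_point):
        inst = cls._pool.get(code_point)
        if inst is None:
            # First request for this code point: create and pool it.
            inst = super().__new__(cls)
            inst.code_point = code_point
            cls._pool[code_point] = inst
        return inst

    def is_letter(self):
        # The character class, not the string, decides what a letter is.
        return chr(self.code_point).isalpha()


class CharString:
    """Hypothetical string whose elements are pooled Char objects."""

    def __init__(self, text):
        self.chars = tuple(Char(ord(c)) for c in text)

    def all_are(self, predicate):
        # Each element is handed to the block in turn.
        return all(predicate(c) for c in self.chars)

    def some_is(self, predicate):
        return any(predicate(c) for c in self.chars)
```

Because of the pooling, the two a's in CharString("aab") are the very
same object, and CharString("abc").all_are(Char.is_letter) asks each
character rather than the string.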

Perhaps we can study what we really do with strings, if only to
enumerate those uses, and see how they matter.

What do strings do?  What must they do?  Which of the following are 
necessary, mandatory, or even useful?  Which depend upon the character class?

	(A) support a (partial or total) linear ordering (=, <, >)
	(B) substring
	(C) catenation
	(D) indexing
	(E) collecting, selecting, allAre, someIs, based upon blocks
		that are passed (conceptually) to individual objects.
	(F) clumping and declumping (word-based, delimiter-based, token-based)
	(G) random access out (not as a special case of substring, but pulling
		the character object out of the class as well)
	(H) random access in (not as a special case of catenation, but validating
		and/or coercing the character on the way in)
	(I) notion and creation of a null string as the identity element for
		many of the preceding operations
	(J) searching for substrings.
	(K) sizing
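
To make part of the list concrete, here is a Python sketch of a
hypothetical immutable Str class over a tuple of characters. It covers
(A) through (E), (G), and (I) through (K); clumping (F) and validated
insertion (H) are left out, and all names are illustrative, not a
proposal for Squeak's String.

```python
from functools import total_ordering


@total_ordering
class Str:
    """Hypothetical immutable string over a tuple of characters."""

    def __init__(self, chars=()):
        self._chars = tuple(chars)

    @classmethod
    def empty(cls):
        # (I) the null string
        return cls()

    # (A) lexicographic linear ordering, built on the characters' own order
    def __eq__(self, other):
        if not isinstance(other, Str):
            return NotImplemented
        return self._chars == other._chars

    def __lt__(self, other):
        if not isinstance(other, Str):
            return NotImplemented
        return self._chars < other._chars

    def __hash__(self):
        return hash(self._chars)

    def substring(self, start, stop):
        # (B) half-open range [start, stop)
        return Str(self._chars[start:stop])

    def __add__(self, other):
        # (C) catenation
        return Str(self._chars + other._chars)

    def __getitem__(self, i):
        # (D)/(G) indexing hands back the character object itself
        return self._chars[i]

    # (E) block-based enumeration
    def select(self, pred):
        return Str(c for c in self._chars if pred(c))

    def all_are(self, pred):
        return all(pred(c) for c in self._chars)

    def some_is(self, pred):
        return any(pred(c) for c in self._chars)

    def find(self, pattern):
        # (J) naive substring search; -1 when absent
        n, m = len(self._chars), len(pattern._chars)
        for i in range(n - m + 1):
            if self._chars[i:i + m] == pattern._chars:
                return i
        return -1

    def __len__(self):
        # (K) sizing
        return len(self._chars)
```

The null string acts as the identity for catenation
(Str.empty() + s == s + Str.empty() == s), sorts before every other
string, and is found at position 0 by find, which is the sense of (I).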

I don't know that I agree with those who believe that a string must
be "growable/shrinkable."  Perhaps we should consider making strings
length-immutable, along the lines of Python (whose strings go further
and are fully immutable).  Doing so can facilitate other operations
that I have now come to think of as string-like, in particular
slicing and index-shifting, by creating proxy objects on the original
string that share changes to the underlying content.  Size-shifting
operations can be permitted, but they create copies of the original,
and changes to those copies do not affect the content of
already-taken slices.
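
A minimal Python sketch of the length-immutable scheme, assuming
hypothetical FixedString and SliceView classes: characters can be
replaced in place, a slice is a proxy onto the same buffer (so it
shares changes and shifts indices so that its own index 0 is the
slice's first character), and a size-changing operation such as
catenation returns a fresh copy.

```python
class FixedString:
    """Hypothetical length-immutable string: characters may be replaced
    in place, but the length never changes."""

    def __init__(self, text):
        self._buf = list(text)  # backing store, shared with any slices

    def __len__(self):
        return len(self._buf)

    def __getitem__(self, i):
        return self._buf[i]

    def __setitem__(self, i, ch):
        # In-place replacement keeps the length fixed.
        if len(ch) != 1:
            raise ValueError("exactly one character expected")
        self._buf[i] = ch

    def slice(self, start, stop):
        # A proxy viewing the same buffer; nothing is copied.
        return SliceView(self._buf, start, stop)

    def catenate(self, other):
        # Size-changing: builds a new buffer, so later edits to the
        # result never reach this string or its slices.
        return FixedString(self._buf + list(str(other)))

    def __str__(self):
        return "".join(self._buf)


class SliceView:
    """Proxy onto the window [start, stop) of another string's buffer."""

    def __init__(self, buf, start, stop):
        self._buf, self._start, self._stop = buf, start, stop

    def __len__(self):
        return self._stop - self._start

    def _index(self, i):
        # Index-shifting: slice position i maps to buffer position start + i.
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._start + i

    def __getitem__(self, i):
        return self._buf[self._index(i)]

    def __setitem__(self, i, ch):
        if len(ch) != 1:
            raise ValueError("exactly one character expected")
        self._buf[self._index(i)] = ch

    def __str__(self):
        return "".join(self._buf[self._start:self._stop])
```

For example, with s = FixedString("hello world") and w = s.slice(6, 11),
assigning w[0] = "W" changes s to "hello World", while edits to
s.catenate("!") leave both s and w untouched.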

A newbie recently asked how to compute the equivalent of:

	word 4 of line 7

and

	set word 4 of line 7 to "foobar"
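
One way to answer the newbie in Python, sketched under two
assumptions: lines and words are numbered from 1, as in the chunk
expression, and a word is simply a run of non-whitespace characters
(AppleScript's own notion of a word need not match this). The
function names are hypothetical.

```python
import re

WORD = re.compile(r"\S+")  # assumption: a word is a run of non-whitespace


def word_of_line(text, word_no, line_no):
    """word <word_no> of line <line_no>, both 1-based."""
    line = text.splitlines()[line_no - 1]
    return WORD.findall(line)[word_no - 1]


def set_word_of_line(text, word_no, line_no, new_word):
    """Return a copy of text with that word replaced by new_word."""
    lines = text.splitlines(keepends=True)
    line = lines[line_no - 1]
    # Use the word's character span so surrounding spacing is preserved.
    start, stop = [m.span() for m in WORD.finditer(line)][word_no - 1]
    lines[line_no - 1] = line[:start] + new_word + line[stop:]
    return "".join(lines)
```

Because set_word_of_line locates the word by its character span, the
spacing around it survives; a plain split-and-join would collapse runs
of spaces.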

Should these sorts of operations, presumably defined somehow in terms
of operations on the underlying Character class, be abstracted, or are
there orthogonal operations providing access to this functionality
that would be better preserved?

Is the AppleScript container notion, however fluffy, perhaps a better
model than any we have considered to date?  Is it important to a
programming language?  Can it ever be efficient?

It's late, and I'm babbling.  Time to go to bed.




