String hierarchy (was: UTC-8 (was ...))

Maurice Rabb m3rabb at stono.com
Fri Mar 17 09:33:20 UTC 2000


At 10:18 PM -0500 3/16/00, Bijan Parsia wrote:
[snip a lot of interesting thoughts]

You know what?  My frustrations with the implementation straitjacket 
that String and Symbol currently have us in (due to being 
variableByteSubclasses) have me wanting to throw the baby out with the 
bathwater.

Inheritance of protocol is an acceptable reason to inherit from a 
supertype, provided it truly should implement the entire protocol of 
the supertype.  Is this true of String?  Yes [blush].  I also agree 
that it is proper for it to implement the entire SequenceableCollection 
protocol.  However, it is debatable whether or not it should share 
the behavior of ArrayedCollection.

As Alan (et al) stated, aspects will alleviate many of the questions 
about what should inherit from what.


I do stand by my point that _ideally_ String should not intrinsically 
hold its elements, but instead delegate to a 'contents' ivar. 
This would allow its implementation to be managed transparently (and 
dynamically) in many ways.


At 4:51 PM +1300 3/17/00, Richard A. O'Keefe wrote:
>If you stored a wide character into a thin string, it magically became (think
>'become:') a fat string.
>
>There is, in short, no reason why Smalltalk application programmers
>should *see* any difference between thin (Latin 1, say) and fat (Unicode/
>ISO 10646 Base Mode Plane, say) and really obese (full ISO 10646 31-bit
>characters) strings at all.  String *encoding* is a matter for input and
>output.  Take a look at how Plan 9 did/does it and how Java does it.
>
>Logically, a Smalltalk string should be a sequence of characters in the
>supported character set, whatever that is.  How it is packed is an
>implementation detail, and that's what encapsulation is for.
>
>The thing we _do_ need is to have '16-bit byte' arrays supported just like
>'8-bit byte arrays', to serve as substrate for the String implementation.
>
>You see, once you start including all sorts of encodings as different
>kinds of String, you have to start worrying about what it means for one
>String to equal another.  (Not that Unicode actually makes that easy anyway;
>there are several congruence relations defined on Unicode strings and the
>really useful ones are not identical-as-code-sequence.)

Exactly!  The substrate would be a string's 'contents'.  A string 
should delegate the encoding to the underlying ByteArray, WordArray, or 
other means of encoding.
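
A minimal sketch of the delegation idea, written in Python rather than 
Smalltalk so it can be run outside an image.  Everything named here is 
hypothetical (the class FlexString, the _WIDTHS table, the helper 
_widen_for); none of it is existing Squeak code.  It assumes three 
storage widths standing in for ByteArray, a 16-bit array, and a 32-bit 
WordArray, and it treats equality as code-point-by-code-point, which 
sidesteps the Unicode congruence questions Richard raises.

```python
from array import array

# Storage widths, thinnest first: (array typecode, largest code it holds).
# 'B' is 8-bit, 'H' is 16-bit, 'L' is at least 32-bit unsigned.
_WIDTHS = [("B", 0xFF), ("H", 0xFFFF), ("L", 0x7FFFFFFF)]
_ORDER = [typecode for typecode, _ in _WIDTHS]


class FlexString:
    """Hypothetical string that delegates element storage to `contents`."""

    def __init__(self, text=""):
        self.contents = array("B")  # every string starts out thin
        for ch in text:
            self.append(ch)

    def _widen_for(self, code):
        # Find the thinnest substrate that can hold `code`.
        for typecode, limit in _WIDTHS:
            if code <= limit:
                needed = typecode
                break
        else:
            raise ValueError("character code out of range")
        # Swap in a wider substrate only if the current one is too thin.
        if _ORDER.index(needed) > _ORDER.index(self.contents.typecode):
            self.contents = array(needed, list(self.contents))

    def append(self, ch):
        code = ord(ch)
        self._widen_for(code)
        self.contents.append(code)

    def __setitem__(self, index, ch):
        # Storing a wide character into a thin string fattens it in place.
        code = ord(ch)
        self._widen_for(code)
        self.contents[index] = code

    def __getitem__(self, index):
        return chr(self.contents[index])

    def __len__(self):
        return len(self.contents)

    def __eq__(self, other):
        # Equality looks at the characters, never at how they are packed.
        if not isinstance(other, FlexString):
            return NotImplemented
        return list(self.contents) == list(other.contents)

    def __str__(self):
        return "".join(chr(code) for code in self.contents)


s = FlexString("abc")
s[1] = "\u03bb"  # GREEK SMALL LETTER LAMDA does not fit in 8 bits
print(s.contents.typecode, len(s))
```

The caller only ever talks to FlexString; which array sits in 
'contents' changes underneath without the object's identity changing, 
so no 'become:' is needed in this arrangement.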


Again however, the VM can't yet handle strings and symbols that are 
not byte-based objects.

Also again, IMHO it would be best if the abstract String class is named String.


At 10:18 PM -0500 3/16/00, Bijan Parsia wrote:
> > In the meantime, I agree that changing String within the Collection
> > hierarchy will be the easiest way to solve the element encoding
> > problem.
>Cool! So, we can continue the theoretical discussion without destroying
>coding initiatives! :)

:-)  Real work must always continue!

--Maurice


---------------------------------------------------------------------------
   Maurice Rabb    773.281.6003    Stono Technologies, LLC    Chicago, USA




