String hierarchy (was: UTC-8 (was ...))
Maurice Rabb
m3rabb at stono.com
Fri Mar 17 09:33:20 UTC 2000
At 10:18 PM -0500 3/16/00, Bijan Parsia wrote:
[snip a lot of interesting thoughts]
You know what? My frustrations with the implementation straitjacket
that String and Symbol currently have us in (due to being
variableByteSubclasses) have me wanting to throw the baby out with
the bathwater.
Inheritance of protocol is an acceptable reason to inherit from a
supertype, provided it truly should implement the entire protocol of
the supertype. Is this true of String? Yes [blush]. I also agree
that it is proper for it to implement the entire
SequenceableCollection protocol. However, it is debatable whether
it should share the behavior of ArrayedCollection.
As Alan (et al) stated, aspects will alleviate many of the questions
about what should inherit from what.
I do stand by my point that _ideally_ String should not intrinsically
hold its elements, but instead delegate to a 'contents' ivar.
This would allow its implementation to be managed transparently (and
dynamically) in many ways.
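To make the idea concrete, here is a minimal sketch of such a
delegating String. All of the names (DelegatingString, the accessors)
are hypothetical; the real Squeak String is a variableByteSubclass
and holds its bytes directly, which is exactly the point under
discussion.

```smalltalk
"Hypothetical sketch only: a String-like class that delegates
 storage to a 'contents' ivar instead of holding its own bytes."
Object subclass: #DelegatingString
	instanceVariableNames: 'contents'
	classVariableNames: ''
	category: 'Sketch-Strings'

"DelegatingString >> at: index"
at: index
	"Answer the Character at index, however contents encodes it."
	^ Character value: (contents at: index)

"DelegatingString >> at: index put: aCharacter"
at: index put: aCharacter
	"Store aCharacter, letting contents decide the representation."
	contents at: index put: aCharacter asInteger.
	^ aCharacter

"DelegatingString >> size"
size
	^ contents size
```

Because all element access funnels through 'contents', the substrate
could be swapped at run time without any client seeing a difference.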
At 4:51 PM +1300 3/17/00, Richard A. O'Keefe wrote:
>If you stored a wide character into a thin string, it magically became (think
>'become:') a fat string.
>
>There is, in short, no reason why Smalltalk application programmers
>should *see* any difference between thin (Latin 1, say) and fat (Unicode/
>ISO 10646 Base Mode Plane, say) and really obese (full ISO 10646 31-bit
>characters) strings at all. String *encoding* is a matter for input and
>output. Take a look at how Plan 9 did/does it and how Java does it.
>
>Logically, a Smalltalk string should be a sequence of characters in the
>supported character set, whatever that is. How it is packed is an
>implementation detail, and that's what encapsulation is for.
>
>The thing we _do_ need is to have '16-bit byte' arrays supported just like
>'8-bit byte arrays', to serve as substrate for the String implementation.
>
>You see, once you start including all sorts of encodings as different
>kinds of String, you have to start worrying about what it means for one
>String to equal another. (Not that Unicode actually makes that easy anyway;
>there are several congruence relations defined on Unicode strings and the
>really useful ones are not identical-as-code-sequence.)
Exactly! The substrate would be a string's 'contents'. A string
should delegate its encoding to the underlying ByteArray, WordArray,
or other means of encoding.
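The thin-to-fat promotion O'Keefe describes could then be a purely
internal affair: a sketch of an at:put: on a String-like class whose
'contents' ivar starts as a ByteArray (names and the WordArray
re-packing are assumptions, not existing Squeak behavior):

```smalltalk
"Hypothetical: transparent 'thin' -> 'fat' promotion. When a
 character outside the 0-255 range is stored, swap the byte
 substrate for a word substrate before storing."
at: index put: aCharacter
	| code |
	code := aCharacter asInteger.
	(code > 255 and: [contents class == ByteArray])
		ifTrue: [contents := WordArray withAll: contents].
	contents at: index put: code.
	^ aCharacter
```

No become: is needed at the string level; only the private substrate
changes class, which is what makes the encoding an implementation
detail.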
Again, however, the VM can't yet handle strings and symbols that are
not byte-based objects.
Also again, IMHO it would be best if the abstract String class were named String.
At 10:18 PM -0500 3/16/00, Bijan Parsia wrote:
> > In the meantime, I agree that changing String within the Collection
> > hierarchy will be the easiest way to solve the element encoding
> > problem.
>Cool! So, we can continue the theorectical discussion without destroying
>coding initiatives! :)
:-) Real work must always continue!
--Maurice
---------------------------------------------------------------------------
Maurice Rabb 773.281.6003 Stono Technologies, LLC Chicago, USA