String hierarchy (was: UTC-8 (was ...))
m3rabb at stono.com
Fri Mar 17 01:19:15 UTC 2000
At 2:12 PM -0800 3/16/00, Dan Ingalls wrote:
>AGREE at CarltonFields.com wrote...
> >Of course it ain't trivial, but perhaps there's an interim, if not
>ad hoc solution that serves every relevant purpose? It seems to me
>that the Number hierarchy is proof positive that widely disparate,
>differently sized and incomparable models with similar features can
>be resolved into a seamless whole.
> >In a sense, isn't a pure ASCII string just a subset of UTC-8?
>Can't a hierarchy with built-in coercion be used to preserve ALL of
>the efficiencies of the status quo, while still permitting (or at
>least paving the way) toward the full generality of UTC-8 and
> >Why can't the ASCII string be the SmallInteger of a new
>STRINGTHING hierarchy, where operations within the string world be
>seamless? Every time I raise this point, there were countless
>objections about things Squeak so configured could not do (the
>biggest deal was auto-reversing Hebrew/Anglo-Numeric text), but it
>seems that we could still accommodate many of the advantages of
>Unicode, integrate the whole into Squeak, while preserving ALL of
>the efficiencies of the present ASCII world for unmixed ASCII and
>I agree with this approach entirely. It's a great Squeak Samurai
>project (I would do it tonight, but I've got a hot date ;-). Just
>put StringThing between ArrayedCollection and String, move all of
>String's methods up a level, leaving only those that have to do with
>String's primitive behavior. It shouldn't take more than an hour,
>and everything should still work.
>Then... define, say, String16 (*) that uses 16 bits and produces
>characters with codes up to 65535. Make one up like 'Squ<999>eak',
>and see if it prints. Then see if it displays. Etc. Lots of
>things will break, but that's half the fun. You'll find out if text
>display handles characters that are not in the font, and you'll have
>to decide whether all characters will still be unique, but this is
>what life on the frontier is all about.
>When in doubt, try it out.
> - Dan
>(*) It's probably worth starting with the most general expansion
>first. Then from there on, it's only optimization and engineering
>to do the others -- the interfaces will have all been worked out.
>PS: I'm not saying SqC will embrace unicode, I'm just saying that
>it may only take a couple of days to understand most of what is
I know that this may be viewed as blasphemy, but this is another
compelling reason that String should be removed from the Collection
hierarchy. IMHO, the continued inclusion of String in the Collection
hierarchy is a serious mistake that continues to beget problems.
Including it in the Collection hierarchy not only reveals its
implementation but forces its type. It forces an "is-a" relationship
instead of a more appropriate "has-a" relationship. Though strings
often _act_ as collections, they are more than just collections. All
that should matter is that a string can answer
aSequenceableCollection of its contents when the appropriate message
is sent, e.g. #characters|#elements|#contents.
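To make the is-a/has-a distinction concrete, here is a minimal sketch. It is written in Python rather than Squeak only so that it can be run anywhere, and every name in it is hypothetical: `Str` stands in for a String that lives outside the Collection hierarchy, and `characters` for the #characters message suggested above. It also shows why the special methods that guard against accidentally enumerating a string would no longer be needed.

```python
class Str:
    """A string that *has* a sequence of characters rather than *being* one."""

    def __init__(self, text):
        # The representation is private; callers never see how it is stored.
        self._chars = tuple(text)

    def characters(self):
        """Answer a sequenceable collection of the contents (cf. #characters)."""
        return self._chars

    def __eq__(self, other):
        return isinstance(other, Str) and self._chars == other._chars

    def __hash__(self):
        return hash(self._chars)

    def __repr__(self):
        return "Str(%r)" % "".join(self._chars)


def flatten(items):
    """Flatten nested lists. A Str is a single object, so no special case is needed."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result


s = Str("abc")
print(flatten([s, [Str("de")]]))             # the strings stay whole
print([c.upper() for c in s.characters()])   # collection behavior on request
```

Because `Str` defines no iteration protocol of its own, code that walks collections cannot descend into it by mistake; the characters are only reachable by asking for them.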
(Does Kent Beck have any thoughts on this?)
I first began to wonder about the location of String in the class
hierarchy when considering all of the special methods used to prevent
accidentally enumerating a string instead of treating it as a
singular object. I became convinced of the problem when trying to
expand the behavior of string types.
String <indexed bytes>
The optimization reasons behind this original implementation are
obvious; however, the current implementation's rigidity complicates
appropriate design in other aspects of string use. Appropriate use of
protocol is tantamount to good design.
The current implementation makes it difficult to:
- Allow strings to use self managing compression;
- Allow symbols to remove or obscure their contents;
- Allow symbols to cache a (better) hash value;
- Allow selectors to have direct references to synonyms or other
(BTW, I am aware that the Squeak VM allows you to use SmallIntegers
in place of selectors. That is orthogonal to my intentions.)
(The last item is useful for efficiently implementing multi-dispatch
messaging in deep class subhierarchies. Each selector can hold a
reference to the next most general selector with a supertype name
embedded in it. This avoids string character manipulation or
concatenation, and symbol identity/existence table lookup.)
Ideally String would have one ivar 'contents' which would delegate
its representation to an arrayed or encoding object.
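A rough sketch of that delegation, again in Python and with hypothetical names throughout (`DelegatingString`, `ByteContents`, `WideContents`, `at`, `at_put`). It assumes just two representations, 8-bit and 16-bit, and switches to the wider one automatically when a character does not fit, much as SmallInteger arithmetic overflows into large integers. Character codes above 65535 are out of scope here.

```python
from array import array


class ByteContents:
    """Compact storage: one byte per character code (0..255)."""

    LIMIT = 0xFF

    def __init__(self, codes):
        self._data = bytearray(codes)

    def codes(self):
        return list(self._data)

    def at(self, index):
        return self._data[index]

    def put(self, index, code):
        self._data[index] = code


class WideContents(ByteContents):
    """General storage: 16 bits per character code (0..65535)."""

    LIMIT = 0xFFFF

    def __init__(self, codes):
        self._data = array("H", codes)


class DelegatingString:
    """A string with a single 'contents' variable that owns the representation."""

    def __init__(self, text):
        codes = [ord(ch) for ch in text]
        wide = any(code > ByteContents.LIMIT for code in codes)
        self._contents = (WideContents if wide else ByteContents)(codes)

    def at(self, index):
        return self._contents.at(index)

    def at_put(self, index, code):
        if code > self._contents.LIMIT:
            # Swap the representation; the string object itself is unchanged.
            self._contents = WideContents(self._contents.codes())
        self._contents.put(index, code)

    def storage_name(self):
        return type(self._contents).__name__


s = DelegatingString("Squeak")
print(s.storage_name())   # compact storage while every code fits in a byte
s.at_put(3, 999)          # a character code like the <999> in Dan's example
print(s.storage_name(), s.at(3))
```

The point of the design is that callers hold on to the same string object before and after the widening; only the private contents object is replaced.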
I recognize the importance, and high degree of interdependence of
String in Smalltalk. Moving String from under Collection is not that
difficult; however, finding every place that a string is used as a
collection is non-trivial. (What is the best/easiest way to do
this?) Initially the collection protocols that are used by strings
could be copied to String in its new place in the hierarchy. All such
methods would carry a comment marking them as discouraged and
recommending the idioms: 'aString contents someCollectionMethod', or
'aString contentsDo: aBlock'. (#contents, #characters, whatever!)
After a few revs of being weaned, perhaps we could eliminate the
direct collection protocols from String.
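One possible shape for that weaning period, sketched in Python with hypothetical names (`TransitionalString`, `do`, `contents`, `contents_do`): the old collection-style method still works but emits a warning that points at the caller. Logging those warnings would also be one way, though surely not the only one, to find the places where a string is still being used as a collection.

```python
import warnings


class TransitionalString:
    """A string outside the collection hierarchy that still answers old protocol."""

    def __init__(self, text):
        self._chars = list(text)

    def contents(self):
        """Preferred: answer a copy of the characters as a collection."""
        return list(self._chars)

    def contents_do(self, block):
        """Preferred: evaluate block for each character (cf. #contentsDo:)."""
        for ch in self._chars:
            block(ch)

    def do(self, block):
        """Discouraged: direct collection protocol, kept only for the transition."""
        warnings.warn(
            "TransitionalString.do is discouraged; use contents_do",
            DeprecationWarning,
            stacklevel=2,  # report the caller's location, not this method
        )
        self.contents_do(block)


seen = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    TransitionalString("hi").do(seen.append)

print(seen)
print([str(w.message) for w in caught])
```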
In the meantime, I agree that changing String within the Collection
hierarchy will be the easiest way to solve the element encoding
At 5:46 PM -0500 3/16/00, Doug Way wrote:
>Sounds great to me, too. Except maybe call the new class something other
>than StringThing... maybe "AbstractString" might be most appropriate?
>(Naming is important, y'know... :-))
I agree. Naming is very important. Arguably, the _most_ important thing.
Whatever you do, please, please, please!!! name the abstract string
class String. I know that it involves extra steps, but IMHO it would
be best to keep the name pure.
Whatever intermediate classes you use or don't use, push the
primitive string calls into AsciiString.
Maurice Rabb 773.281.6003 Stono Technologies, LLC Chicago, USA