Unicode support

Wed Sep 22 03:36:06 UTC 1999

>Hi,
>
>My main complaint with the design of Strings
[snip]

>...Although there have been some attempts to
>create a memory system that is based on objects and not bytes. As long as
>we are using the popular computer architectures of the day, memory will
>continue to be allocated in bytes.
>
>I just want Strings to be vastly more object oriented than they currently
>are. Nobody else in our email group seems to support the notion or sees the
>problems that I see with strings being bytes. If you do see these problems
>I'd appreciate hearing from you. From the discussion so far people have
>cited "black box encapsulation" and "protocols" as reasons to justify that
>we don't care about the implementation details of a generic object based
>string class. I say that implementation is very important. While all string
>classes should have a common protocol where it makes sense I don't think
>that this is a very strong argument for keeping a String implementation
>byte oriented instead of object oriented.

Well, I agree, with one caveat: the items in a
string are constrained to have similar properties,
which doesn't necessarily hold for an array.  It's
like the difference between a list of atoms a la
LISP and a list having sublists in some positions.
I don't mean to say that to be a string the
elements need to be "atoms", but it seems to me
that they need to be of the same non-abstract type.
By "non-abstract type" I'm attempting to convey
the feeling that somehow a string of things
each having the Object type is not satisfactory.

Perhaps an algebraic analogy could be useful?
Could we say, for instance, that a string is
an array of objects of the same type (subtypes
allowed) for which a particular total ordering
is defined, and for which a comparison operation
is defined such that:

  (1) strings of longer length are "greater
      than" strings of shorter length,

  (2) strings of equal length are decided by
      comparing them lexicographically left-to-
      right (right-to-left might well be 
      another option) using the total ordering
      on the constituent type for pairwise
      comparison of objects.

In Smalltalk terms, these constraints would be
realized as methods of the string type.
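
As a rough sketch of rules (1) and (2), here is
one way they might look, written in Python purely
for illustration.  The names GenericString and
element_type are made up for this example, and
passing the element type explicitly is my own
assumption about how "same type (subtypes
allowed)" could be enforced.

```python
from functools import total_ordering


@total_ordering
class GenericString:
    """Sequence of same-typed, totally ordered elements (hypothetical)."""

    def __init__(self, items, element_type):
        self.items = tuple(items)
        self.element_type = element_type
        # Every element must be an instance of element_type;
        # isinstance() also accepts subtypes.
        for item in self.items:
            if not isinstance(item, element_type):
                raise TypeError(
                    f"{item!r} is not a {element_type.__name__}"
                )

    def __len__(self):
        return len(self.items)

    def __eq__(self, other):
        if not isinstance(other, GenericString):
            return NotImplemented
        return self.items == other.items

    def __hash__(self):
        return hash(self.items)

    def __lt__(self, other):
        if not isinstance(other, GenericString):
            return NotImplemented
        # Rule (1): a shorter string is "less than" a longer one.
        if len(self) != len(other):
            return len(self) < len(other)
        # Rule (2): equal lengths are decided left-to-right, using
        # the element type's own ordering for each pair.
        for a, b in zip(self.items, other.items):
            if a != b:
                return a < b
        return False
```

With this, GenericString("b", str) compares as
less than GenericString("aa", str) because it is
shorter, while GenericString("ab", str) is less
than GenericString("ac", str) by the left-to-right
rule.  The same class works for elements that are
not characters at all, as long as they are ordered.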

>
>The space consumption argument against an object oriented String
>implementation is that each character object takes more space. Yes, a
>character object based string would use 4 bytes per character as opposed to
>1 byte per character. This is a definite concern and needs some study to find
>out how much of a concern it is.

Well, unless one is trying to encode a wide
variety of presentations in a fixed and
limited amount of space, like 4 bytes per
element, it seems to me that if one is storing
more information per element in an array or,
equivalently, letting each element of such
a "string" indicate a bigger address span,
then paying for that with greater memory
consumption seems only fair.

>
>Speed issues may be a concern for "complicated byte encodings" like
>"unicode" due to the "magic" that must be done to meet the encoding
>specification which uses special "encodings".

But, the "cost" in speed, if any, could be
well compensated for if the generality is found
to be useful.  Consider the sort, for instance,
which accepts an arbitrary predicate to determine
the outcome of pairwise comparisons.  Such 
flexible definitions often pay for themselves
many times over in development and even
execution costs.
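
To make the sort example concrete, here is a
small sketch, again in Python for illustration
only.  The helper name sort_with and the
"precedes" predicate are my own inventions, not
any particular library's API; the point is just
that one sort routine serves many orderings.

```python
from functools import cmp_to_key


def sort_with(items, precedes):
    """Sort items so that a comes before b whenever precedes(a, b)."""

    def compare(a, b):
        # Turn the caller's pairwise predicate into a three-way result.
        if precedes(a, b):
            return -1
        if precedes(b, a):
            return 1
        return 0

    return sorted(items, key=cmp_to_key(compare))


words = ["pear", "fig", "apple", "kiwi"]

# Shorter words first, ties broken alphabetically.
by_length = sort_with(words, lambda a, b: (len(a), a) < (len(b), b))

# The same routine, with a case-insensitive ordering instead.
ignoring_case = sort_with(["b", "A", "c"], lambda a, b: a.lower() < b.lower())
```

Each call pays for an extra predicate invocation
per comparison, but nothing in sort_with has to
change when the ordering does.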

[snip]

  --Jan

>
>All the best,
>
>Peter William Lount