[squeak-dev] ByteString vs EncodedString vs ByteArray (was Re: leadingChar proposal)
Colin Putney
cputney at wiresong.ca
Fri Aug 28 13:28:20 UTC 2009
On 28-Aug-09, at 1:09 AM, Bert Freudenberg wrote:
> Wouldn't ByteArrays be a better way to efficiently store arrays of
> bytes? Strings are conceptually made of Characters, and there are
> more than 256 of them. E.g. a la Python 3:
So you're proposing that WideString, once it no longer has language
tags, use its 4 bytes per character to point to Character objects
rather than encoding the string at all? That would certainly be an
interesting implementation. It would trade space for speed (of certain
operations) in the case of CJK and other writing systems that involve
large numbers of characters, as you'd have a bunch of Character
objects persisting in the image, rather than just ephemerally. For
some applications, that's exactly the right design choice, no doubt.
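(Since the Python 3 analogy came up: a minimal sketch of that model, where text and raw bytes are distinct types, converted only by explicit encode/decode. This is plain Python, not a claim about how a Character-object WideString would actually be implemented in Squeak.)

```python
# Python 3 keeps "a sequence of characters" (str) strictly separate
# from "an array of bytes" (bytes); crossing between them always goes
# through an explicit encoding.

s = "日本語"            # str: conceptually a sequence of Characters
b = s.encode("utf-8")   # bytes: one particular encoding of that text

assert len(s) == 3              # three characters...
assert len(b) == 9              # ...but nine bytes in UTF-8
assert b.decode("utf-8") == s   # decoding round-trips
```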
On the other hand EncodedString (and subclasses like Utf8String or
Latin1String) would make a different trade-off, speed (of certain
operations) for space. Any #variableByteSubclass can efficiently
store bytes. The reason to use, say, Utf8String rather than ByteArray
is precisely *because* Strings are conceptually made of Characters.
Encapsulation and all that.
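To make the trade-off concrete, here's a hypothetical sketch in Python of the two storage strategies: a UTF-8 store is compact but finding the n-th character means decoding from the start, while a fixed 4-bytes-per-character store gives O(1) indexing at the cost of space. The class names are illustrative only, not actual Squeak classes.

```python
class Utf8Store:
    """Compact: CJK text costs ~3 bytes/char, but indexing is O(n)."""
    def __init__(self, text):
        self._bytes = text.encode("utf-8")

    def size_in_bytes(self):
        return len(self._bytes)

    def char_at(self, index):
        # O(n): character boundaries are only found by decoding
        return self._bytes.decode("utf-8")[index]


class Utf32Store:
    """Wide: a fixed 4 bytes per character, but indexing is O(1)."""
    def __init__(self, text):
        self._bytes = text.encode("utf-32-le")

    def size_in_bytes(self):
        return len(self._bytes)

    def char_at(self, index):
        # O(1): the n-th character occupies exactly bytes 4n..4n+3
        return self._bytes[4 * index : 4 * (index + 1)].decode("utf-32-le")


text = "日本語"
assert Utf8Store(text).size_in_bytes() == 9     # 3 bytes per CJK char
assert Utf32Store(text).size_in_bytes() == 12   # 4 bytes per char
assert Utf8Store(text).char_at(1) == Utf32Store(text).char_at(1) == "本"
```

Either way the class still answers Character objects from #char_at, which is the encapsulation point: the encoding is an internal representation choice, invisible to clients of the String protocol.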
> A Text defines attributes for Character runs in a String. Instead of
> storing the tag in each Character, it could be stored in an
> attribute of the Text. Instead of passing around bare Strings you
> would pass around Text objects (if you need to preserve language
> tags).
Sounds good.
Colin