[squeak-dev] ByteString vs EncodedString vs ByteArray (was Re: leadingChar proposal)
Bert Freudenberg
bert at freudenbergs.de
Fri Aug 28 13:49:32 UTC 2009
> At Thu, 27 Aug 2009 22:19:49 -0700,
> Andreas Raab wrote:
>>
>> Yoshiki Ohshima wrote:
>>> One question is the roadmap; I would think ByteStrings will be
>>> retained for a while (or forever), but they may also be phased out.
>>> It would also be nice to tag ByteStrings. The natural order may be
>>> to move to the text-attribute approach earlier, so that the bare
>>> representation doesn't matter much. What do you think about these
>>> things?
>>
>> Interesting questions. I'm not sure what you mean by "tagging
>> ByteStrings" - generally my opinion is that
>> String/ByteString/WideString have the same relationship that
>> Integer/SmallInteger/LargeInteger have.
>
> With characters in the 0..255 range, somebody may want to define
> language tags and attach them. It would be nice if that could be
> done transparently.
>
> -- Yoshiki
On 28.08.2009, at 15:28, Colin Putney wrote:
> On 28-Aug-09, at 1:09 AM, Bert Freudenberg wrote:
>
>> Wouldn't ByteArrays be a better way to efficiently store arrays of
>> bytes? Strings are conceptually made of Characters, and there are
>> more than 256 of them. E.g. a la Python 3:
>
> So you're proposing that WideString, once it no longer has language
> tags, use its 4 bytes per character to point to Character objects
> rather than encoding the string at all? That would certainly be an
> interesting implementation. It would trade space for speed (of
> certain operations) in the case of CJK and other writing systems
> that involve large numbers of characters, as you'd have a bunch of
> Character objects persisting in the image, rather than just
> ephemerally. For some applications, that's exactly the right design
> choice, no doubt.
I'm not really proposing anything at this point, just widening the
discussion Yoshiki started (cited above for reference).
> On the other hand EncodedString (and subclasses like Utf8String or
> Latin1String) would make a different trade-off, speed (of certain
> operations) for space. Any #variableByteSubclass can efficiently
> store bytes. The reason to use, say, Utf8String rather than ByteArray
> is precisely *because* Strings are conceptually made of Characters.
> Encapsulation and all that.
I guess having encoded strings would be nice. OTOH I value simplicity.
Does anybody have experience with the tradeoffs?
- Bert -