UTF8 Squeak
Colin Putney
cputney at wiresong.ca
Sun Jun 10 16:54:55 UTC 2007
On Jun 10, 2007, at 3:55 AM, Janko Mivšek wrote:
> I think that this way we can achieve most efficient yet fast
> support for all languages in the world. Because of their fixed
> length, those strings are also easy to manipulate, in contrast to
> variable-length UTF-8 ones.
"Most efficient yet fast" is a matter of perspective. For the apps I
work on, UTF-8 is better than your scheme because space efficiency is
more important than random access, and time spent encoding and
decoding UTF-8 would dwarf time spent scanning for random access.
As soon as you try to support more than 256 characters, there are
trade-offs to be made. The "ideal" solution depends on your
application. How important is time efficiency vs. space efficiency?
How about stream processing vs random access? What format is your
input and output? Which characters do you need to support, and how
many of them are there?
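
To put rough numbers on the space side of that trade-off, here is a small sketch in Python (not Squeak; the helper name encoded_sizes is made up for illustration). It compares the storage needed for the same text as variable-width UTF-8 and as a fixed four-bytes-per-character encoding, using UTF-32 as a stand-in for a fixed-width string class:

```python
# Compare the storage cost of UTF-8 (variable width) with UTF-32
# (fixed width, 4 bytes per code point). UTF-32 stands in here for a
# fixed-width string class; this is an illustration, not Squeak code.

def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-32-le")),  # "-le" avoids the 4-byte BOM
    )

samples = {
    "ascii": "hello world",
    "latin": "Mivšek",
    "cjk": "日本語",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For mostly-ASCII text the fixed-width form costs about four times the space, while for CJK text the gap is much smaller. What the fixed-width form buys in exchange is constant-time indexing by character.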
A good string library will be flexible enough to allow its users to
make those trade-offs according to the needs of the application.
> Conversion to/from UTF-8 could probably also be simpler with help
> of bit arithmetic algorithms, which would be tailored differently
> for each of proposed three string subclasses above.
Yes, a couple of well designed primitives would help quite a bit.
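
As a sketch of the kind of bit arithmetic involved, here is the standard UTF-8 encoding of a single code point written with shifts and masks. It is in Python rather than as a Squeak primitive, and the function name utf8_encode_code_point is hypothetical:

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 using shifts and masks."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Assuming the three proposed subclasses hold 1-, 2-, and 4-byte characters (the original proposal is not quoted here, so that is a guess), a tailored version for the 1-byte class would only ever need the first two branches, and the 2-byte class only the first three.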
Colin