UTF8 Squeak
Colin Putney
cputney at wiresong.ca
Fri Jun 8 22:02:05 UTC 2007
On Jun 7, 2007, at 11:55 PM, Andreas Raab wrote:
> How about trying to improve the speed of conversions? You seem to
> imply that this is the major issue here, so if the conversions
> were blindingly fast (which I think they easily could be by writing
> one or two primitives) this should improve matters.
The conversions could be made faster, yes. But consider the typical
life cycle of a string in a web app:
- comes in over HTTP
- lives in the image for a while, maybe persisted in some way
- gets sent back out over HTTP many times
Even if the conversion *is* blindingly fast, it's still better to
leave the string as UTF-8 the whole time. This removes the overhead of
decoding and re-encoding, and it also avoids storing WideStrings in the
image for long periods of time. Also, consider that building HTML
pages mainly involves writing lots of short strings to streams, which
sometimes include non-ASCII characters. If those strings can be
pre-encoded, they can be copied to the output with no encoding step at
all, which is another space and time win. On the other hand, the
traditional drawback of UTF-8, slow random access to characters,
doesn't come up much when generating web pages, though of course a web
app may need it as part of its domain functionality.
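To make the "pre-encoded" point concrete, here is a minimal sketch in Python rather than Squeak Smalltalk. It assumes the HTTP response body is UTF-8, and `HEADER`, `FOOTER` and `render_page` are hypothetical names used only for illustration. Fragments are encoded once and then copied byte-for-byte into the output stream on every request, with no per-request decode/encode step:

```python
import io
from typing import List

# Hypothetical page fragments, encoded to UTF-8 once (e.g. at load time)
# and then reused for every response.
HEADER = "<html><body><h1>Café menu</h1>\n".encode("utf-8")
FOOTER = "</body></html>\n".encode("utf-8")


def render_page(items_utf8: List[bytes]) -> bytes:
    """Write pre-encoded UTF-8 fragments straight into a byte stream.

    The stored bytes are copied to the output as-is; nothing is decoded
    or re-encoded while the page is being built.
    """
    out = io.BytesIO()
    out.write(HEADER)
    for item in items_utf8:
        out.write(b"<p>")
        out.write(item)
        out.write(b"</p>\n")
    out.write(FOOTER)
    return out.getvalue()


# Strings as they might arrive over HTTP: already UTF-8 bytes, kept that way.
items = ["crème brûlée".encode("utf-8"), "naïve tart".encode("utf-8")]
page = render_page(items)
print(page.decode("utf-8"))
```

Only the final `decode` for printing touches the characters; the page itself is assembled entirely from bytes.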
I don't claim that all strings should always be UTF-8, but having a
UTF8String class would be an excellent thing.
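A rough idea of what such a class might look like, again sketched in Python and not based on any existing Squeak code: the `UTF8String` methods here (`from_str`, `byte_size`, `write_to`, `char_at`) are hypothetical. The string keeps its UTF-8 bytes, so writing it to a UTF-8 stream is a plain copy. Indexing by character, however, has to scan from the start, since each character occupies 1 to 4 bytes. This is the random-access cost mentioned above.

```python
import io


class UTF8String:
    """Hypothetical string type that stores its contents as UTF-8 bytes."""

    def __init__(self, data: bytes):
        data.decode("utf-8")  # validate once; raises UnicodeDecodeError if malformed
        self._data = bytes(data)

    @classmethod
    def from_str(cls, s: str) -> "UTF8String":
        return cls(s.encode("utf-8"))

    def byte_size(self) -> int:
        return len(self._data)

    def write_to(self, stream) -> None:
        # No conversion needed when the destination expects UTF-8 (e.g. an HTTP body).
        stream.write(self._data)

    def __len__(self) -> int:
        # Character count: every byte that is not a continuation byte
        # (bit pattern 10xxxxxx) starts a new character.
        return sum(1 for b in self._data if (b & 0xC0) != 0x80)

    def char_at(self, index: int) -> str:
        """Return the character at a 0-based index by scanning: O(n), not O(1)."""
        data = self._data
        count = 0
        i = 0
        while i < len(data):
            start = i
            i += 1
            # Skip the continuation bytes that belong to this character.
            while i < len(data) and (data[i] & 0xC0) == 0x80:
                i += 1
            if count == index:
                return data[start:i].decode("utf-8")
            count += 1
        raise IndexError(index)


s = UTF8String.from_str("naïve")
buf = io.BytesIO()
s.write_to(buf)
print(s.byte_size(), len(s), s.char_at(2))
```

For "naïve", `byte_size()` is 6 but `len()` is 5, because "ï" takes two bytes. Whether to cache character offsets to speed up `char_at` is a design choice; the sketch leaves it out, on the assumption from the post that web pages rarely need random access.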
Colin