UTF8 Squeak
stephane ducasse
stephane.ducasse at free.fr
Sat Jun 9 07:24:39 UTC 2007
Colin
Could you explain the difference between WideString and UTF-8 (would
UTF-8 be a specialized WideString?).
I got bitten by these encoding problems, and having a nice solution
would be good.
Stef
On 9 June 07, at 00:02, Colin Putney wrote:
>
> On Jun 7, 2007, at 11:55 PM, Andreas Raab wrote:
>
>> How about trying to improve the speed of conversions? You seem to
>> imply that this is the major issue here, so if the conversions
>> were blindingly fast (which I think they easily could be by writing
>> one or two primitives) this should improve matters.
>
> The conversions could be made faster, yes. But consider this: the
> life-cycle of a string in a web app is very often something like this:
>
> - comes in over HTTP
> - lives in the image for a while, maybe persisted in some way
> - gets sent back out over HTTP many times
>
> Even if the conversion *is* blindingly fast, it's still better to
> leave it as UTF-8 the whole time, not only to remove the overhead
> of decoding and reencoding, but also to avoid storing WideStrings
> in the image for long periods of time. Also, consider that building
> html pages mainly involves writing lots of short strings to
> streams, which sometimes include non-ASCII characters. If they can
> be pre-encoded it's another space and time win. On the other hand,
> the traditional drawback to UTF-8, random access to characters,
> doesn't come up much with generating web pages, though of course a
> web app may do this kind of thing as part of its domain functionality.
>
> I don't claim that all strings should always be UTF-8, but having a
> UTF8String class would be an excellent thing.
>
> Colin
>
>
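To make the trade-off discussed above concrete, here is a minimal sketch in Python (not Squeak code). It assumes a wide string stores every character in a fixed 32-bit slot, which is modelled here with UTF-32, and compares that to UTF-8 bytes. The function names utf8_char_at and wide_char_at and the sample text are hypothetical, chosen only for illustration.

```python
import io

text = "naïve café"  # 10 characters, 2 of them non-ASCII

utf8 = text.encode("utf-8")       # variable width: 1 to 4 bytes per character
wide = text.encode("utf-32-le")   # fixed width: 4 bytes per character (assumed model of a wide string)


def wide_char_at(data: bytes, index: int) -> str:
    """Fixed-width lookup: jump straight to the 4-byte slot, O(1)."""
    if not 0 <= index < len(data) // 4:
        raise IndexError(index)
    return data[4 * index:4 * index + 4].decode("utf-32-le")


def utf8_char_at(data: bytes, index: int) -> str:
    """Variable-width lookup: scan from the start counting lead bytes, O(n)."""
    if index < 0:
        raise IndexError(index)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a new character starts here
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")  # requested character is the last one


# Pre-encoded fragments can be written to an outgoing byte stream
# without any per-request encoding step.
fragments = [b"<p>", utf8, b"</p>"]
out = io.BytesIO()
for fragment in fragments:
    out.write(fragment)
page = out.getvalue()
```

In this sketch the UTF-8 form is much smaller than the fixed-width form, and writing it out involves no conversion, while indexing by character position needs a scan. That matches the point made in the quoted message: random access is the cost, and it matters little when the string is only passed through to HTTP.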