UTF8 Squeak

stephane ducasse stephane.ducasse at free.fr
Sat Jun 9 07:24:39 UTC 2007


Could you explain the difference between WideString and UTF-8? (Would a
UTF-8 string be a specialized WideString?)
I got bitten by these encoding problems, and having a nice solution
would be good.


On 9 June 07, at 00:02, Colin Putney wrote:

> On Jun 7, 2007, at 11:55 PM, Andreas Raab wrote:
>> How about trying to improve the speed of conversions? You seem to
>> imply that this is the major issue here, so if the conversions
>> were blindingly fast (which I think they easily could be by writing
>> one or two primitives) this should improve matters.
> The conversions could be made faster, yes. But consider this: the  
> life-cycle of a string in a web app is very often something like this:
> - comes in over HTTP
> - lives in the image for a while, maybe persisted in some way
> - gets sent back out over HTTP many times
> Even if the conversion *is* blindingly fast, it's still better to  
> leave it as UTF-8 the whole time, not only to remove the overhead  
> of decoding and re-encoding, but also to avoid storing WideStrings
> in the image for long periods of time. Also, consider that building
> HTML pages mainly involves writing lots of short strings to
> streams, which sometimes include non-ASCII characters. If they can  
> be pre-encoded it's another space and time win. On the other hand,  
> the traditional drawback to UTF-8, random access to characters,  
> doesn't come up much with generating web pages, though of course a  
> web app may do this kind of thing as part of its domain functionality.
> I don't claim that all strings should always be UTF-8, but having a  
> UTF8String class would be an excellent thing.
> Colin
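To make the trade-off concrete, here is a minimal sketch in Python. Python is used only so the sketch runs outside an image; this is not Smalltalk. It assumes a WideString stores every character in a 32-bit word. `Utf8String`, `from_text`, `write_to`, `char_at` and `wide_size` are hypothetical names and are not Squeak's API.

The sketch illustrates three points from the discussion:

- UTF-8 is usually smaller than fixed-width storage.
- A pre-encoded string can be written to an output stream unchanged.
- Fetching the n-th character needs a linear scan instead of direct indexing.

```python
import io


class Utf8String:
    """Hypothetical sketch of a string kept as UTF-8 bytes (not Squeak's API)."""

    def __init__(self, data: bytes):
        data.decode("utf-8")  # validate once, when the bytes come in
        self.data = data

    @classmethod
    def from_text(cls, text: str) -> "Utf8String":
        return cls(text.encode("utf-8"))

    def write_to(self, out: io.BytesIO) -> None:
        # Sending to an HTTP response: the bytes go out as-is, no re-encoding.
        out.write(self.data)

    def char_at(self, index: int) -> str:
        # Random access is O(n): walk the bytes and count only lead bytes.
        # Continuation bytes have the bit pattern 10xxxxxx.
        count = -1
        for pos, byte in enumerate(self.data):
            if byte & 0xC0 != 0x80:
                count += 1
                if count == index:
                    end = pos + 1
                    while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
                        end += 1
                    return self.data[pos:end].decode("utf-8")
        raise IndexError(index)


def wide_size(text: str) -> int:
    # Assumed fixed-width storage: 32 bits (4 bytes) per character.
    return 4 * len(text)


s = Utf8String.from_text("5 €")
print(len(s.data), wide_size("5 €"))  # UTF-8 bytes vs. 32-bit-per-char bytes
print(s.char_at(2))                   # found by scanning, not by direct indexing
```

For a mostly-ASCII page fragment, the UTF-8 form is a fraction of the fixed-width size and can be streamed directly. The cost is that `char_at` must scan from the start, which is the random-access drawback Colin mentions.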
