[Seaside] [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9 - utf8 internal encoding?

Tue Feb 24 18:56:08 UTC 2009

Lukas wrote:
> However since the internal encoding of Squeak is *not UTF-8* many
> strings will appear scrambled when looking at them using an inspector.
> It works well though as long as you do not perform heavy string
> scrambling, because the strings are sent back as is. If you have
> string literals with foreign characters in your application code you
> need to make sure that these are valid UTF-8 as well. This is very
> efficient, but you need to be aware of the implications.

What happens if Squeak is made to use UTF-8 internally? I.e., the unix
man page and various postings on squeak-dev/newbies suggest that a
recent Squeak VM/image combo started with '-encoding utf8' should work
well as a UTF-8 image (provided the correct font is supplied, etc.).

In such a case, should plain WAKom be used, with no issues with respect
to string operations like #=, #size and #copyFrom:to:? Or is there still
a need to convert from the incoming UTF-8 to Squeak's WideString (and
vice versa)?
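
To make the concern concrete, here is a minimal sketch of what I mean. It
is in Python rather than Smalltalk, purely for illustration; the variable
names are hypothetical and nothing in it is Seaside or Squeak API. It
mimics keeping the incoming UTF-8 bytes undecoded (one byte per
character, which is how I understand the plain WAKom case) versus
decoding them first:

```python
# Bytes as they would arrive from the browser, UTF-8 encoded.
raw = "naïve café".encode("utf-8")

# Case 1: bytes kept as-is, each byte treated as one character.
# Decoding as Latin-1 is a lossless way to model this.
undecoded = raw.decode("latin-1")

# Case 2: bytes decoded into real characters on the way in.
decoded = raw.decode("utf-8")

# Size differs: "ï" and "é" each occupy two bytes in UTF-8.
print(len(decoded))      # 10 characters
print(len(undecoded))    # 12 byte-characters

# Equality against a properly decoded string fails.
print(undecoded == decoded)   # False

# Copying a range can cut a multi-byte sequence in half.
print(decoded[:3])       # naï
print(undecoded[:3])     # naÃ  (only the first byte of "ï")

# Sending the undecoded string back unchanged is still safe,
# which matches "the strings are sent back as is".
print(undecoded.encode("latin-1") == raw)   # True
```

If the image really handled UTF-8 natively, I would expect it to behave
like the decoded case for all three operations.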

> WAKomEncoded converts incoming data from UTF-8 to the internal
> encoding of Squeak, as well it converts outgoing data from the
> internal encoding to UTF-8.

The code and comments in #utf8ToSqueak: suggest that this is only true
if Squeak uses Latin-1 internally (which it does by default), right?

> Since all incoming and outgoing data needs to be converted,
> this approach is slightly less efficient.

Has anybody quantified the inefficiency? I'm starting a clean-slate
Seaside server, so I'd like to pick the optimal configuration...

Michal