[Seaside] character encoding / was: (Postgres / Glorp / Kom)

Philippe Marschall philippe.marschall at gmail.com
Tue May 15 08:24:09 UTC 2007


2007/5/15, Ramiro Diaz Trepat <ramirodt at gmail.com>:
> Summarizing
>
> Using Squeak 3.9, for example:
>
> 1.
> - Start Seaside with WAKom.
> - Go to the SushiStore, search for 'Ñandú' (a kind of Argentinean ostrich).
> - The method WAStoreFillCart>>search: receives a properly formed ByteString
> that reads 'Ñandú'
> - Seaside then displays the corrupt String: No items match '?amd?'
>
> 2.
> - Start Seaside with WAKomEncoded39.
> - Go to the SushiStore, search for 'Ñandú' .
> - The method WAStoreFillCart>>search: receives a properly formed ByteString
> (not a WideString or a UTF8 formatted ByteString) that reads 'Ñandú'

Because the character codes are all smaller than 256. Sorry I wasn't
explicit enough about this: you get a WideString as soon as you have
a character with a code point of 256 or bigger, for example the Korean
from the UTF-8 sampler [1]. This is, by the way, all explained in the
comments of (Wide)Character and (Wide)String. The string you received
is in correct Squeak encoding: it displays more or less correctly in
the inspector (would probably be much nicer with FreeType) and #size
answers 5. None of this would be the case if you had UTF-8 strings.
They do not display correctly (for non-ASCII strings) and their #size
is too big (for non-ASCII strings).
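
To make the difference concrete, here is a small sketch in Python rather
than Squeak. It is only an analogy: a Python str stands in for a Squeak
String of code points, and a bytes object stands in for a UTF-8 encoded
ByteString. The variable names are made up for this illustration.

```python
# Analogy only: str ~ Squeak String of code points, bytes ~ UTF-8 ByteString.
text = "\u00d1and\u00fa"          # 'Ñandú'
utf8 = text.encode("utf-8")       # the bytes that travel over HTTP

# Every code point is below 256, so one byte per character is enough
# (the ByteString case).
code_points = [ord(ch) for ch in text]          # [209, 97, 110, 100, 250]
fits_in_bytes = all(cp < 256 for cp in code_points)

# The UTF-8 form is longer: Ñ and ú take two bytes each.
char_count = len(text)            # 5
byte_count = len(utf8)            # 7

# Korean code points are 256 or bigger (the WideString case).
korean = "\ud55c\uad6d\uc5b4"     # '한국어'
needs_wide = any(ord(ch) >= 256 for ch in korean)

# Treating the UTF-8 bytes as one character per byte gives a string
# that is too long and does not display correctly.
misread = utf8.decode("latin-1")

print(code_points, fits_in_bytes)
print(char_count, byte_count)
print(needs_wide, len(misread), misread == text)
```

A size of 5 therefore indicates decoded characters, while a size of 7
would indicate that the raw UTF-8 bytes had reached the application.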

So this test works. It was just my fault for not explaining the Squeak
encoding of Strings in full detail.

[1] http://www.columbia.edu/kermit/utf8.html

Cheers
Philippe

> - Seaside then displays the correct String: No items match Ñandú
>
> Although example 2 properly displays the string, methods like #search:
> never seem to receive a UTF8 or WideString instance.  What you get either
> with WAKom or with WAKomEncoded39 are always indistinguishable instances of
> ByteString.
> WAKomEncoded39 converts strings to UTF8 before sending and from UTF8 after
> receiving, but you don't get to "see" these UTF8 Strings. When they get to
> you, they are always converted to Squeak's default encoding (which I don't
> know what it is yet)?
>
>
>
>
> Summarizing some of the answers I got.
>
> Philippe
> Basically informs us that the handling of UTF8 strings in KomHttpServer /
> Squeak 3.9 got really broken, and that sadly the fix seems not to be on the
> way anytime soon.  But also says that everything works in 3.8.
> In spite of this affirmation, I got no concrete answers from anyone using
> Seaside in production (and using special characters) about which platform
> they are using.  In particular, I didn't hear from the rest of the Seaside
> core developers "We are all using Squeak 3.8", nor "I have the fix for
> KomHttpServer for 3.9 but I will not share it" :)
>
> Norbert
> Being in a very similar context to mine, that is, having to use a Postgres DB
> encoded in UTF8, was also unable to make it work out of the box (confirming
> Philippe's statements) and coded a very smart workaround that he kindly
> shared with us.
>
> Sebastián
> Everything works for him using WAKomEncoded39.  But probably as in the
> SushiStore example above with WAKomEncoded39.  That is, not receiving UTF8
> or WideStrings.
>
>

