[Seaside] 3.9 and encoding

Philippe Marschall philippe.marschall at gmail.com
Wed Feb 28 13:47:53 UTC 2007


2007/2/28, Norbert Hartl <norbert at hartl.name>:
> On Wed, 2007-02-28 at 10:03 +0100, Philippe Marschall wrote:
> > 2007/2/28, Norbert Hartl <norbert at hartl.name>:
> > > On Wed, 2007-02-28 at 00:26 +0100, Philippe Marschall wrote:
> > > > 2007/2/28, Norbert Hartl <norbert at hartl.name>:
> > > > > Hi,
> > > > >
> > > > > I ran into an encoding problem. I'm using Seaside together
> > > > > with Glorp. For the web server I use WAKomEncoded39.
> > > > > WAKomEncoded39 converts the output to the browser to utf-8.
> > > > > But on incoming requests the url escaped characters are
> > > > > translated to something different. For me it appears to
> > > > > be latin-1 but I've no clue why it should be that way.
> > > > > I detected it because my postgresql session has client
> > > > > encoding utf-8 turned on and I get an error trying to
> > > > > store strings containing characters like ö.
> > > >
> > > > If you run WAKomEncoded39 on Squeak 3.9 you will have strings with
> > > > (new) Squeak encoding in your image which is basically non-unified
> > > > unicode. For latin-1 characters this will be indistinguishable from
> > > > latin-1. If your database is utf-8 you need to encode your strings to
> > > > utf-8 when writing them to your database and decode your strings from
> > > > utf-8 when reading from the database (only to convert it back to utf-8
> > > > when generating HTML). You can configure the PostgreSQL database driver
> > > > to do this automatically for you.
> > > >
> > > Oh, this seems quite easy. But I didn't find anything to configure
> > > in the Postgres driver. Do you have any hint?
> >
> > PGConnection class >> #buildDefaultFieldConverters
> > TestPGConnection >> #testFieldConverter
> >
> > You need to register a field converter for your string types that does
> > #convertFromEncoding: #utf8
> >
> This way it is working already. I think as long as no one is touching
> the string it comes as utf-8 from the database and gets encoded a
> second time by WAKomEncoded39 which has no effect.
>
> > Sorry, that only does the decoding and not the encoding. I guess in
> > your case Glorp does the encoding. I don't know how you can customize
> > the SQL generation there, but if everything else fails you can change
> > PGConnection >> #execute (yes, this is a hack)
> >
> I don't think Glorp does encoding and I think it shouldn't.
> Glorp should be happy with strings. If there is conversion happening
> it should happen in the postgres driver (it is the only one who
> could know which encoding is needed for the database).
>
> My strings are carried by ByteString. It seems that a ByteString (got
> from WAKomEncoded39) contains a bunch of bytes in any encoding
> (OK, it is the non-unified Unicode, you said, and I don't know what
> that means :) ).
> I can convert it with convertToEncoding: to another encoding still
> using ByteString. But there is no information about encoding in the
> object. I think this is really dangerous. I have to look at WideString.
> I'm curious how those deal with encodings they are created from.
>
> I think there are only two possibilities. Handle it like Java or Lisp
> and convert every encoding to the internal one (UCS-2) on string creation.
> The other option, which would be easier (I think), is to add the
> character encoding information to the string.
>
> What do you think?

Strings are a hard problem. It's interesting to see how many languages
fuck up in this area considering this is a 'basic' data type. Having
more information about a String (what encoding, what escaping, ..)
would definitely help.
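
To illustrate why a byte string without encoding information is risky, here is a minimal sketch in Python (not Squeak, and not the PostgreSQL driver; the names are made up for the example). It only shows the general effect: the same character has different bytes in latin-1 and utf-8, and each side either misreads or rejects the other's bytes.

```python
# Illustration only: Python's codecs stand in for the image/database encodings.
text = "ö"

utf8_bytes = text.encode("utf-8")      # two bytes: C3 B6
latin1_bytes = text.encode("latin-1")  # one byte: F6

# Reading utf-8 bytes as if they were latin-1 silently gives garbage.
misread = utf8_bytes.decode("latin-1")


def try_utf8(raw):
    """Return the decoded text, or None if raw is not valid utf-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Handing latin-1 bytes to a consumer that expects utf-8 fails outright,
# which is comparable to the error reported for the utf-8 database session.
rejected = try_utf8(latin1_bytes)
```

Nothing in the bytes themselves says which interpretation is right, so the information has to travel with the string or be fixed by convention.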

UCS-2 is not "the solution" since it handles only characters in the
BMP. Additionally, we don't want to do Han unification.
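
A small Python sketch of the BMP limit (again only an illustration; `fits_in_ucs2` is a hypothetical helper, not an API from Squeak or any library). A code point above U+FFFF does not fit in a single 16-bit unit, so UTF-16 has to spend a surrogate pair on it, which a fixed-width UCS-2 string cannot express.

```python
import struct


def fits_in_ucs2(ch):
    """True if the character fits in one 16-bit unit (is in the BMP)."""
    return ord(ch) <= 0xFFFF


bmp_char = "ö"               # U+00F6, inside the BMP
astral_char = "\U00020000"   # U+20000, a CJK ideograph outside the BMP

# UTF-16 needs two 16-bit units (a surrogate pair) for the second character.
utf16 = astral_char.encode("utf-16-be")
high, low = struct.unpack(">HH", utf16)
```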

> > sql := sqlString.
> > to
> > sql := sqlString convertToEncoding: #utf8.
> >
> The hack is actually adding the conversion to
> SqueakDatabaseAccessor>>basicExecuteSQLString:
>
>
> I understand a lot more now. Thanks very much.
>
> Norbert
> > P.S.:
> > PGConnection class >> #buildDefaultFieldConverters
> > has given us a lot of pain because Squeak doesn't have full block closures
> >
> Oh, wow, another day hearing a lot of basic things I don't have any idea
> about :) What are "full" block closures?

The problem is that all these :s block arguments share the same
temporary variable. If several of these blocks are active at the
same time, you have a problem.

See:
http://bugs.impara.de/view.php?id=4636
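
The shared-variable problem can be mimicked in Python. This is only a loose analogy based on the description above, not how Squeak implements blocks: the argument is kept in one shared slot instead of a per-activation one, so a nested activation overwrites the value the outer one still needs. All names are invented for the example.

```python
# One slot shared by every activation, standing in for the shared temporary.
shared = {"s": None}


def convert_shared(s, inner=None):
    shared["s"] = s           # the argument lands in the shared slot
    if inner is not None:
        inner()               # a second activation runs before this one ends
    return shared["s"]        # the slot may have been overwritten meanwhile


def convert_private(s, inner=None):
    if inner is not None:
        inner()
    return s                  # each activation keeps its own argument


broken = convert_shared("outer", inner=lambda: convert_shared("inner"))
correct = convert_private("outer", inner=lambda: convert_private("inner"))
```

With the shared slot the outer call returns the inner call's argument; with a private argument per activation it returns its own.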

Philippe
