[SqueakDBX] Character encoding and pooling in SqueakDBX and Glorp

Fri Aug 27 06:13:12 UTC 2010

I meant to reply to everyone but shaky hands pressed the wrong button...

---------- Forwarded message ----------
From: Panu Suominen <panu.j.m.suominen at gmail.com>
Date: 2010/8/27
Subject: Re: [SqueakDBX] Character encoding and pooling in SqueakDBX and Glorp
To: Mariano Martinez Peck <marianopeck at gmail.com>

Thank you for the excellent feedback.

2010/8/26 Mariano Martinez Peck <marianopeck at gmail.com>:
> - I guess a lot of people don't need UTF (or other) encoding. And I guess
> that the transformation brings some overhead, mostly in the query result...(I
> should run the benchmarks and see if there is a difference). But I was
> thinking if we could make this optional.

I was thinking that too. Presumably no encoding means that only 7-bit
characters should be used?

> or something like that....but the idea of making it optional not to have
> overhead if not required, and being able to be enabled from the app code.

This probably needs some thought, because if one uses the database
through GLORP, the encoding parameter has to be passed down to the
lower levels somehow. Would ConnectionSettings at the SqueakDBX level
be the right place? In Glorp, I think the platform level would be a
good fit. Then in Glorp one could write
(Login new)
               database: (PostgreSQLPlatform withEncoding: #utf8);
               username: 'yyyyy';
               password: 'xxxx';
               connectString: 'host' , '_' , 'database'

(PostgreSQLPlatform withEncoding: #utf8) would be a shortcut for
(PostgreSQLPlatform new encoding: #utf8).

> maybe we can just set again the NullEncoder in that case?  or raise an error ?  what do you think?

Yep, the error checking code was not up to the task on the first try.
It should probably raise an error, especially if the user has decided
to use an encoding (and the data needs it). If the code continued
silently with NullEncoder, the data read would be corrupted.

> Ok.... I ran the benchmarks and I was right

If I understood correctly, performance dropped by over 10%. That is
quite a lot. I will have to see whether it can be improved.

> In addition, if someone has its own database using a particular encoding, and doesn't implement openConnection (that does the query, like you did in postgres) because he doesn't know, this
> will be a problem because the database will have an encoding X and you will be encoding always with UTF8 ;)

This is a problem. However, I do feel that the encoding should be
checked when the database is connected (at least when the user has
selected that an encoding should be used).
Also, the encoding should be a property of the connection. If it were
a class variable, it would be impractical to connect to different
databases with different encodings.

I think one possibility to tackle these problems would be an
EncoderFactory that the user passes to the code. This factory would
set the encoder on the connection based either on a database query
or on a preselected value. The query option would ask the connection
for its encoding (using a message such as encodingName). If the
connection can't handle the message, an exception is raised (the user
chose automatic encoding selection in a setup where the code is not
up to the task).

The preselected option would just set the right encoder on the
connection. I think validation of this choice should be skipped,
because validating it would make this option basically the same as
the automatic option. The preselected encoding could be NullEncoder.
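
Roughly, the selection logic could look like the workspace sketch
below. It is only an illustration of the idea: the block name
chooseEncoding, the symbol #utf8 and the use of respondsTo: are
assumptions and not existing SqueakDBX or Glorp API. Only the message
name encodingName comes from the proposal above. For brevity the sketch
returns the chosen encoding instead of installing an encoder on the
connection.

```smalltalk
"Hypothetical sketch of the EncoderFactory decision; not SqueakDBX API."
| chooseEncoding preselected automatic |

"A nil second argument means: ask the connection (automatic option)."
chooseEncoding := [:connection :preselectedOrNil |
    preselectedOrNil notNil
        ifTrue: [preselectedOrNil]   "preselected: used as given, no validation"
        ifFalse: [
            (connection respondsTo: #encodingName)
                ifTrue: [connection perform: #encodingName]
                ifFalse: [Error new signal: 'Connection cannot report its encoding']]].

"Preselected option: the connection is never asked."
preselected := chooseEncoding value: Object new value: #utf8.

"Automatic option with a connection that does not understand
 #encodingName (a plain Object stands in for such a connection):
 an error is raised instead of silently using a NullEncoder."
automatic := [chooseEncoding value: Object new value: nil]
    on: Error
    do: [:e | e return: e messageText].
```

A real connection class would simply implement encodingName for the
automatic case, and each connection would keep its own encoder, so two
databases with different encodings could be used side by side.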

If this sounds sensible, I will try to implement the changes over the
weekend, but I can't promise I will have the time. :)

--
Panu
