I meant to reply to everyone but shaky hands pressed the wrong button...
---------- Forwarded message ----------
From: Panu Suominen panu.j.m.suominen@gmail.com
Date: 2010/8/27
Subject: Re: [SqueakDBX] Character encoding and pooling in SqueakDBX and Glorp
To: Mariano Martinez Peck marianopeck@gmail.com

Thank you for the excellent feedback.
2010/8/26 Mariano Martinez Peck marianopeck@gmail.com:
- I guess a lot of people don't need UTF (or other) encoding. And I guess that the transformation brings some overhead, mostly in the query results... (I should run the benchmarks and see if there is a difference). But I was wondering whether we could make this optional.
I was thinking that too. Apparently no encoding means that only 7-bit characters should be used?
Or something like that... but the idea is to make it optional, so there is no overhead when it is not required, and to let the app code enable it.
This probably needs some thinking, because if one uses the database through GLORP, the encoding parameter has to be passed down to the lower levels somehow. Would the ConnectionSettings at the SqueakDBX level be the right place? In Glorp, I think the platform level seems good. Then in Glorp one could write:

  (Login new)
    database: (PostgreSQLPlatform withEncoding: #utf8);
    username: 'yyyyy';
    password: 'xxxx';
    connectString: 'host' , '_' , 'database'

(PostgreSQLPlatform withEncoding: #utf8) would be a shortcut for (PostgreSQLPlatform new encoding: #utf8).
Maybe we can just set the NullEncoder again in that case? Or raise an error? What do you think?
Yep, the error checking code was not up to the task on the first try. Probably raise an error, especially if the user has decided to use an encoding (and the data needs it). If the code continued silently with NullEncoder, the data read would be corrupted.
OK... I ran the benchmarks and I was right.
If I understood correctly, performance dropped by over 10%. That is quite a lot. I will have to see whether it can be improved.
In addition, if someone has their own database using a particular encoding, and doesn't implement openConnection (which does the query, like you did in postgres) because they don't know about it, this will be a problem, because the database will have encoding X and you will always be encoding with UTF8 ;)
This is a problem. However, I do feel that the encoding should be checked when the database is connected (at least when the user has selected that an encoding should be used). Also, the encoding should be a property of the connection. If it were a class variable, it would be impractical to connect to different databases with different encodings.
I think one possibility to tackle these problems would be an EncoderFactory that the user passes to the code. This factory would set the encoder in the connection based either on a database query or on a preselected value. The query option would ask the connection about its encoding (using a message such as encodingName or something like that). If the connection can't handle the message, an exception is raised (the user chose automatic encoding selection in a setting where the code is not up to the task).
The preselected option would just set the right encoder in the connection. I think validation of this choice should be skipped, because validating it would make this option basically the same as the automatic option. The preselected encoding could be NullEncoder.
If this sounds sensible I will try to implement the changes over the weekend, but I can't promise I will have the time. :)
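To make the two options a bit more concrete, here is a rough workspace sketch using plain blocks instead of a real EncoderFactory class. Everything in it is an assumption for illustration: the variable names are made up, and the encodingName selector is only the suggestion from above, not something SqueakDBX connections currently understand.

```smalltalk
"Workspace sketch only. Every name here is hypothetical, not SqueakDBX or Glorp API."
| preselected automatic chooser |

"Preselected option: answer the chosen encoding name without validating it."
preselected := [:encodingName | [:connection | encodingName]].

"Automatic option: ask the connection for its encoding.
 #encodingName is an assumed selector; signal an error if it is not understood."
automatic := [:connection |
	(connection respondsTo: #encodingName)
		ifTrue: [connection encodingName]
		ifFalse: [Error new signal: 'Connection cannot report its encoding']].

"A preselected #utf8 never queries the connection, so any object will do here."
chooser := preselected value: #utf8.
chooser value: Object new.

"The automatic option fails loudly instead of falling back to NullEncoder."
[automatic value: Object new]
	on: Error
	do: [:e | e messageText].
```

The point of keeping the two strategies separate is that the preselected one never touches the database, while the automatic one either gets an answer from the connection or raises an error.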
-- Panu