[Seaside] UTF8TextConverter, GRPharoUtf8Codec, GRPharoUtf8CodecStream against GRNullCodec

Norbert Hartl norbert at hartl.name
Thu Oct 13 19:27:48 UTC 2011


On 13.10.2011 at 18:12, Marten Feldtmann wrote:

> Hello,
> 
> I have a question about what all these classes do (in the big picture), how they work together and when they are actually called. I looked into the source code, but I still have trouble fully understanding it.
> 
> When I have an adapter with GRNullCodec, I assume that all (?) traffic and content (?) goes through the GRNullCodec, but because GRNullCodec does nothing, the traffic/content is not changed.
> 
> What exactly goes through these codecs?
> 
> If I use an adapter with GRPharoUtf8Codec, is the content then converted to/from UTF-8?
> 
> What does this mean for the strings (in my application) that I render on my pages, as in the following command:
> 
> html text: stringInSomeCodePage
> 
> in both cases (GRNullCodec and GRPharoUtf8Codec), and with texts in specific encodings such as UTF-8, Latin-1 and "true" Unicode (UTF-32)?
> 
> In my first demos I held all my strings in UTF-8 and used the GRNullCodec (and everything was OK on the browser side).
> 
> Then I changed to GRPharoUtf8Codec, and it seems to me that I now got an additional UTF-8 conversion. Then I switched my application strings back to Latin-1 and it was OK again.
> 
> How does all this work with Unicode characters with code points > 255 (in Squeak: WideString) when using GRPharoUtf8Codec?
> 
> When is a GRPharoUtf8Codec really needed?
> 
> Perhaps this is a stupid question .... but then I would like to know it :-))
> 
The rule of thumb: a string created inside the image is a collection of characters, each of which answers its Unicode code point when sent asciiValue. If a string comes from outside the image, you are only safe if you negotiate the encoding with the outside world. In an HTTP scenario you should take the character encoding from the HTTP headers; there is no way of knowing the encoding up front. In a web environment it is fairly safe to assume that you get back the same encoding you sent to the client, because clients honor it as far as I know.
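
A small workspace sketch of the first point, assuming a Pharo image. The sample characters are arbitrary and not taken from the question; isWideString is the Pharo test for strings that hold code points above 255.

```smalltalk
"Sketch only, for a Pharo workspace; the sample characters are arbitrary."
| narrow wide |
narrow := String with: (Character value: 252).    "u-umlaut, code point 252"
wide := String with: (Character value: 8364).     "euro sign, code point 8364"

narrow first asciiValue.    "answers the code point 252, not a UTF-8 byte sequence"
wide first asciiValue.      "answers the code point 8364"
wide isWideString.          "true: code points above 255 are held in a WideString"
```

Inside the image no encoding is involved at all; it only comes into play when such a string leaves or enters the image.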

UTF8TextConverter is Pharo-specific. The GR..Codec... classes are Grease classes, which you have to load separately. If you use

(GRCodec forEncoding: 'utf-8') decode:/encode: 

then you get the encoding class specific to the platform you are on. In other words, this is the cross-dialect, cross-platform way of doing it. Finally, when it comes to encoding, you have to do it right at the border of the system, where data is exchanged. It only works if the encoding is negotiated and that negotiation is honored. In every other case, even one that differs only slightly, it will fail one way or another.
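
A minimal sketch of encoding at the border with the call above, assuming Grease is loaded. The variable names and the sample character are made up for illustration, and the last line is only my guess at how the "additional UTF8 conversion" from the question can arise.

```smalltalk
"Sketch only: requires Grease; variable names and sample data are illustrative."
| codec internal external |
codec := GRCodec forEncoding: 'utf-8'.

internal := String with: (Character value: 252).    "in-image string: one character, code point 252"
external := codec encode: internal.                 "UTF-8 form, only for the outside world"

(codec decode: external) = internal.    "decoding at the border restores the in-image string"

codec encode: external.    "encoding an already encoded string converts it a second time"
```

So with a UTF-8 codec on the adapter, the application strings should be plain in-image strings (code points), not strings that already hold UTF-8 bytes.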

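And by the way, UTF-32 is not "true" Unicode. Unicode is the mapping of characters and symbols to numbers (code points). The UTF-xx formats are encodings that turn those code points into bytes. Only UTF-8 is independent of byte order; UTF-16 and UTF-32 need a byte order mark or an agreed byte order. The encodings are either space efficient (UTF-8) or efficient to process (UTF-16, UTF-32).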

Norbert

