[Seaside] 3.9 and encoding
philippe.marschall at gmail.com
Wed Feb 28 08:47:54 UTC 2007
2007/2/28, Todd Blanchard <tblanchard at mac.com>:
> I took a quick look at the request processing and I don't see where utf-8
> gets decoded. AFAICS, it just doesn't do it, so each byte becomes one
> character, but maybe I'm missing something.
#unescapePercents does utf-8 decoding.
> I have done a LOT of this stuff (formerly chief architect at a web I18N
> company). There are a few things that are not so intuitive when dealing
> with encodings and http requests.
> Escape sequences escape bytes, not characters.
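To illustrate the point about bytes versus characters, here is a small Python sketch (Python is used only for illustration, it is not Seaside or Squeak code). It relies on the standard library's urllib.parse.unquote_to_bytes; the sample string is made up.

```python
from urllib.parse import unquote_to_bytes

# "%C3%A9" is two escape sequences, so it yields two bytes.
raw = unquote_to_bytes("caf%C3%A9")

# How many characters those bytes become depends on the charset applied.
as_utf8 = raw.decode("utf-8")      # the two bytes compose into one character
as_latin1 = raw.decode("latin-1")  # each byte becomes its own character

print(len(raw), len(as_utf8), len(as_latin1))
```

Mapping each unescaped byte straight to a character is the latin-1 reading, which is only right if the client really sent latin-1.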
> On pass 1, you assume you have latin-1, parse the header and get the
> content-type and associated charset. Remember this for later translation.
We don't do that. We assume either you are running utf-8 or you don't
want any translation taking place.
> Build a byte array from the string by putting ascii characters in as bytes.
> Decode escape sequences into single bytes as you go.
> Convert the byte array to a string by reading bytes and composing them into
> code points according to the encoding specified as the charset in the
> content-type. For utf-8 this means reading a byte, checking the high order
> bits to find out the length of the byte sequence, then reading the rest of
> the sequence, composing the code point, etc...
> Now you have text - start over and parse as normal.
> Some of these steps can be folded but conceptually, this is how it works.
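The steps above can be sketched roughly as follows. This is a minimal Python illustration, not what Seaside or Kom actually does; the function names (percent_decode_to_bytes, decode_utf8, decode_request_text) are hypothetical, and the charset is assumed to have been read from the content-type header already.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def percent_decode_to_bytes(text):
    """Copy characters in as bytes, turning each %XX escape into one byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if (text[i] == "%" and len(text) - i >= 3
                and text[i + 1] in HEX_DIGITS and text[i + 2] in HEX_DIGITS):
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Pass 1 assumed latin-1, so every character fits in one byte.
            out += text[i].encode("latin-1")
            i += 1
    return bytes(out)

def decode_utf8(data):
    """Compose code points from bytes by inspecting the high-order bits.

    Simplified: overlong forms and surrogates are not rejected.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: single byte
            length, code = 1, lead
        elif lead >> 5 == 0b110:       # 110xxxxx: two-byte sequence
            length, code = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:      # 1110xxxx: three-byte sequence
            length, code = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:     # 11110xxx: four-byte sequence
            length, code = 4, lead & 0x07
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(b >> 6 != 0b10 for b in tail):
            raise ValueError("bad continuation bytes at offset %d" % i)
        for b in tail:
            code = (code << 6) | (b & 0x3F)
        chars.append(chr(code))
        i += length
    return "".join(chars)

def decode_request_text(text, charset="utf-8"):
    """Unescape to bytes first, then decode with the remembered charset."""
    raw = percent_decode_to_bytes(text)
    if charset.lower() in ("utf-8", "utf8"):
        return decode_utf8(raw)
    return raw.decode(charset)

print(decode_request_text("name=caf%C3%A9"))
print(decode_request_text("caf%E9", "latin-1"))
```

The ordering is what matters here: unescaping happens on bytes, and characters only exist after the charset has been applied.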
> So I don't think WAKomEncoded39 is doing the right thing wrt request
> processing AFAICS.
> -Todd Blanchard
> On Feb 27, 2007, at 3:26 PM, Philippe Marschall wrote:
> If you run WAKomEncoded39 on Squeak 3.9 you will have strings with
> (new) Squeak encoding in your image which is basically non-unified
> unicode. For latin-1 characters this will be indistinguishable from