[Seaside] 3.9 and encoding

Wed Feb 28 08:47:54 UTC 2007

2007/2/28, Todd Blanchard <tblanchard at mac.com>:
> I took a quick look at the request processing and I don't see where UTF-8
> gets decoded.  AFAICS, it just doesn't do it, thus producing a
> one-byte-to-one-character transformation, but maybe I'm missing something.

#unescapePercents does utf-8 decoding.
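
To make the difference concrete, here is a minimal Python sketch of the idea (an illustration only, not Seaside's Smalltalk code; the function name unescape_percents is hypothetical). It turns the escapes into bytes first and only then decodes the bytes with a charset. Decoding the same bytes as latin-1 gives the one-byte-to-one-character result described above.

```python
from urllib.parse import unquote_to_bytes


def unescape_percents(escaped: str, charset: str = "utf-8") -> str:
    """Turn %XX escapes into raw bytes, then decode those bytes as `charset`.

    Hypothetical helper for illustration; not the Seaside method itself.
    """
    raw = unquote_to_bytes(escaped)  # each escape names one byte, not one character
    return raw.decode(charset)


# The two bytes C3 A9 form a single character when decoded as UTF-8 ...
utf8_text = unescape_percents("caf%C3%A9")
# ... but two separate characters when every byte is mapped to one character.
byte_per_char = unescape_percents("caf%C3%A9", "latin-1")
```

With UTF-8 the result has four characters; with latin-1 it has five, because C3 and A9 each become a character of their own.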

> I have done a LOT of this stuff (formerly chief architect at a web I18N
> company).  There are a few things that are not so intuitive when dealing
> with encodings and http requests.
>
> Escape sequences escape bytes, not characters.
>
> On pass 1, you assume you have latin-1, parse the header and get the
> content-type and associated charset.  Remember this for later translation.

We don't do that. We assume that either you are running UTF-8 or you
don't want any translation to take place.
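
For reference, the step being declined here (reading the charset out of the Content-Type header and remembering it) could look roughly like this in Python. This is only a sketch using the standard library's email.message.Message; the helper name charset_of and the latin-1 fallback are assumptions made for the example, not anything Seaside does.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "latin-1") -> str:
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper; the latin-1 default is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


declared = charset_of("application/x-www-form-urlencoded; charset=UTF-8")
fallback = charset_of("text/html")
```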

> Build a byte array from the string by putting ascii characters in as bytes.
> Decode escape sequences into single bytes as you go.
>
> Convert the byte array to a string by reading bytes and composing them into
> code points according to the encoding specified as the charset in the
> content-type.  For utf-8 this means reading a byte, checking the high order
> bits to find out the length of the byte sequence, then reading the rest of
> the sequence, composing the code point, etc...
>
> Now you have text - start over and parse as normal.
>
> Some of these steps can be folded but conceptually, this is how it works.
>
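
The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Seaside's or Kom's code: both function names are hypothetical, the input is assumed to be ASCII with well-formed %XX escapes, and the decoder does not reject overlong forms or surrogates the way a production UTF-8 decoder should.

```python
def escaped_to_bytes(text: str) -> bytes:
    """Copy ASCII characters in as bytes; decode each %XX escape into one byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))  # one escape -> one byte
            i += 3
        else:
            out.append(ord(text[i]))  # raises ValueError for non-byte characters
            i += 1
    return bytes(out)


def decode_utf8(data: bytes) -> str:
    """Compose code points by checking the high-order bits of each lead byte."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: single byte
            length, code = 1, lead
        elif lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
            length, code = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
            length, code = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
            length, code = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        tail = data[i + 1:i + length]
        # Every continuation byte must look like 10xxxxxx.
        if len(tail) != length - 1 or any(b >> 6 != 0b10 for b in tail):
            raise ValueError(f"truncated or invalid sequence at offset {i}")
        for b in tail:
            code = (code << 6) | (b & 0x3F)  # append six payload bits
        chars.append(chr(code))
        i += length
    return "".join(chars)


# Bytes first, characters second; only then parse the text as normal.
field = decode_utf8(escaped_to_bytes("name=caf%C3%A9"))
```

In practice the two functions would be folded into library calls, but keeping them separate shows why the escapes have to be resolved before any character decoding happens.
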
> So, AFAICS, WAKomEncoded39 is not doing the right thing with respect to
> request processing.
>
> -Todd Blanchard
>
> On Feb 27, 2007, at 3:26 PM, Philippe Marschall wrote:
>
> If you run WAKomEncoded39 on Squeak 3.9 you will have strings with
> (new) Squeak encoding in your image which is basically non-unified
> unicode. For latin-1 characters this will be indistinguishable from
> latin-1.