[Seaside-dev] Re: encoded stream

Sun Feb 1 19:24:32 UTC 2009

On Thu, Jan 29, 2009 at 7:56 PM, Philippe Marschall
<philippe.marschall at gmail.com> wrote:
> I think there are several different issues:
> 1. Should we make the configuration and its consequences more explicit?
> 2. What should the configuration options look like?
> 3. Should we provide some kind of accessor to what we think the
> internal encoding is?
> 4. Should we try to detect if incoming data is in the expected encoding?
> 5. Should we support some non-Smalltalk encoding internally that is
> different from the external encoding?
> 6. Where should the encoding be specified?
>
> 1. Yes, "encoded server adapter" is not the best name ever.
> 2. Dunno, see above.
> 3. Sure, why not, for example:
> context usesSmalltalkEncoding
>   ifTrue: [ 'Smalltalk' ]
>   ifFalse: [ server encoding ]

Is there a reason we are calling it "Smalltalk"? Is that an accepted
name for this? I can't even figure out what Smalltalk encoding is
except that in Squeak it seems to be pretty close to Unicode but some
characters don't seem to match up. Do we really just mean "native"
encoding here?

> 4. No, the browser does not send in what encoding the data is. I know
> because all but the very latest version of Kom ("Dolphin & Monkey") do
> blow up if the browser sends the encoding. Additionally we have no way
> of finding out in what encoding incoming data is by just looking at
> it.

That's not strictly true:
 * You told me that Opera does include the encoding
 * It is possible in many cases to detect the encoding. PHP does this.
Or look at http://chardet.feedparser.org/ for example. So it *could*
be done but...
 * If browsers are supposed to submit data in the same encoding as the
page the form was on (or the character set specified on the form),
then we could record that information in the SessionContinuation so
that when the callbacks were triggered we would know exactly what
encoding was used for the page that generated that callback.
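
To make the last two points concrete, here is a rough sketch in Python
(not Smalltalk, and not Seaside code). PageRecord, decode_submitted and
guess_decode are made-up names: PageRecord only stands in for whatever a
SessionContinuation would remember about the page it rendered, and the
"detection" is a naive trial decode, far cruder than what chardet does.

```python
# Hypothetical sketch: none of these names exist in Seaside.
class PageRecord:
    """Stands in for the per-page state a SessionContinuation could keep."""

    def __init__(self, encoding):
        # Charset the page (or its form) was served with.
        self.encoding = encoding


def guess_decode(raw, candidates=("utf-8", "latin-1")):
    """Naive detection: the first candidate that decodes cleanly wins."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits the data")


def decode_submitted(raw, page):
    """Decode callback data using the encoding recorded for its page."""
    try:
        return raw.decode(page.encoding)
    except UnicodeDecodeError:
        # The browser did not honour the page charset, so fall back to guessing.
        return guess_decode(raw)
```

Note that latin-1 accepts any byte sequence, so with these candidates the
guess never fails; it can still be wrong, which is why recording the page
encoding is the more reliable half of the idea.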

> 5. No, until we actually have somebody who needs this I consider this
> a purely theoretical use case.

My point isn't that we need to support conversions to arbitrary
encodings. My point is that it is easier to understand what is
happening when you know what the two encodings *are*. We don't have to
allow (for the moment) anything but UTF-8 for the external encoding
when UTF-8 is specified as the internal encoding. I have no problem
with that limitation for now.

All I'm saying is if you know your options are "utf-8/native",
"latin-1/native", or "utf-8/utf-8" then it is perfectly clear what is
being sent to the browser *and* what you are supposed to be dealing
with in your image.
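
As an illustration only, here is how those three "external/internal"
labels could be read, sketched in Python. The function names are
hypothetical, and I am assuming that "native" means decoded characters
inside the image while "utf-8/utf-8" means the bytes pass through
untouched.

```python
def parse_pair(label):
    """Split an 'external/internal' label such as 'utf-8/native'."""
    external, internal = label.split("/")
    return external, internal


def to_image(raw, label):
    """What the application sees for bytes arriving from the browser."""
    external, internal = parse_pair(label)
    if internal == "native":
        return raw.decode(external)  # real characters inside the image
    if internal == external:
        return raw  # still encoded bytes, passed through as-is
    raise ValueError("unsupported pair: " + label)


def to_browser(value, label):
    """What goes out on the wire for a value held inside the image."""
    external, internal = parse_pair(label)
    if internal == "native":
        return value.encode(external)
    if internal == external:
        return value
    raise ValueError("unsupported pair: " + label)
```

With "utf-8/native" an e-acute arrives as one character; with
"utf-8/utf-8" the same input stays as two bytes, which is exactly the
difference the labels are meant to make obvious.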

> 6. Lukas made a pretty good argument for doing it in the server adapter.

Well, I actually don't think it does make sense on the server adaptor
in the long term. There is little reason to believe that all
applications would necessarily be using the same encoding, let alone
every request. JSP seems to provide setCharacterEncoding() on both
their Request and Response objects and I'd rather see us do something
along those lines (with a default specified per-application).
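
Something along these lines, sketched in Python with invented names
(Application, Response and set_character_encoding are not Seaside API,
they just mirror the JSP idea): the response carries its own encoding
and falls back to a per-application default when none is set.

```python
class Application:
    """Hypothetical application object holding the default encoding."""

    def __init__(self, default_encoding="utf-8"):
        self.default_encoding = default_encoding


class Response:
    """Hypothetical response whose encoding can be overridden per request."""

    def __init__(self, application):
        self._application = application
        self._encoding = None  # None means "use the application default"
        self._chunks = []

    def set_character_encoding(self, name):
        self._encoding = name

    @property
    def character_encoding(self):
        return self._encoding or self._application.default_encoding

    def write(self, text):
        self._chunks.append(text)

    def body(self):
        # Encode once, with whichever encoding is in effect for this response.
        return "".join(self._chunks).encode(self.character_encoding)
```

A Request object could take the same shape for decoding incoming data.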

That said, I think it may be overkill to be starting on this now. I
suggest Lukas finish his changes to have the Response encode on the
fly using the codec specified in the server adaptor. We can revisit
this again for the next release.
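
For reference, "encode on the fly" amounts to something like the
following Python sketch. It is not Lukas's implementation;
EncodingResponseStream is a made-up name, and the codec name is assumed
to come from the server adaptor's configuration.

```python
import codecs
import io


class EncodingResponseStream:
    """Hypothetical stream that encodes text as it is written."""

    def __init__(self, sink, codec_name):
        self._sink = sink  # any object with a write(bytes) method
        self._encoder = codecs.getincrementalencoder(codec_name)()

    def write(self, text):
        # Each chunk is encoded immediately rather than buffered as text.
        self._sink.write(self._encoder.encode(text))

    def close(self):
        # Flush any state the codec is still holding.
        self._sink.write(self._encoder.encode("", final=True))


sink = io.BytesIO()
stream = EncodingResponseStream(sink, "utf-8")
stream.write("caf")
stream.write("\u00e9")
stream.close()
```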

It would be nice if we could find a way to address the confusion a bit
in this release simply by improving naming or the way we present the
encodings (as I suggest in reply to your #5, for example). If we can
do this without changing the architecture, this would help pave the
way for further improvements in a later release and not slow this one
down any further.

What do you think?

Julian