[Seaside-dev] Encoding problems with binary vs ascii

Julian Fitzell jfitzell at gmail.com
Mon Jun 15 20:55:43 UTC 2009


On Mon, Jun 15, 2009 at 12:46 PM, Michael Lucas-Smith
<mlucas-smith at cincom.com> wrote:
> Hi All,
>
> I'm having some trouble with the new WACodec behavior.
>
> The tests assume there'll be a #converter selector on the WACodec subclass
> (testCodecLatin1). This seems a bit heavy-handed if you want to let the
> platforms decide how to achieve their conversion.

Seems a bit strange since the method is marked 'private', yup. I'll
let Philippe comment.

> Binary conversions are still an issue - take the following code:
>
> | codec binary encoder |
>   codec := WACodec forEncoding: 'utf-8'.
>   binary := self utf8String asByteArray.
>   encoder := codec encoderFor: (WriteStream on: String new).
>   encoder binary.
>   encoder nextPutAll: binary.
>
> The encoder is initialized with a non-binary write stream, then it's told to
> become binary. You can't do that - the encoder has no way of knowing what's
> inside its inner stream, nor should it. If you intend to put bytes into the
> stream, start it with a ByteArray.

I'll let Lukas comment on the specific design but I think the problem
is that the creator of the Response (the ServerAdaptor) does not know
whether the request handler wants to return binary data or not. Or
perhaps I'm missing your point.

> Likewise, if you're going to the effort of fixing up encoding issues at this
> point, why not get rid of all senders of #binary completely?
>
> From what I've understood, the API is "encoding in, encoding out", which
> means you expect to go from strings to strings. This is okay, I guess,
> except that I'd also like to be able to go only halfway: put strings in
> and get bytes out, this would remove any unnecessary conversions taking
> place.

I don't think they necessarily *have* to use strings internally; it
would depend on your codec as far as I recall the discussion. There's
definitely still room for discussion here but be aware it's been
discussed lots before and there doesn't ever seem to be an easy
answer. :)
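
For what it's worth, the halfway trip you describe would read like
this (again a sketch, with the same assumption that the codec can
write its encoded bytes onto a ByteArray-backed stream):

  | codec stream |
  codec := WACodec forEncoding: 'utf-8'.
  stream := WriteStream on: ByteArray new.
  "a String goes in..."
  (codec encoderFor: stream) nextPutAll: 'héllo'.
  "...and raw UTF-8 bytes come out, with no intermediate String"
  stream contents.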

> I've heard plenty of times before that this can't be done because of
> varying levels of platform support, but you're already pushing the boundaries of
> what can be done "out of the box" with WACodec, so why not go the whole way
> and do it right? Strings<->ByteArray conversions only?

I'm not sure what you're suggesting - having two Codecs in play at a
time? If so, I think we're assuming that most platforms already have
such an object and a Codec can be implemented to use two of those
objects if needed to perform the conversion. But again, I'm not clear
which side you're saying should have ByteArrays and so on.
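
For instance (all names here are hypothetical, just to sketch the
shape; this is not actual Seaside API):

  WACodec subclass: #WAPlatformCodec
      instanceVariableNames: 'textConverter urlConverter'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'Example-Codecs'

  WAPlatformCodec >> encoderFor: aStream
      "delegate the real conversion work to the platform's own converter"
      ^ textConverter encoderFor: aStream

One codec object, wrapping however many native converters the
platform needs.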

> Next, WACodec is expected to implement #name, which will return the name
> that was used to create it. To clarify: the tests assume that is the
> behavior. If that's the expected behavior, there's no reason why the
> subclasses of WACodec need to implement it themselves, as it can never
> change.

Not sure about this.
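
If the point is that #forEncoding: already knows the name, then the
base class could answer it once for everybody, along these lines (a
sketch; it assumes an instance variable that the class-side creation
method fills in):

  WACodec >> name
      "answer the encoding name this codec was created with"
      ^ name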

> Finally, I don't entirely understand the motivation of WACodec>>url... As
> far as I knew, there's no situation where URL-encoded strings encoded as
> shift-jis are going to work. URL-encoding is just another codec in my mind.
> What gives with this API?

Because there is no standard way for the browser to specify the
encoding of URLs, different page encodings expect different URL
encodings by default. In general, if the request encoding is latin1,
the URL should be latin1-encoded; for most other page encodings the
URL encoding is probably UTF-8, though it might not be, depending on
the situation. Having this hook lets a codec be written to handle
whatever situation you expect. At least, that's my understanding.
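
Concretely, I'd expect something like this (a sketch; what #url
actually answers depends on the codec in question):

  | codec |
  codec := WACodec forEncoding: 'latin-1'.
  "the codec to use for URLs, which may differ from the body codec"
  codec url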

It's easy to start flailing in this stuff when you haven't looked at
it in a few months. I'm sure others will correct me where I've messed
up.

Julian
