[Seaside-dev] Encoding problems with binary vs ascii

Philippe Marschall philippe.marschall at gmail.com
Tue Jun 16 05:15:54 UTC 2009


2009/6/15 Michael Lucas-Smith <mlucas-smith at cincom.com>:
> Hi All,
>
> I'm having some trouble with the new WACodec behavior.
>
> The tests assume there'll be a #converter selector on the WACodec subclass
> (testCodecLatin1). This seems a bit heavy-handed if you want to let the
> platforms decide how to achieve their conversion.

Fixed.

> Binary conversions are still an issue - take the following code:
>
> | codec binary encoder |
>   codec := WACodec forEncoding: 'utf-8'.
>   binary := self utf8String asByteArray.
>   encoder := codec encoderFor: (WriteStream on: String new).
>   encoder binary.
>   encoder nextPutAll: binary.
>
> The encoder is initialized with a non-binary write stream, then it's told to
> become binary. You can't do that - the encoder has no way of knowing what's
> inside its inner stream, nor should it. If you intend to put bytes in to the
> stream, start it with a ByteArray.
>
> Likewise, if you're going to the effort of fixing up encoding issues at this
> point, why not get rid of all senders of #binary completely?
>
> From what I've understood, the API is "encoding in, encoding out", which
> means you expect to go from strings to strings. This is okay, I guess,
> except that I'd also like to be able to go only half way: put strings in
> and get bytes out; this would remove any unnecessary conversions taking
> place.
>
> I've heard plenty of times before that this can't be done because of
> different levels of platform support, but you're already pushing the
> boundaries of what can be done "out of the box" with WACodec, so why not
> go the whole way and do it right? Strings<->ByteArray conversions only?
>
> Next, WACodec is expected to implement #name, which should return the name
> that was used to create it. To clarify: the tests assume that this is the
> behavior. If that's the expected behavior, there's no reason why the
> subclasses of WACodec need to implement it themselves, as it can never
> change.

That's driven by a different use case. When you set up a server
adapter you might want to have a dropdown of all supported codecs.
Therefore WACodec class >> #allCodecs answers all available codecs
without you having to know their names, and #name answers each
codec's name for pure display purposes.
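As a rough analogue of that enumerate-then-display pattern (sketched in Python rather than Smalltalk, and using the stdlib codec alias table as a stand-in for #allCodecs, which is purely an assumption for illustration):

```python
# Analogue of the WACodec allCodecs / #name split: enumerate the
# codecs the runtime knows about, then expose only display names
# suitable for a configuration dropdown.
import encodings.aliases

def codec_choices():
    # Canonical codec names known to this Python runtime, deduplicated
    # (many aliases map to the same canonical name) and sorted for display.
    return sorted(set(encodings.aliases.aliases.values()))

print('utf_8' in codec_choices())  # prints True
```

The point is the same as above: the caller never needs to know any codec's name up front; the names exist only so a UI can label the choices.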

> Finally, I don't entirely understand the motivation of WACodec>>url... As
> far as I knew, there's no situation where URL-encoded strings encoded as
> shift-jis are going to work. URL-encoding is just another codec in my mind.
> What gives with this API?

Julian summarized it well. See comment six on this issue [1] and the
reasons why useBodyEncodingForUI was deprecated in Tomcat. Oh, and
IE 5 uses latin-1 as a URI encoding on utf-8 pages.

 [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=23929
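To illustrate why the URI codec has to be configurable separately (a Python sketch using the stdlib `urllib.parse.quote`, not Seaside API): the same character percent-encodes to different byte sequences under different charsets, so the server cannot decode a request URI correctly without knowing which encoding the browser chose.

```python
# The same character yields different percent-encoded bytes under
# different charsets; a decoder that assumes the wrong one produces
# garbage instead of the original character.
from urllib.parse import quote

print(quote('é', encoding='utf-8'))    # prints %C3%A9
print(quote('é', encoding='latin-1'))  # prints %E9
```

This is exactly the IE 5 case mentioned above: a latin-1-encoded URI arriving from a page served as utf-8.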

Cheers
Philippe

