[Seaside-dev] encoding/decoding Streams

Thu Jul 31 10:46:36 UTC 2008

2008/7/31 Paolo Bonzini <bonzini at gnu.org>:
>>> 1) Would it make sense to change WACodec>>#encode: and WACodec>>#decode:
>>> to accept streams instead of Strings?
>>
>> The servers I have had to deal with work with Strings, not Streams.
>
> Swazoo supports feeding data to the socket directly for a stream, but the
> Seaside adaptor cannot use the feature because it has to convert the
> WAResponse's contents to a String just to feed it to the codec:
> ...
> (It's a little more complicated because you'd need the SwazooRequest to make
> the SwazooResponse, using "swazooRequest streamedResponse", but not a big
> deal).

There are two kinds of streaming:
- streaming of the response after it has been created
- streaming of the response while it is being created

The second is a big deal. You had better get familiar with:
http://www.google.ch/search?q=illegalstateexception+response+already+committed

Making WACodec stream-based would address only the first one. If the
current conversion from a string stream to a String and back has a
noticeable performance impact, then we should change it.
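
As a rough illustration of that round trip, here is a minimal sketch that
contrasts a String-based transformation with a stream-based one. It is not
WACodec's actual implementation: #asUppercase is only a stand-in for a real
encoding step, and all variable names are made up for the example.

```smalltalk
| input encoded source sink |
input := 'hello'.

"String-based: the whole contents exist as a String before and after."
encoded := input collect: [:each | each asUppercase].

"Stream-based: characters flow from the source to the sink one at a time."
source := ReadStream on: input.
sink := WriteStream on: String new.
[source atEnd] whileFalse: [sink nextPut: source next asUppercase].

"Both variants yield the same characters."
encoded = sink contents
```

The stream-based variant only helps with the first kind of streaming: the
response still has to be complete before the transformation starts.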

> Also, I don't know about Squeak but in both VW and GNU Smalltalk reencoding
> is Stream-based, and to reencode a String you just have to wrap it again
> in a Stream.

Squeak is String-based. But you don't have to use WACodec as a base
for your server adapter. You can write your own stream-based one.
You'll only have to use WACodec for URL encoding. There the argument
is a String. Arguably the interface of WACodec is a bit too big for
just that.

>>> 2) The tests for codecs assume that the source encoding is ISO-8859-1.
>>
>> I don't see what you mean here.
>
> The .mcz files downloaded from SqueakSource pass an ISO-8859-1 string to
> #encode:, and check that the result is in whatever encoding was passed to
> #newForEncoding:.

No, it passes a Smalltalk string to #encode: and assumes the
file-in/file-out mechanism is working.

The assumption it makes is:

| s |
s := 'é'.
self assert: s size = 1.
self assert: (s at: 1) = $é

Which does not sound unreasonable to me.

>>> This is fine for testing purposes, but not necessarily for deployment.
>>> Would it make sense then to change the constructor so that it can take
>>> a source and destination encoding?  Or is there something I'm missing?
>>
>> I assume the current tests cause a problem somewhere. Can you elaborate?
>
> No, the tests are not a problem.  But in order to pass them, I had to
> hardcode ISO-8859-1 as the source encoding in GNU Smalltalk's default codec
> (see again how #encode: is implemented, above).

If the source files are ISO-8859-1, then you had better treat them as
ISO-8859-1. If you have no way of figuring out what the encoding of
your source files is, then sooner or later it will break.
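
To illustrate one way this can break, here is a small sketch. It assumes a
source file that was saved as UTF-8 but is read one byte per character, as
an ISO-8859-1 reader would do. The byte values are the UTF-8 encoding of é
(U+00E9); the variable names are hypothetical.

```smalltalk
| utf8Bytes misread |
"UTF-8 encodes é (U+00E9) as the two bytes 16rC3 and 16rA9."
utf8Bytes := #(16rC3 16rA9).

"An ISO-8859-1 reader maps every byte to a character of its own."
misread := String new: utf8Bytes size.
1 to: utf8Bytes size do: [:index |
    misread at: index put: (Character value: (utf8Bytes at: index))].

"The literal now has two characters, so the size = 1 assumption fails."
misread size = 2
```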

Cheers
Philippe