[Seaside] File upload - encoding issue
Sven Van Caekenberghe
sven at stfx.eu
Fri Oct 17 09:25:54 UTC 2014
Hi Philippe, Dave,
I made a couple of changes to Zinc to handle the problem. Basically, MIME parts such as uploaded files embedded in multipart/form-data do not have a charset parameter on their MIME types, so the encoding is not known with absolute certainty. I think I fixed it: for Zn itself, the default encoding is now UTF-8. I added a specific test (ZnServerTests>>#testFormTest3Unspecified) for this case. Additionally, the filename is now also assumed to be UTF-8 encoded (like a file path).
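To make the fallback idea concrete, here is a minimal sketch in Python. It is not Zinc or Seaside code; the helper names charset_of and decode_part are hypothetical and only illustrate the rule "use the declared charset if there is one, otherwise assume UTF-8".

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_part(data, content_type, fallback="utf-8"):
    """Decode the bytes of a MIME part, using `fallback` when no charset is declared.

    Hypothetical helper for illustration only; UTF-8 as the fallback mirrors
    the default described above.
    """
    charset = charset_of(content_type) or fallback
    return data.decode(charset)


text = "Gr\u00fc\u00dfe"  # "Grüße"

# An uploaded file part typically arrives as plain "text/plain", without charset.
undeclared = decode_part(text.encode("utf-8"), "text/plain")

# When a charset is declared, it takes precedence over the fallback.
declared = decode_part(text.encode("iso-8859-1"), "text/plain; charset=ISO-8859-1")
```

Both calls return the original string; the difference is only in where the encoding decision comes from.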
For the Zn Seaside adaptor, the story was a bit different. The adaptor uses a special Zn option to read everything as binary, as Seaside wants to do its own conversions. That option did not extend to MIME parts in multipart/form-data. This has now been added, and the adaptor works without altering ZnZincServerAdaptor>>#convertMultipartFileField:
IMHO though, WAUploadFunctionTest is wrong. Basically, the use of ISO-8859-1 is questionable and should be replaced with UTF-8 for current browsers (in the methods #renderDownloadLinksOn: and #renderFileContentsOn:). With that change, those tests pass for uploaded text files that have non-ASCII contents.
The comment in #renderDownloadLinksOn: suggests that this problem (as described in the 1st paragraph) was noted before. The solution or fallback chosen there is wrong though, IMHO.
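A small Python sketch (again only an illustration, not Seaside code) of why decoding an uploaded UTF-8 text file as ISO-8859-1 goes wrong silently: every byte is a valid ISO-8859-1 character, so no error is raised, but each non-ASCII character turns into two wrong ones.

```python
original = "na\u00efve caf\u00e9"    # "naïve café"
uploaded = original.encode("utf-8")  # the bytes a browser would send for a UTF-8 text file

# ISO-8859-1 maps every byte to a character, so this never raises;
# the two-byte UTF-8 sequences simply become two characters each.
as_latin1 = uploaded.decode("iso-8859-1")
as_utf8 = uploaded.decode("utf-8")

assert as_utf8 == original
assert as_latin1 != original
print(as_latin1)  # prints "naÃ¯ve cafÃ©"
```

Because the wrong decoding does not fail, only a comparison against the expected contents, as a test with non-ASCII input does, reveals the problem.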
The codec set in the adaptor could indeed be a fallback. I don't know if this can be accessed in regular Seaside code (like in the functional test).
On the other hand, I can't see (and would love an example) where it makes sense, in the 21st century, to not use UTF-8 as a fallback (in case nothing was specified).
In any case, thanks for raising this issue, it helped to improve the code.
PS: BTW, are there no unit tests that actually stress the functional tests?
On 09 Oct 2014, at 20:31, Philippe Marschall <philippe.marschall at gmail.com> wrote:
> On Thu, Oct 9, 2014 at 9:30 AM, Sven Van Caekenberghe <sven at stfx.eu> wrote:
>> On 09 Oct 2014, at 08:46, Dave <lasmiste at gmail.com> wrote:
>>> Sven Van Caekenberghe-2 wrote
>>>>> Do you have information in the request header that suggests UTF-8?
>>>> Not that I can see, there are no charset=utf-8 anywhere (but one could
>>>> assume they are the default):
>>> Right, I also can't find where utf-8 is set. Any idea on how can I change
>>> the charset?
>> Well, there is an accept-charset="utf-8" in the form, but it does not appear in the submitted form (I only checked one browser). Like I said, I need an informed opinion to help me make a decision here.
> The codec on the server adaptor should do the trick. It should match
> the page encoding and the accept-charset. Seaside always sets them to
> the same value, I did not test which takes precedence in which
> browser. I did a quick test and could verify it with UTF-8 and
> ISO-8859-1 on Firefox. You can either use the codec on the server
> adaptor or ask the codec for the name and do it with the Zinc
> Weird things happen in ISO-8859-1 when using code points that do not
> fit. Eg Mac OS X uses NFD so German umlauts are two code points with
> the second one outside of ISO-8859-1. I did not test UTF-16 or