[Seaside] WAUrl class>>#decodePercent:

Mon Sep 2 14:41:44 UTC 2013

On Mon, Sep 2, 2013 at 10:59 AM, jtuchel at objektfabrik.de
<jtuchel at objektfabrik.de> wrote:
> Hi Philippe,
>
> On 31.08.13 13:48, Philippe Marschall wrote:
>
>> On Fri, Aug 30, 2013 at 9:27 PM, jtuchel at objektfabrik.de
>> <jtuchel at objektfabrik.de> wrote:
>>>
>>> Okay, this message costs me some courage now ;-)
>>>
>>> Philippe, of course Seaside is decoding. Otherwise I would never have
>>> gotten
>>> an Exception from decodePercent: in the first place. So I am expecting
>>> the
>>> right thing from Seaside and am getting it.
>>>
>>> The real problem in my case is that I am using ISO-8859-15 in my
>>> application. So the result of getting the text field's contents using
>>> val()
>>> is an ISO-8859-15 encoded String.
>>
>> AFAIK that should be UTF-16 for JavaScript.
>
> Well, it seems it is exactly what the html page's charset setting says. In
> my case it is ISO-8859-15.

If you do '€'.charCodeAt(0) I'm quite sure you'll get 8364 and not 164.

> At least the Strings that are coming in to my
> callback carry umlauts in exactly the encoding that I need them.

Because the browser sends it correctly. It's my understanding that
strings in JavaScript always have the same encoding no matter the
encoding of the page.
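
A quick way to check this in a browser console or Node.js. This is only a
minimal sketch and the variable names are made up for illustration:
charCodeAt reports the UTF-16 code unit, and encodeURIComponent
percent-encodes the UTF-8 bytes, whatever charset the page declares.

```javascript
// UTF-16 code unit of the euro sign: U+20AC, not the ISO-8859-15 byte 0xA4 (164).
const euro = '€';
const codeUnit = euro.charCodeAt(0);

// encodeURIComponent escapes the UTF-8 bytes of each character:
// € -> E2 82 AC, ä -> C3 A4, ö -> C3 B6, ü -> C3 BC
const escaped = encodeURIComponent('€äöü');

console.log(codeUnit); // 8364
console.log(escaped);  // %E2%82%AC%C3%A4%C3%B6%C3%BC
```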

>>> encodeURI and encodeURIComponent not only escape characters, but also
>>> convert special characters into UTF-8. In my case these were German
>>> umlauts.
>>> So I fell hostage to a side effect of encodeURI and the fact that VA ST
>>> doesn't yet support Unicode and somehow didn't understand this.
>>>
>>> So what I did was not wrong per se, I just ignored the whole UTF-8 thing.
>>> I
>>> should have started my search in that area, because it is not the first
>>> time
>>> Ajax and its UTF-8 nature bit me.
>>> For Pharo/Squeak/Gemstone users, this UTF-8 stuff is a non-issue, and
>>> therefore readers of my posts had absolutely no chance to see the forest
>>> for the trees.
>>
>> Just for completeness' sake you could try to fake it, accept UTF-8 and
>> translate it to ISO-8859-15
>
> Yes, I thought about this possibility, but then I decided against it not
> only for the reasons you mention, but also for performance reasons. In an
> autocompleter that reacts to every single keystroke and builds up a
> hierarchical representation of business objects that are retrieved using
> Glorp and renders them in nested <ul> tags, every en/decoding step makes the
> thing slower, and this is a very central place in the app that makes part of
> its strengths.

There are some tricks to make it quite fast. But I'm not trying to
convince you. The approach you're currently taking seems to be the
best considering the circumstances. I was just listing other options.
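
For reference, the "accept UTF-8 and translate it to ISO-8859-15" option
could look roughly like the sketch below. It is written in JavaScript
purely for illustration; in the real application this would live in
Smalltalk on the server. toLatin9Bytes and its fallback parameter are
hypothetical names, not part of Seaside or VA Smalltalk. It assumes
well-formed percent-encoded UTF-8 input and replaces anything that does
not fit into ISO-8859-15 with '?'.

```javascript
// Hypothetical helper, not part of Seaside or VA Smalltalk.
// The eight code points where ISO-8859-15 differs from ISO-8859-1,
// mapped to their ISO-8859-15 byte values.
const LATIN9_SPECIAL = new Map([
  [0x20AC, 0xA4], // €
  [0x0160, 0xA6], // Š
  [0x0161, 0xA8], // š
  [0x017D, 0xB4], // Ž
  [0x017E, 0xB8], // ž
  [0x0152, 0xBC], // Œ
  [0x0153, 0xBD], // œ
  [0x0178, 0xBE], // Ÿ
]);

// Latin-1 code points whose byte positions ISO-8859-15 reassigned,
// so they no longer exist in ISO-8859-15 (for example ¤ at 0xA4).
const REPLACED = new Set(LATIN9_SPECIAL.values());

function toLatin9Bytes(percentEncoded, fallback = 0x3F /* '?' */) {
  // decodeURIComponent interprets the escapes as UTF-8.
  const text = decodeURIComponent(percentEncoded);
  const bytes = [];
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (LATIN9_SPECIAL.has(cp)) {
      bytes.push(LATIN9_SPECIAL.get(cp));
    } else if (cp <= 0xFF && !REPLACED.has(cp)) {
      bytes.push(cp); // identical in ISO-8859-1 and ISO-8859-15
    } else {
      bytes.push(fallback); // does not fit into ISO-8859-15
    }
  }
  return bytes;
}

console.log(toLatin9Bytes('%E2%82%AC')); // [ 164 ]
console.log(toLatin9Bytes('%C3%A4'));    // [ 228 ]
```

The fallback is where the policy decision hides: replacing with '?' is
only one choice, and rejecting the input would be another.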

>> but then the question is what you do with
>> everything of Unicode that doesn't fit into ISO-8859-15. Also we have
>> the option of running UTF-8 but not decoding it. This makes it possible
>> to run UTF-8 on non-Unicode-capable systems but you have to be very
>> careful especially with the backend.
>
> I invested quite some effort in making the whole application from database
> to web page UTF-free, because VAST doesn't support it very well, other than
> converting back and forth whenever a String enters or leaves VAST. Special
> fun is involved with DB field lengths etc. So as strange as it may sound,
> Unicode is not necessarily your friend if there is at least one part of the
> chain that doesn't support it.

Yes

> Luckily, the application is very tightly coupled to legal regulations in
> Germany, so I can quite safely decide to ignore the rest of the world that
> needs characters outside of the ISO-8859-15 character set.

Good

>>
>> You may want to do special testing with €ŠšŽžŒœŸ which are part of
>> ISO-8859-15 but not ISO-8859-1. Also you may want to test what happens
>> when somebody enters non-ISO-8859-15 input [1].
>
> Okay, you are right. I need to test what happens if somebody enters
> non-ISO-8859-15 input. But I would expect this to be prevented by the web browser if
> I explicitly set the metadata of my html pages and especially form tags…

I would expect the browser to silently convert to UTF-8 ;-)
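
To build inputs for such a test, something like the following could be
used. It is only a sketch, and the sample character '日' is my own pick
for "something outside ISO-8859-15". It lists the eight characters that
ISO-8859-15 adds over ISO-8859-1 together with what encodeURIComponent
would send for each.

```javascript
// The characters that are in ISO-8859-15 but not in ISO-8859-1.
const latin9Only = [...'€ŠšŽžŒœŸ'];

// For each one: the Unicode code point and the percent-encoded UTF-8
// form that encodeURIComponent produces.
const cases = latin9Only.map((ch) => ({
  ch,
  codePoint: ch.codePointAt(0),
  escaped: encodeURIComponent(ch),
}));

// An arbitrary sample that does not exist in ISO-8859-15 at all.
const outside = encodeURIComponent('日');

for (const c of cases) {
  console.log(c.ch, c.codePoint, c.escaped);
}
console.log(outside);
```

All eight code points are above 255, so none of them can be passed
through as a single Latin-1 byte.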

Cheers
Philippe