[Seaside-dev] WACodecTest>>testCodecUtf8ShortestForm

Mon Jun 29 21:39:11 UTC 2009

2009/6/29 Michael Lucas-Smith <mlucas-smith at cincom.com>:
> Philippe Marschall wrote:
>>
>> 2009/6/29 Michael Lucas-Smith <mlucas-smith at cincom.com>:
>>
>>>
>>> Philippe Marschall wrote:
>>> ....
>>> But this attack is based on the idea that you would attempt to filter
>>> certain words -before- you've decoded the UTF8.
>>> That's insane. Period.
>>>
>>
>> It could also mean that somebody at some point made a mistake, such as
>> forgetting to decode something when they should have, and that some class
>> trying to be helpful fixes it up later. That's not beyond imagination.
>>
>>
>>>
>>> I acknowledge the idea that it'd be nice to protect
>>> our users from themselves.. hah. The post also mixes up illegal sequences
>>> with non-shortest form - which the spec goes to pains to differentiate in
>>> its verbiage.
>>>
>>> Maybe Java has decided that users don't want to decode UTF8 and
>>> therefore
>>> it's a security risk, but I don't think that's necessarily the right
>>> thing
>>> for us to do in Smalltalk.
>>>
>>> You won't get this kind of attack using Opentalk-HTTP ...unless you're
>>> using
>>> Seaside with a WANullCodec. It's therefore possible to get this attack
>>> with
>>> Seaside, but only if you're using WANullCodec - which from what I gather
>>> is
>>> what everybody is using. However, it is also the intent to move off of
>>> WANullCodec ...so crippling an otherwise correct UTF8 decoder to satisfy
>>> WANullCodec would be the wrong thing to do.
>>>
>>
>> If the server has no bugs, which might well be the case with Opentalk.
>> However some other server or implementation could have bugs (I
>> wouldn't be surprised if my code has). I see it more as a safety net
>> that is there if some other safety net breaks. It's not a big deal if
>> the test isn't green, it's an expected failure on Squeak and probably
>> will stay so for the foreseeable future.
>>
>>
>>>
>>> I'm all for rejecting the illegal sequences, but the spec is pretty
>>> specific
>>> about non-shortest forms being parsable... and since when did we start
>>> looking to Java for "the right thing to do" ? ;)
>>>
>>
>> They pretty much trash us when it comes to Unicode. And they have a
>> stream hierarchy that's based on decoration and makes a clear separation
>> between character-oriented and byte-oriented IO (which the compiler
>> checks), in fact even between I and O. If I compare that with Squeak,
>> well, how does MultiByteBinaryOrTextStream sound?
>>
>>
>
> If they have byte arrays for encoded utf8 characters, then they shouldn't
> have the scenario described in the link.. ever.

They don't, of course.

> As far as I understood it, the only real change to support this is to
> require the adaptors to expect bytes to come out of a Seaside handler. If
> you think I'm being unreasonable, you should have a chat with our Opentalk
> engineers who really dislike how Seaside tries to do more of HTTP than they
> believe it should - such as encoding at all.

We have heard zilch from Opentalk engineers in the last two years. If
you don't want to be part of the community that's fine but then you'll
have to deal with what we do the way we do it. Open source works this
way.

> We can't please everyone, I grok that, that's fine - but breaking the UTF8
> parser because we have issues with how bytes are stored in the Smalltalk
> image is just fixing the wrong thing in the wrong place IMHO.

We don't ask you to break anything. If the test is not green that's
fine. It's not green on Squeak and probably never will be.
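
To make the scenario under discussion concrete, here is a minimal sketch in
Python (not Smalltalk, and not the actual Seaside test). It assumes a
hypothetical lenient decoder that accepts non-shortest forms and a
hypothetical filter that inspects the raw bytes before decoding. The names
lenient_utf8_decode and naive_filter_then_decode are made up for this
illustration; only bytes.decode is a real library call.

```python
def lenient_utf8_decode(data: bytes) -> str:
    """Hypothetical decoder that accepts non-shortest (overlong) forms.

    Only 1- to 3-byte sequences are handled; this is an illustration,
    not a complete UTF-8 decoder.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            code_point, length = lead, 1
        elif lead & 0xE0 == 0xC0:
            code_point, length = lead & 0x1F, 2
        elif lead & 0xF0 == 0xE0:
            code_point, length = lead & 0x0F, 3
        else:
            raise ValueError("unsupported lead byte")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1:i + length]:
            if cont & 0xC0 != 0x80:
                raise ValueError("bad continuation byte")
            code_point = (code_point << 6) | (cont & 0x3F)
        # No shortest-form check here: that is the point of the sketch.
        out.append(chr(code_point))
        i += length
    return "".join(out)


def naive_filter_then_decode(raw: bytes) -> str:
    """Hypothetical handler that filters the encoded bytes, then decodes."""
    if b"/" in raw:
        raise ValueError("rejected: contains '/'")
    return lenient_utf8_decode(raw)


def strict_decode_fails(raw: bytes) -> bool:
    """Return True if Python's built-in strict UTF-8 decoder rejects raw."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# 0xC0 0xAF is a two-byte, non-shortest encoding of '/' (U+002F).
OVERLONG_SLASH = b"\xc0\xaf"
```

With these definitions, naive_filter_then_decode(b".." + OVERLONG_SLASH)
passes the byte-level filter and still yields a string containing '/',
while strict_decode_fails(OVERLONG_SLASH) is true because Python's built-in
decoder refuses the sequence. Filtering after decoding, or decoding
strictly, closes the gap either way.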

Cheers
Philippe