[Seaside-dev] WACodecTest>>testCodecUtf8ShortestForm

Mon Jun 29 19:21:34 UTC 2009

2009/6/29 Michael Lucas-Smith <mlucas-smith at cincom.com>:
> Philippe Marschall wrote:
>....
> But this attack is based on the idea that you would attempt to filter
> certain words -before- you've decoded the UTF8.
> That's insane. Period.

It could also mean that somebody at some point made a mistake, such as
forgetting to decode something when he should have, and that some class
that tries to be helpful later fixes it up. That's not beyond imagination.
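
To make the scenario concrete, here is a minimal sketch in Python (not
Seaside code). The names lenient_decode and naive_filter_passes and the
payload are made up for illustration. It assumes a filter that inspects the
raw bytes, followed by a decoder that accepts non-shortest forms:

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical decoder that accepts non-shortest forms.

    Only 1- and 2-byte sequences are handled, to keep the sketch short.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif (0xC0 <= b <= 0xDF and i + 1 < len(data)
              and (data[i + 1] & 0xC0) == 0x80):
            # 2-byte sequence; lead bytes 0xC0 and 0xC1 are overlong
            # encodings of ASCII, and this decoder does not reject them.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


def naive_filter_passes(raw: bytes) -> bool:
    """Hypothetical filter that looks at the bytes before they are decoded."""
    return b"../" not in raw


# 0xC0 0xAF is the non-shortest (2-byte) encoding of "/" (U+002F).
payload = b"..\xc0\xafsecret"
```

With these definitions naive_filter_passes(payload) is true, yet
lenient_decode(payload) contains "../", so the forbidden sequence only
appears after the helpful fix-up.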

> I acknowledge the idea that it'd be nice to protect
> our users from themselves.. hah. The post also mixes up illegal sequences
> with non-shortest form - which the spec goes to pains to differentiate in
> its verbiage.
>
> Maybe Java has decided that users don't want to decode UTF8 and therefore
> it's a security risk, but I don't think that's necessarily the right thing
> for us to do in Smalltalk.
>
> You won't get this kind of attack using Opentalk-HTTP ...unless you're using
> Seaside with a WANullCodec. It's therefore possible to get this attack with
> Seaside, but only if you're using WANullCodec - which from what I gather is
> what everybody is using. However, it is also the intent to move off of
> WANullCodec ...so crippling an otherwise correct UTF8 decoder to satisfy
> WANullCodec would be the wrong thing to do.

That holds if the server has no bugs, which might well be the case for
Opentalk. However, some other server or implementation could have bugs (I
wouldn't be surprised if my code has some). I see it more as a safety net
that is there if some other safety net breaks. It's not a big deal if
the test isn't green; it's an expected failure on Squeak and will probably
stay so for the foreseeable future.
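
As a rough illustration of what such a safety net looks like, here is a
sketch in Python rather than Smalltalk. The helper name decode_strict is made
up. It relies on Python's built-in "utf-8" codec, which rejects non-shortest
forms, and says nothing about how WACodecTest itself is written:

```python
from typing import Optional


def decode_strict(raw: bytes) -> Optional[str]:
    """Return the decoded text, or None if the bytes are not valid UTF-8.

    Python's built-in codec treats non-shortest forms as errors, so an
    overlong encoding never turns into a character further down the line.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Shortest form of "/" decodes normally.
shortest = decode_strict(b"/")

# 0xC0 0xAF, the 2-byte non-shortest form of "/", is refused.
overlong = decode_strict(b"\xc0\xaf")
```

Here shortest is "/" and overlong is None, so code that forgot to decode
earlier cannot be bitten by a later decoding step.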

> I'm all for rejecting the illegal sequences, but the spec is pretty specific
> about non-shortest forms being parsable... and since when did we start
> looking to Java for "the right thing to do" ? ;)

They pretty much trash us when it comes to Unicode. And they have a
stream hierarchy that's based on decoration and makes a clear separation
between character-oriented and byte-oriented IO (which the compiler
checks), in fact even between I and O. If I compare that with Squeak,
well, how does MultiByteBinaryOrTextStream sound?

Cheers
Philippe