[Seaside-dev] Re: [Seaside] Re: requests and encodings (was Re: fix
for issue 21)
Julian Fitzell
julian at fitzell.ca
Mon Jun 30 02:56:52 UTC 2008
Oops... you're right. I didn't know there was a separate seaside-dev
so missed the distinction you were making.
I've moved this over there now, though I think you've sufficiently
clarified the issue for me at this point anyway.
Julian
On Sun, Jun 29, 2008 at 11:19 PM, Philippe Marschall
<philippe.marschall at gmail.com> wrote:
> 2008/6/29, Julian Fitzell <julian at fitzell.ca>:
>> Hi Philippe,
>>
>> I did send my previous message to seaside-dev.
>
> All my headers show your message went to
> seaside at lists.squeakfoundation.org and not
> seaside-dev at lists.squeakfoundation.org
>
>> I feel like maybe I've offended you somehow, which was absolutely not
>> my intention. If so, I apologize. As I said, I love the pluggability
>> of this new encoding stuff... it's very clean and well done. My
>> intention was only to fix the bug in issue 21, which I did. The rest
>> was just thinking aloud.
>>
>> My knowledge of the encoding in Squeak must be out of date (I was
>> familiar with it before the internationalization stuff went in). At
>> the time, MacRoman was used and, as I understand it, MacRoman only has
>> 256 characters.
>
> That was the state of Squeak 3.7 to my knowledge. Squeak 3.8 switched
> to layer violated Unicode. So all the MacRoman issues fall away and
> new ones appear like #= and a lot of methods in WideString broke. I
> don't know if all of this is fixed in Squeak 3.10 but
> WideString had some show stoppers in Squeak 3.8 and 3.9.
>
> Seriously when dealing with Strings we must be sure that they are
> Strings. That is only the case if the String has Smalltalk encoding.
> Else the String is a mere ByteArray. A byte in it has no semantics at
> all. It is not possible to do anything meaningful at all with such an
> abstraction because we can not assume anything about it. So it has the
> byte value 60 in it. Is that $<? We don't know and can't know because
> it has no semantics.
>
> Cheers
> Philippe
>
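To illustrate the point above about byte value 60 having no meaning until an encoding is known, here is a small Python sketch. Python is used only for illustration (this is not Squeak or Seaside code), and the codecs chosen are arbitrary examples.

```python
# The same byte value(s) decode to different characters depending on the
# encoding that is assumed. The codecs below are arbitrary examples.
raw = bytes([60])

as_ascii = raw.decode("ascii")       # '<'
as_ebcdic = raw.decode("cp037")      # an EBCDIC control character, not '<'

# In UTF-16 a single byte is not even a complete character; paired with a
# second byte it becomes part of a completely different code point.
pair = bytes([60, 78])
as_utf16 = pair.decode("utf-16-le")  # one CJK character (U+4E3C)
as_latin1 = pair.decode("latin-1")   # two characters: '<' followed by 'N'

print(repr(as_ascii), repr(as_ebcdic), repr(as_utf16), repr(as_latin1))
```

Only once the encoding is fixed does "is that $<?" have an answer.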
>> Obviously you want to be dealing with string literals,
>> etc. in squeak's encoding but data coming out of an existing database
>> is going to be in something else and outputting data from such a
>> database is going to be a common case.
>>
>> Assuming my understanding of MacRoman is correct, you obviously can't
>> convert UTF-8 database data to MacRoman, then back to UTF-8 for output
>> back to the browser because the conversion would be lossy. It sounds
>> like you're saying MacRoman is no longer the encoding used. As long as
>> the full character space is available in the native encoding, then I
>> agree that having seaside deliver everything in that native encoding
>> is a reasonable implementation.
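The lossy round trip described above can be sketched in a few lines of Python (used here purely for illustration; the function name is hypothetical). It assumes a strictly 256-character MacRoman as the internal encoding, which, per the reply above, is no longer the situation from Squeak 3.8 onwards.

```python
def round_trip_via_mac_roman(utf8_bytes):
    """Decode UTF-8, squeeze through MacRoman, and re-encode as UTF-8."""
    text = utf8_bytes.decode("utf-8")
    # errors="replace" substitutes '?' for anything MacRoman cannot represent.
    narrowed = text.encode("mac_roman", errors="replace")
    return narrowed.decode("mac_roman").encode("utf-8")

fits = "café".encode("utf-8")            # every character exists in MacRoman
does_not_fit = "日本語".encode("utf-8")   # none of these exist in MacRoman

print(round_trip_via_mac_roman(fits) == fits)
print(round_trip_via_mac_roman(does_not_fit) == does_not_fit)
```

Text that fits into the 256-character set survives; anything else comes back as replacement characters.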
>>
>> I don't necessarily agree that being able to specify the encoding of a
>> piece of data is "pure horror" but I agree what is there now is going
>> to be adequate as long as the internal encoding is appropriate for the
>> task. Again, sorry for any offense.
>>
>> Julian
>>
>> On Sun, Jun 29, 2008 at 9:48 PM, Philippe Marschall
>> <philippe.marschall at gmail.com> wrote:
>> > 2008/6/28, Julian Fitzell <jfitzell at gmail.com>:
>> >> Moving to seaside-dev...
>> >>
>> >> On Sat, Jun 28, 2008 at 2:56 PM, Philippe Marschall
>> >> <philippe.marschall at gmail.com> wrote:
>> >> > 2008/6/27, Julian Fitzell <julian at fitzell.ca>:
>> >> >> On Fri, Jun 27, 2008 at 1:07 PM, Philippe Marschall
>> >> >> <philippe.marschall at gmail.com> wrote:
>> >> >> > 2008/6/26, Julian Fitzell <julian at fitzell.ca>:
>> >> >> >> - I wonder whether we should add a "path" instVar to WARequest.
>> >> >> >> Currently the (unfortunately-named) "url" instvar doesn't provide any
>> >> >> >> way to tell the difference between a '/' and a '%2f' in the original
>> >> >> >> URL. I broke my fix up into two methods so that we could store the
>> >> >> >> result of #pathSegmentsFrom: in another instvar.
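The '/' versus '%2f' problem comes down to the order of splitting and percent-decoding. A rough Python sketch of the idea follows; it is not the actual #pathSegmentsFrom: implementation, and the function name and sample path are made up for illustration.

```python
from urllib.parse import unquote

def path_segments(raw_path):
    """Split the still-encoded path on '/', then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/") if segment]

raw = "/files/a%2Fb/c"   # hypothetical request path

# Decoding the whole path first turns the encoded slash into a separator.
decode_then_split = [s for s in unquote(raw).split("/") if s]

# Splitting first keeps the encoded slash inside a single segment.
split_then_decode = path_segments(raw)

print(decode_then_split)
print(split_then_decode)
```

Once the path has been decoded into a single string, the distinction is gone, which is why keeping the segments around separately would help.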
>> >> >> >
>> >> >> > Ideally IMHO "url" would hold a WAUrl that is the request URL parsed.
>> >> >> > I don't know though if this is enough and what it all will break.
>> >> >> > Right now "url" is also always utf-8 decoded which made me create
>> >> >> > issue 79.
>> >> >>
>> >> >>
>> >> >> Well, I thought that too but it would kind of break things to change
>> >> >> it from a string to a WAUrl. Also, after more thought, I realized that
>> >> >> an HTTP request doesn't have a protocol, port, or (necessarily)
>> >> >> server.
>> >> >
>> >> > Yes it does. The server is in the HOST header. The protocol is either
>> >> > http or https we can get this from the configuration. Same for the
>> >> > port.
>> >>
>> >> Yeah, ok, I suppose you /could/ fake it with the information from the
>> >> configuration (there is no Host: header in HTTP/1.0 but that's likely
>> >> not a big problem these days). Is that misleading though since the
>> >> user might actually have connected differently (particularly for an
>> >> initial connection where seaside's configuration doesn't enter into
>> >> the equation)? You could also presumably find the port and protocol of
>> >> the Kom connection from Kom itself somehow...
>> >
>> > Well then, let's exclude the port and scheme:
>> >
>> > WAUrl new parsePath: '/ch/de/index.html'
>> >
>> > works quite well.
>> >
>> >> In either case, it seems to me that changing #url from a string to a
>> >> WAUrl would break existing code. Maybe it's desirable...
>> >
>> > Breaking client code is never desirable.
>> >
>> >> not a
>> >> difficult fix to code that does break and it would probably break
>> >> pretty obviously.
>> >
>> > and there should be pretty few users.
>> >
>> >> >> >> - do you know if the header values in HTTPRequest also need to be
>> >> >> >> decoded? They aren't currently and I don't know if they support UTF-8
>> >> >> >> values or not...
>> >> >> >
>> >> >> > If they are really UTF-8 that would be good. An example is cookie
>> >> >> > values which are transmitted through headers. See also issue 63.
>> >> >> > Before adding such a thing, please make sure it really works with IE
>> >> >> > 6, Firefox 2, Safari 2 and Opera 9 with utf-8, ISO-8859-1 and utf-16.
>> >> >> > Ideally also Big5 and Shift JIS though I have to admit I never tested
>> >> >> > with those. Unfortunately the HTTP spec/theory and browsers/reality
>> >> >> > are different.
>> >> >>
>> >> >>
>> >> >> Are you suggesting auto-detecting the encoding of headers sent by the
>> >> >> browser?
>> >> >
>> >> >> > No not at all. But in Seaside 2.9 we now know the encoding of the web
>> >> > application. Even if there is a spec, you will simply have to try all
>> >> > browsers with at least iso-8859-1 and utf-8. Either there is a rule or
>> >> > we can't support it. It's as simple as that. A short googling suggests
>> >> > that headers are ASCII. We might or might not want to support a custom
>> >> > encoding for cookie values.
>> >> >
>> >> >> I don't think the browser specifies an encoding in the
>> >> >> headers does it? I'm not sure I want to tackle this mess right now but
>> >> >> I'll keep it in mind. :)
>> >> >
>> >> > It can, in the content-type header. Not that it often does.
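When a charset is declared, it travels as a parameter of the Content-Type header value. A minimal Python sketch that pulls it out with the standard library's `email.message.Message`; the helper name and the header values are made-up examples, and as noted above, real browsers frequently omit the parameter.

```python
from email.message import Message

def declared_charset(content_type_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_charset()

with_charset = declared_charset("application/x-www-form-urlencoded; charset=UTF-8")
without_charset = declared_charset("application/x-www-form-urlencoded")

print(with_charset, without_charset)
```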
>> >> >
>> >> >> I'd have to think about this more but if we are supporting all those
>> >> >> encodings, wouldn't it be nice to have a pair of encoders: one for
>> >> >> what we want our Response encoding to be and one for the encoding we
>> >> >> want to use internally (convert Request data *TO* and Response data
>> >> >> *from*). So you could use a UTF-8 converter for "outside" and a Squeak
>> >> >> encoding converter for "inside"; all incoming data would be converted
>> >> >> to Squeak encoding and anything going out would be converted from
>> >> >> Squeak encoding to UTF-8. If you had UTF-8 encoders for both then you
>> >> >> wouldn't have to do any encoding going out but incoming might still
>> >> >> have to be converted to UTF-8 if it was, for example, UTF-16.
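One way to picture the "outside" and "inside" converters is the Python sketch below. It is not Seaside code: the `Boundary` class is hypothetical, and the "inside" is fixed to Python's native string type, standing in for Squeak encoding.

```python
class Boundary:
    """Hypothetical helper: codecs for the outside world, native text inside."""

    def __init__(self, request_encoding, response_encoding):
        self.request_encoding = request_encoding
        self.response_encoding = response_encoding

    def incoming(self, raw):
        # Bytes from the browser become text exactly once, at the edge.
        return raw.decode(self.request_encoding)

    def outgoing(self, text):
        # Text becomes bytes again only when the response is written.
        return text.encode(self.response_encoding)

boundary = Boundary(request_encoding="utf-16-le", response_encoding="utf-8")
name = boundary.incoming("Zürich".encode("utf-16-le"))
reply = boundary.outgoing("Hello " + name)

print(reply)
```

Everything between `incoming` and `outgoing` works on real strings, so ordinary string operations stay meaningful regardless of what the browser sent.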
>> >> >
>> >> > No, internally we ideally want only Squeak/Smalltalk encoding.
>> >> > Otherwise we can throw String away and just use ByteArray. The problem
>> >> > is that WideStrings are bugged and slow and for legacy reasons we have
>> >> > to support "null encoding". Everything else is insanity. Same goes for
>> >> > using utf-8 internally and utf-16 externally. Second for some external
>> >> >> > parts (like URLs) the external encoding is given.
>> >>
>> >> It doesn't appear quite that simple to me... if you have data in UTF
>> >> format in a database, you might well prefer to use UTF encoding
>> >> internally
>> >
>> > There is no such thing as UTF encoding. Using an encoding other than
>> > Squeak fixes #= but breaks _every_ method except #,. The only reason
>> > you might want this is to avoid the performance penalties of
>> > WideString. But then again have you profiled your application and can
>> > you prove to me that WideStrings are your performance bottleneck? Else
>> > this is pure premature optimization.
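A rough illustration of why string methods stop making sense on encoded data while concatenation still works. This is a Python sketch, not Squeak: `bytes` stands in for a String holding UTF-8 encoded data and `str` for a properly decoded String.

```python
text = "é"                      # one character
raw = text.encode("utf-8")      # two bytes: 0xC3 0xA9

# Size counts bytes, not characters.
print(len(text), len(raw))

# Case conversion on the raw bytes only touches ASCII letters.
print(text.upper().encode("utf-8") == raw.upper())

# Reversing the bytes produces an invalid UTF-8 sequence.
try:
    raw[::-1].decode("utf-8")
    reversed_ok = True
except UnicodeDecodeError:
    reversed_ok = False
print(reversed_ok)

# Concatenation is the one operation that still lines up.
joined = "caf".encode("utf-8") + raw
print(joined.decode("utf-8"))
```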
>> >
>> >> (or at very least be able to specify the encoding of that
>> >> data when giving it to the canvas).
>> >
>> > No, you must adhere to the Seaside contract. You give Strings to
>> > Seaside in the same encoding you expect Seaside to give Strings to
>> > you. Everything else is a pure horror.
>> >
>> >> Squeak encoding doesn't
>> >> support anything outside basic accented characters, does it?
>> >
>> > Squeak supports a superset of Unicode including astral planes.
>> >
>> >> Same goes
>> >> for incoming form data if you have to put it in a database... you
>> >> don't want to be putting it in in Squeak encoding.
>> >
>> > That's between you and your database driver. That doesn't include
>> > Seaside at all.
>> >
>> > I still think this belongs to seaside-dev.
>> >
>> > Cheers
>> > Philippe
>>
>> > _______________________________________________
>> > seaside mailing list
>> > seaside at lists.squeakfoundation.org
>> > http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
>> >