[squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
Tobias Pape
Das.Linux at gmx.de
Tue Oct 31 17:03:17 UTC 2017
Followup
> On 31.10.2017, at 17:54, Tobias Pape <Das.Linux at gmx.de> wrote:
>
>
>> On 31.10.2017, at 07:51, monty <monty2 at programmer.net> wrote:
>>
>>> Sent: Friday, October 27, 2017 at 1:33 PM
>>> From: "Tobias Pape" <Das.Linux at gmx.de>
>>> To: "The general-purpose Squeak developers list" <squeak-dev at lists.squeakfoundation.org>
>>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>
>>>
>>>> On 27.10.2017, at 18:40, monty <monty2 at programmer.net> wrote:
>>>>
>>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>>>
>>>> Is there any way to disable the automatic response decoding?
>>>
>>> That's problematic.
>>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>>>
>>> This is the old version, that opportunistically applied inflate/deflate:
>>
>> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
>>
>
> Yes, when the http-header says so?
>
> (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
>
> I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
> Actually, it probably should also handle the 'charset' (section 3.4) header, too.
Also, the protocol says:
"HTTP/1.1 recipients MUST respect the charset label provided by the sender"
>
> Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.
>
> OTOH, you still are able to retrieve the information from the message.
>
> So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.
>
>
>
>> Please review and merge WebClient-Core-monty.114.mcz.
>
> I'll review.
>
>
> Best regards
> -Tobias
>
>
>
>>
>>> ================
>>> getContentWithProgress: progressBlockOrNil
>>> "Reads available content and returns it."
>>>
>>> | length result |
>>> length := self contentLength.
>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>> new: (length ifNil: [ 1000 ])
>>> streamContents: [ :outputStream |
>>> self
>>> streamFrom: stream
>>> to: outputStream
>>> size: length
>>> progress: progressBlockOrNil ].
>>> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>>> ^(GZipReadStream on: result) upToEnd
>>> =================
>>>
>>>
>>> Maybe we need a plain "get me the bytes" variant.
>>> Oh there is:
>>>
>>> WebMessage>>streamFrom:to:size:progress:
>>>
>>> That's somewhat inelegant but should work…
>>>
>>>
>>>
>>> Best regards
>>> -Tobias
>>>
>>>>
>>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>>> From: commits at source.squeak.org
>>>>> To: squeak-dev at lists.squeakfoundation.org, packages at lists.squeakfoundation.org
>>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>>
>>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>>>
>>>>> ==================== Summary ====================
>>>>>
>>>>> Name: WebClient-Core-topa.112
>>>>> Author: topa
>>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>>> Ancestors: WebClient-Core-topa.111
>>>>>
>>>>> Abide Postel's law for text conversion.
>>>>>
>>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>>>
>>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>>>
>>>>> Item was changed:
>>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>>> getContentWithProgress: progressBlockOrNil
>>>>> "Reads available content and returns it."
>>>>>
>>>>> | length result |
>>>>> length := self contentLength.
>>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>>>> new: (length ifNil: [ 1000 ])
>>>>> streamContents: [ :outputStream |
>>>>> self
>>>>> streamFrom: stream
>>>>> to: outputStream
>>>>> size: length
>>>>> progress: progressBlockOrNil ].
>>>>> self decoderForContentEncoding ifNotNil: [:decoder |
>>>>> result := (decoder on: result) upToEnd].
>>>>> self textConverterForContentType ifNotNil: [:converter |
>>>>> + [result := result convertFromWithConverter: converter]
>>>>> + on: InvalidUTF8 "some servers lie"
>>>>> + do: [^ result]].
>>>>> - result := result convertFromWithConverter: converter].
>>>>> ^ result
>>>>> !
>>>>>
>>>>> Item was changed:
>>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>>> textConverterForContentType
>>>>>
>>>>> | index contentType |
>>>>> contentType := self contentType.
>>>>> contentType size < 8 ifTrue: [ ^nil ].
>>>>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>>> index = 0 ifTrue: [ ^nil ].
>>>>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
>
>
More information about the Squeak-dev
mailing list
|