[squeak-dev] The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape Das.Linux at gmx.de
Tue Oct 31 18:23:16 UTC 2017


Followup 2:

> On 31.10.2017, at 18:03, Tobias Pape <Das.Linux at gmx.de> wrote:
> 
> Followup
>> On 31.10.2017, at 17:54, Tobias Pape <Das.Linux at gmx.de> wrote:
>> 
>> 
>>> On 31.10.2017, at 07:51, monty <monty2 at programmer.net> wrote:
>>> 
>>>> Sent: Friday, October 27, 2017 at 1:33 PM
>>>> From: "Tobias Pape" <Das.Linux at gmx.de>
>>>> To: "The general-purpose Squeak developers list" <squeak-dev at lists.squeakfoundation.org>
>>>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>> 
>>>> 
>>>>> On 27.10.2017, at 18:40, monty <monty2 at programmer.net> wrote:
>>>>> 
>>>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>>>> 
>>>>> Is there any way to disable the automatic response decoding?
>>>> 
>>>> That's problematic.
>>>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>>>> 
>>>> This is the old version, that opportunistically applied inflate/deflate:
>>> 
>>> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
>>> 
>> 
>> Yes, when the http-header says so?
>> 
>> (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
>> 
>> I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
>> Actually, it probably should also handle the 'charset' (section 3.4) header, too.
> 
> 
> Also, the protocol says:
> 
> "HTTP/1.1 recipients MUST respect the charset label provided by the sender"


For html (and xml) it is stated that the header takes precedence regardless:

https://www.w3.org/International/questions/qa-html-encoding-declarations.en
"The HTTP header information has the highest priority when it conflicts with in-document declarations […]"


> 
> 
>> 
>> Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.
>> 
>> OTOH, you still are able to retrieve the information from the message.
>> 
>> So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.
>> 
>> 
>> 
>>> Please review and merge WebClient-Core-monty.114.mcz.
>> 
>> I'll review.
>> 
>> 
>> Best regards
>> 	-Tobias
>> 
>> 
>> 
>>> 
>>>> ================
>>>> getContentWithProgress: progressBlockOrNil
>>>> 	"Reads available content and returns it."
>>>> 
>>>> 	| length result |
>>>> 	length := self contentLength.
>>>> 	result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ]) 
>>>> 		new: (length ifNil: [ 1000 ])
>>>> 		streamContents: [ :outputStream | 
>>>> 			self 
>>>> 				streamFrom: stream
>>>> 				to: outputStream
>>>> 				size: length
>>>> 				progress: progressBlockOrNil ].
>>>> 	(self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>>>> 	^(GZipReadStream on: result) upToEnd
>>>> =================
>>>> 
>>>> 
>>>> Maybe we need a plain "get me the bytes" variant.
>>>> Oh there is:
>>>> 
>>>> WebMessage>>streamFrom:to:size:progress: 
>>>> 
>>>> That's somewhat inelegant but should work…
>>>> 
>>>> 
>>>> 
>>>> Best regards
>>>> 	-Tobias
>>>> 
>>>>> 
>>>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>>>> From: commits at source.squeak.org
>>>>>> To: squeak-dev at lists.squeakfoundation.org, packages at lists.squeakfoundation.org
>>>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>>> 
>>>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>>>> 
>>>>>> ==================== Summary ====================
>>>>>> 
>>>>>> Name: WebClient-Core-topa.112
>>>>>> Author: topa
>>>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>>>> Ancestors: WebClient-Core-topa.111
>>>>>> 
>>>>>> Abide Postel's law for text conversion.
>>>>>> 
>>>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>>>> 
>>>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>>>> 
>>>>>> Item was changed:
>>>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>>>> getContentWithProgress: progressBlockOrNil
>>>>>> 	"Reads available content and returns it."
>>>>>> 
>>>>>> 	| length result |
>>>>>> 	length := self contentLength.
>>>>>> 	result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ]) 
>>>>>> 		new: (length ifNil: [ 1000 ])
>>>>>> 		streamContents: [ :outputStream | 
>>>>>> 			self 
>>>>>> 				streamFrom: stream
>>>>>> 				to: outputStream
>>>>>> 				size: length
>>>>>> 				progress: progressBlockOrNil ].
>>>>>> 	self decoderForContentEncoding ifNotNil: [:decoder |
>>>>>> 		result := (decoder on: result) upToEnd].
>>>>>> 	self textConverterForContentType ifNotNil: [:converter |
>>>>>> + 		[result := result convertFromWithConverter: converter]
>>>>>> + 		on: InvalidUTF8 "some servers lie"
>>>>>> + 		do: [^ result]].
>>>>>> - 		result := result convertFromWithConverter: converter].
>>>>>> 	^ result
>>>>>> !
>>>>>> 
>>>>>> Item was changed:
>>>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>>>> textConverterForContentType
>>>>>> 
>>>>>> 	| index contentType |
>>>>>> 	contentType := self contentType.
>>>>>> 	contentType size < 8 ifTrue: [ ^nil ].
>>>>>> - 	(contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>>>> 	index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>>>> 	index = 0 ifTrue: [ ^nil ].
>>>>>> 	^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!



More information about the Squeak-dev mailing list