[squeak-dev] The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape Das.Linux at gmx.de
Tue Oct 31 16:54:43 UTC 2017


> On 31.10.2017, at 07:51, monty <monty2 at programmer.net> wrote:
> 
>> Sent: Friday, October 27, 2017 at 1:33 PM
>> From: "Tobias Pape" <Das.Linux at gmx.de>
>> To: "The general-purpose Squeak developers list" <squeak-dev at lists.squeakfoundation.org>
>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>> 
>> 
>>> On 27.10.2017, at 18:40, monty <monty2 at programmer.net> wrote:
>>> 
>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>> 
>>> Is there any way to disable the automatic response decoding?
>> 
>> That's problematic.
>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>> 
>> This is the old version, that opportunistically applied inflate/deflate:
> 
> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
> 

Yes, when the http-header says so?

(see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)

I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
Actually, it probably should also handle the 'charset' (section 3.4) header, too.

Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.

OTOH, you still are able to retrieve the information from the message.

So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.



> Please review and merge WebClient-Core-monty.114.mcz.

I'll review.


Best regards
	-Tobias



> 
>> ================
>> getContentWithProgress: progressBlockOrNil
>> 	"Reads available content and returns it."
>> 
>> 	| length result |
>> 	length := self contentLength.
>> 	result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ]) 
>> 		new: (length ifNil: [ 1000 ])
>> 		streamContents: [ :outputStream | 
>> 			self 
>> 				streamFrom: stream
>> 				to: outputStream
>> 				size: length
>> 				progress: progressBlockOrNil ].
>> 	(self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>> 	^(GZipReadStream on: result) upToEnd
>> =================
>> 
>> 
>> Maybe we need a plain "get me the bytes" variant.
>> Oh there is:
>> 
>> WebMessage>>streamFrom:to:size:progress: 
>> 
>> That's somewhat inelegant but should work…
>> 
>> 
>> 
>> Best regards
>> 	-Tobias
>> 
>>> 
>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>> From: commits at source.squeak.org
>>>> To: squeak-dev at lists.squeakfoundation.org, packages at lists.squeakfoundation.org
>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>> 
>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>> 
>>>> ==================== Summary ====================
>>>> 
>>>> Name: WebClient-Core-topa.112
>>>> Author: topa
>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>> Ancestors: WebClient-Core-topa.111
>>>> 
>>>> Abide Postel's law for text conversion.
>>>> 
>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>> 
>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>> 
>>>> Item was changed:
>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>> getContentWithProgress: progressBlockOrNil
>>>> 	"Reads available content and returns it."
>>>> 
>>>> 	| length result |
>>>> 	length := self contentLength.
>>>> 	result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ]) 
>>>> 		new: (length ifNil: [ 1000 ])
>>>> 		streamContents: [ :outputStream | 
>>>> 			self 
>>>> 				streamFrom: stream
>>>> 				to: outputStream
>>>> 				size: length
>>>> 				progress: progressBlockOrNil ].
>>>> 	self decoderForContentEncoding ifNotNil: [:decoder |
>>>> 		result := (decoder on: result) upToEnd].
>>>> 	self textConverterForContentType ifNotNil: [:converter |
>>>> + 		[result := result convertFromWithConverter: converter]
>>>> + 		on: InvalidUTF8 "some servers lie"
>>>> + 		do: [^ result]].
>>>> - 		result := result convertFromWithConverter: converter].
>>>> 	^ result
>>>> !
>>>> 
>>>> Item was changed:
>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>> textConverterForContentType
>>>> 
>>>> 	| index contentType |
>>>> 	contentType := self contentType.
>>>> 	contentType size < 8 ifTrue: [ ^nil ].
>>>> - 	(contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>> 	index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>> 	index = 0 ifTrue: [ ^nil ].
>>>> 	^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!



More information about the Squeak-dev mailing list