[FIX] Scamper content type handling

Lex Spoon lex at cc.gatech.edu
Sun Nov 7 14:03:18 UTC 1999


"John Duncan" <jddst19+ at pitt.edu> wrote:
> 
> > > > 'text/html; charset=iso-8859-1'
>                   ^^^^^^^^^^^^^^^^^^
> 
> is an example of some sort of annotation, perhaps introduced in a more
> recent version of MIME or HTTP.  What I expect is that the header
> processor in HTTPSocket currently treats this line as always an 'x/x'
> string, though I can imagine many other things going in there:
> 'text/html; charset=iso-8859-1; language=en-us' etc. So, if I'm right,
> then MimeDocument should keep a dictionary or list of annotations and
> HTTPSocket has to be changed to use it. That's more than one fix, and
> Scamper currently isn't very smart, so it's good enough to just filter
> out the rest of it in the content string. Plus, it'll fit into the
> 640k of memory we set aside for applications. :)


Okay, I'm finally in front of a Squeak image and can check this out.  In
fact, the #getDocument:... and #postDocument:... methods were reporting
back the raw Content-Type header, instead of using the provided
#contentType method which will clean it up.  Here's a changeset that
fixes that.

It would indeed ultimately be nice to keep ahold of the content-type
parameters somewhere; for now, however, it would be nice if at least the
raw content-type is reported correctly.

	-Lex



'From Squeak2.7alpha of 25 October 1999 [latest update: #1568] on 7
November 1999 at 1:55:50 pm'!
"Change Set:		ContentType
Date:			7 November 1999
Author:			Lex Spoon

Fixes #getDocument: and #postDocument: to fix up the content-type,
instead of just passing on the value found in the header.
"!


!HTTPSocket class methodsFor: 'get the page' stamp: 'ls 6/3/1999 15:53'!
httpGetDocument: url args: args accept: mimeType request: requestString
	"Return the exact contents of a web object. Asks for the given MIME 
type. If mimeType is nil, use 'text/html'. An extra requestString may be

submitted and must end with crlf.  The parsed header is saved. Use a 
proxy server if one has been registered.  tk 7/23/97 17:12"
	"Note: To fetch raw data, you can use the MIME type 
'application/octet-stream'."

	| serverName serverAddr port sock header length bare page list
firstData 
aStream index connectToHost connectToPort type newUrl |
	Socket initializeNetwork.
	bare _ (url asLowercase beginsWith: 'http://') 
		ifTrue: [url copyFrom: 8 to: url size]
		ifFalse: [url].
	bare _ bare copyUpTo: $#.  "remove fragment, if specified"
	serverName _ bare copyUpTo: $/.
	page _ bare copyFrom: serverName size + 1 to: bare size.
	(serverName includes: $:) 
		ifTrue: [ index _ serverName indexOf: $:.
			port _ (serverName copyFrom: index+1 to: serverName size) asNumber.
			serverName _ serverName copyFrom: 1 to: index-1. ]
		ifFalse: [ port _ self defaultPort ].
	page size = 0 ifTrue: [page _ '/'].
	"add arguments"
	args ifNotNil: [page _ page, (self argString: args) ].


	HTTPProxyServer isNil
		ifTrue: [ 
			connectToHost _ serverName.
			connectToPort _ port ]
		ifFalse:  [
			page _ 'http://', serverName, ':', port printString, page.		"put back

together"
			connectToHost _ HTTPProxyServer.
			connectToPort _ HTTPProxyPort].
	
	self flag: #XXX.  "this doesn't make sense if a user isn't available
for 
questioning...  -ls"
	self retry: [serverAddr _ NetNameResolver addressForName: connectToHost

timeout: 20.
				serverAddr ~~ nil] 
		asking: 'Trouble resolving server name.  Keep trying?'
		ifGiveUp: [Socket deadServer: connectToHost.
				^ 'Could not resolve the server named: ', connectToHost].

3 timesRepeat: [
	sock _ HTTPSocket new.
	sock connectTo: serverAddr port: connectToPort.
	(sock waitForConnectionUntil: (self deadlineSecs: 30)) ifFalse: [
		Socket deadServer: connectToHost.  sock destroy.
		^ 'Server ',connectToHost,' is not responding'].
	Transcript cr; cr; show: url.
	sock sendCommand: 'GET ', page, ' HTTP/1.0', CrLf, 
		(mimeType ifNotNil: ['ACCEPT: ', mimeType, CrLf] ifNil: ['']),
		'ACCEPT: text/html', CrLf,	"Always accept plain text"
		HTTPBlabEmail,	"may be empty"
		requestString,	"extra user request. Authorization"
		'User-Agent: Squeak 1.31', CrLf,
		'Host: ', serverName, ':', port printString, CrLf.	"blank line 
automatically added"

	list _ sock getResponseUpTo: CrLf, CrLf.	"list = header, CrLf, CrLf, 
beginningOfData"
	header _ list at: 1.
	"Transcript show: page; cr; show: header; cr."
	firstData _ list at: 3.
	header isEmpty 
		ifTrue: [aStream _ 'server aborted early']
		ifFalse: [
			"dig out some headers"
			sock header: header.
			length _ sock getHeader: 'content-length'.
			length ifNotNil: [ length _ length asNumber ].
			type _ sock contentType.
			sock responseCode first = $3 ifTrue: [
				newUrl _ sock getHeader: 'location'.
				newUrl ifNotNil: [ 
					Transcript show: 'redirecting to ', newUrl; cr.
					sock destroy.
					^self httpGetDocument: newUrl  args: args  accept: mimeType ] ].
			aStream _ sock getRestOfBuffer: firstData totalLength: length.
			sock responseCode = '401' ifTrue: [^ header, aStream contents].
			].
	sock destroy.	"Always OK to destroy!!"
	aStream class ~~ String ifTrue: [
 		^ MIMEDocument contentType: type content: aStream contents url: url].
	aStream = 'server aborted early' ifFalse: [
		]
	].! !

!HTTPSocket class methodsFor: 'get the page' stamp: 'ls 11/7/1999
13:53'!
httpPostDocument: url  args: argsDict accept: mimeType request:
requestString
	"like httpGET, except it does a POST instead of a GET.  POST allows
data to be uploaded"

	| serverName serverAddr s header length bare page list firstData
aStream port argsStream first specifiedServer type newUrl |
	Socket initializeNetwork.

	"parse url"
	bare _ (url asLowercase beginsWith: 'http://') 
		ifTrue: [url copyFrom: 8 to: url size]
		ifFalse: [url].
	serverName _ bare copyUpTo: $/.
	specifiedServer _ serverName.
	(serverName includes: $:) ifFalse: [ port _ self defaultPort ] ifTrue:
[
		port _ (serverName copyFrom: (serverName indexOf: $:) + 1 
				to: serverName size) asNumber.
		serverName _ serverName copyUpTo: $:.
	].

	page _ bare copyFrom: (bare indexOf: $/) to: bare size.
	page size = 0 ifTrue: [page _ '/'].
	HTTPProxyServer ifNotNil: [ 
		page _ 'http://', serverName, ':', port printString, page.		"put back
together"
		serverName _ HTTPProxyServer.
		port _ HTTPProxyPort].

	"encode the arguments dictionary"
	argsStream _ WriteStream on: String new.
	first _ true.
	argsDict associationsDo: [ :assoc |
		assoc value do: [ :value |
			first ifTrue: [ first _ false ] ifFalse: [ argsStream nextPut: $& ].
			argsStream nextPutAll: assoc key encodeForHTTP.
			argsStream nextPut: $=.
			argsStream nextPutAll: value encodeForHTTP.
	] ].

  	"make the request"	
	self retry: [serverAddr _ NetNameResolver addressForName: serverName
timeout: 20.
				serverAddr ~~ nil] 
		asking: 'Trouble resolving server name.  Keep trying?'
		ifGiveUp: [^ 'Could not resolve the server named: ', serverName].

	s _ HTTPSocket new.
	s connectTo: serverAddr port: port.
	s waitForConnectionUntil: self standardDeadline.
	Transcript cr; show: url; cr.
	s sendCommand: 'POST ', page, ' HTTP/1.0', CrLf, 
		(mimeType ifNotNil: ['ACCEPT: ', mimeType, CrLf] ifNil: ['']),
		'ACCEPT: text/html', CrLf,	"Always accept plain text"
		HTTPBlabEmail,	"may be empty"
		requestString,	"extra user request. Authorization"
		'User-Agent: Squeak 1.31', CrLf,
		'Content-type: application/x-www-form-urlencoded', CrLf,
		'Content-length: ', argsStream contents size printString, CrLf,
		'Host: ', specifiedServer, CrLf.  "blank line automatically added"

	s sendCommand: argsStream contents.

	"get the header of the reply"
	list _ s getResponseUpTo: CrLf, CrLf ignoring: (String with: CR).	"list
= header, CrLf, CrLf, beginningOfData"
	header _ list at: 1.
	"Transcript show: page; cr; show: argsStream contents; cr; show:
header; cr."
	firstData _ list at: 3.

	"dig out some headers"
	s header: header.
	length _ s getHeader: 'content-length'.
	length ifNotNil: [ length _ length asNumber ].
	type _ s contentType.
	s responseCode first = $3 ifTrue: [
		newUrl _ s getHeader: 'location'.
		newUrl ifNotNil: [
			Transcript show: 'redirecting to: ', newUrl; cr.
			s destroy.
			^self httpPostDocument: newUrl  args: argsDict  accept: mimeType ] ].

	aStream _ s getRestOfBuffer: firstData totalLength: length.
	s responseCode = '401' ifTrue: [^ header, aStream contents].
	s destroy.	"Always OK to destroy!!"

	^ MIMEDocument contentType: type  content: aStream contents url: url! !





More information about the Squeak-dev mailing list