[Seaside] [BUG][FIX] SocketStream, MultipartChunk, WAKom uploading slowness

Michal miso.list at auf.net
Thu Apr 1 07:43:29 CEST 2004


[Short version: changesets appended for a 2x to 5x speed increase in
uploading file through SocketStream changes and assorted changes
upstream. Anomalies remaining.]

As a side-project, I am writing an online repository for academic
papers in my research field, using Squeak/Seaside. Testing a first
prototype, I hit on a bug when people would upload large-ish files
(3-10MB): the upload form would hang indefinitely, no error, no
feedback.

This is not a browser or network issue, as I can reproduce that on my
local machine, uploading 10MB to a local seaside server. Opening a
ProcessBrowser in the server shows the problem: the Seaside process is
in SocketStream>>upToAll: and its caller MultipartChunk saveToStream:
The comment in the latter says it all:

   "This method is really dumb, it reads the entire body of the
   chunk before writing it onto outStream. "

It turns out however that while that comment is true, the real
offender is #upToAll: which incrementally copies the incoming file
dozens of times over with:

searchBuffer := searchBuffer , self inStream upToEnd

Replacing the search buffer by a WriteStream and assorted changes,
roughly doubles the speed of uploads for small files (up to 1MB), and
makes a 3MB upload 5 times faster. The observed hang also disappears.

The second fix is in #saveToStream: It turns out that all the callers
of #saveToStream: actually want a String. Making MultipartChunk
provide a string not only adds a 10% speed boost, but also simplifies
the code in its callers.  Hence I propose to eliminate #saveToStream:
in favor of #fileContents (attached) - killing two issues in one move.
This second change adds about 10% speed in the my tests. Overall:

32k: 44ms -> 17ms
1.3MB: 33s -> 14s
3.4MB: 252s -> 52s

(I still don't know why the upload originally hanged rather than
timing out or reporting some error)

Michal
-------------- next part --------------
'From Squeak3.6 of ''6 October 2003'' [latest update: #5429] on 31 March 2004 at 3:44:58 pm'!

!SocketStream methodsFor: 'stream in' stamp: 'miso 3/31/2004 15:44'!
upToAll: delims
"michal - optimised for large files: (i) use a WriteStream for the searchBuffer rather than copying the incoming portions over and over again, (ii) when searching the buffer for the boundary, do not copy it, use its raw contents with #originalContents. Note that this entails that the blank space at the end of the buffer will also be searched while searching for the boundary. This is semantically harmless since spaces cannot be part of the boundary (http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html) and it is actually good speedwise since this allows us to use the very fast search on strings (a primitive) without doing any string copying, which is crucial for large files."

	| searchBuffer index startOfSearch nextStartOfSearch |
	searchBuffer _ WriteStream on: (String new: 1000).
	startOfSearch _ 1.
	[nextStartOfSearch _ searchBuffer position.
	searchBuffer nextPutAll: self inStream upToEnd.
	self resetInStream.
	index _ searchBuffer originalContents indexOfSubCollection: delims startingAt: startOfSearch.
	startOfSearch _ nextStartOfSearch.
	index = 0 and: [self atEnd not]]
		whileTrue: [self receiveData].

	index = 0 
		ifTrue: [index := 0 max: searchBuffer size]
		ifFalse:
			[self pushBack: (searchBuffer contents copyFrom: index + delims size to: searchBuffer size)].
	^searchBuffer contents copyFrom: 1 to: (0 max: index-1)! !
-------------- next part --------------
'From Squeak3.6 of ''6 October 2003'' [latest update: #5429] on 31 March 2004 at 3:56:39 pm'!
"Change Set:		replaceSaveToStream
Date:			31 March 2004
Author:			miso

Makes MultipartChunk #saveToStream: obsolete. As that method's comment noted, it was extremely inefficient. It turns out however that all the (2) senders of that method actually wanted a string, but had to provide a stream and then copy the whole content of the stream back to a string. Since this is intended for files, this results in potentially megabytes of unnecessary copying. The fix is to use #fileContents which simply returns the desired string.

(Note that an equivalent alternative would be to keep #saveToStream:, and have it pass the stream to SocketStream upToAll: which would have to become #upToAll:writingInto:. I have an implementation of that too, but it adds complexity for no good reason."!


!HttpRequest methodsFor: 'multipart forms' stamp: 'miso 3/31/2004 15:50'!
multipartFormFieldsDo: aBlock
	"USAGE:
	request multipartFormFieldsDo: 
		[:chunk |
		result _ chunk fileContents]."
	"NOTE: if the chunk is not saved, save it after aBlock"

	| mChunk |
	(self method = 'POST' and: [self contentType = MIMEDocument contentTypeMultipart])
		ifFalse: [^nil].

	stream peekForAll: self multipartBoundary.
	[stream atEnd or: [stream peekForAll: '--']] whileFalse:
		[stream next: 2.  "Advance over CRLF"
		mChunk := self nextChunkHeader.
		aBlock value: mChunk.
		mChunk isSaved ifFalse: 
			[mChunk fileName isEmptyOrNil
			ifFalse: [mChunk fileContents]
			ifTrue: [self postFields at: mChunk fieldName put: mChunk fileContents]]]! !


!MultipartChunk methodsFor: 'accessing' stamp: 'miso 3/31/2004 13:42'!
fileContents

	| body origFileName boundary |
	self setSavedFlag.
	origFileName := self fileName.
	boundary := String crlf, self multipartBoundary.
	body := self stream upToAll: boundary.

	"IE4 for Mac appends 128 bytes of Mac file system info - must remove"
	body size >= 128 ifTrue:
		[((body at: 1) asciiValue = 0 and: 
		[(body at: 2) asciiValue = origFileName size and: 
		[(body copyFrom: 3 to: origFileName size + 2) = origFileName]])
			ifTrue: [body := body copyFrom: 129 to: body size]].

	^ body.
! !


!WAKom methodsFor: 'as yet unclassified' stamp: 'miso 3/31/2004 15:46'!
processMultipartFields: aRequest
	aRequest multipartFormFieldsDo:
		[:chunk |
		chunk fileName isEmptyOrNil ifFalse:
			[| file |
			file _ WAFile new fileName: chunk fileName; contents: chunk fileContents.
			aRequest postFields at: chunk fieldName put: file]].! !



More information about the Seaside mailing list