[squeak-dev] 20X speedup for read file upToEnd

David T. Lewis lewis at mail.msen.com
Sat Jan 23 20:55:08 UTC 2021


Hi Levente,

Thanks, that looks better. Do you want to commit the update to keep
the author initials right? If not I'll commit it and credit you in
the commit comment.

As an aside, this also makes me notice that #upToEnd does not work
for file streams that do not know their size, such as FileStream
stdin, or any file stream on an OS pipe.

Here is the variation that I did (back in 2006) for OSProcess, which
uses AttachableFileStream subclassed from StandardFileStream:

AttachableFileStream>>upToEnd
	"Answer a subcollection from the current access position through the last element
	of the receiver. This is slower than the method in StandardFileStream, but it
	works with pipes which answer false to #atEnd when no further input is
	currently available, but the pipe is not yet closed."

	| newStream buffer nextBytes |
	buffer := buffer1 species new: 1000.
	newStream := WriteStream on: (buffer1 species new: 100).
	[self atEnd or: [(nextBytes := self nextInto: buffer) isEmpty]]
		whileFalse: [newStream nextPutAll: nextBytes].
	^ newStream contents
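
As a rough illustration of the loop above, here is a workspace sketch that
runs the same read-until-empty pattern against an in-memory ReadStream
instead of a pipe. The variable names and the 2500-element test string are
made up for the example, and it assumes #nextInto: answers a partial copy
when fewer elements remain than the buffer holds. It does not reproduce the
pipe case where #atEnd answers false while no input is available; it only
shows the chunking.

```smalltalk
"Workspace sketch: chunked read-until-empty on an in-memory stream.
 source, out, chunk and result are hypothetical names for this example only."
| source buffer out chunk result |
source := ReadStream on: (String new: 2500 withAll: $a).
buffer := String new: 1000.	"fixed-size chunk buffer, reused on every pass"
out := WriteStream on: (String new: 100).
"Stop when the stream reports its end, or when a read comes back empty."
[source atEnd or: [(chunk := source nextInto: buffer) isEmpty]]
	whileFalse: [out nextPutAll: chunk].
result := out contents.
result size	"should match the 2500 elements put into source"
```

The second stop condition (an empty read) is the one that matters for pipes;
on a plain in-memory stream #atEnd alone ends the loop.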

The AttachableFileStream hack in OSProcess really needs to go away,
so I think I'd like to find a way to make these things work with good
performance on any kind of file stream. But that is a topic for another
thread.

Thanks,
Dave

On Sat, Jan 23, 2021 at 08:58:07PM +0100, Levente Uzonyi wrote:
> Hi Dave,
> 
> Good catch. That method of StandardFileStream, unlike #upTo: and 
> #upToAnyOf:do:, has not been optimized.
> 
> IMO, StandardFileStream >> #upToEnd should simply be
> 
> 	^self next: self size - self position
> 
> And I suggest your implementation be added to MultiByteFileStream with the 
> following modifications:
> 
> upToEnd
> 	"Answer a subcollection from the current access position through the 
> 	last element of the receiver."
> 
> 	| remainingEstimate |
> 	self isBinary ifTrue: [ ^super upToEnd ].
> 	remainingEstimate := self size - self position.
> 	^self collectionSpecies
> 		new: remainingEstimate
> 		streamContents: [ :stream |
> 			| elements chunkSize |
> 			chunkSize := remainingEstimate min: 2000. "It's not worth allocating larger chunks"
> 			[ (elements := self next: chunkSize) isEmpty ] 
> 			whileFalse: [
> 				stream nextPutAll: elements ] ]
> 
> 
> Levente
> 
> On Fri, 22 Jan 2021, David T. Lewis wrote:
> 
> >Attached is a small change that gives a big performance boost for reading
> >a file upToEnd. My use case (where I noticed this) is reading an image
> >file, where the file is opened, the header is read, and the remainder of
> >the file upToEnd is read in as the object memory:
> >
> > fs := FileStream readOnlyFileNamed: Smalltalk imageName. fs binary.
> > t := Time millisecondsToRun: [ImageSnapshot fromStream: fs].
> > fs close.
> > t
> > ==> 12428 "original implementation"
> > ==> 645 "new version"
> >
> >Overall speedup is about 20X for MultiByteFileStream and over 100X for
> >StandardFileStream (difference due to Levente's earlier improvements
> >in MultiByteFileStream).
> >
> >This small change touches two packages so I'm posting it as a change set
> >for comment. I'll put it in trunk if there are no issues.
> >
> >Dave
> >
> >
> 

