[squeak-dev] 20X speedup for read file upToEnd

Levente Uzonyi leves at caesar.elte.hu
Sun Jan 24 18:39:01 UTC 2021


Hi Dave,

On Sat, 23 Jan 2021, David T. Lewis wrote:

> Hi Levente,
>
> Thanks, that looks better. Do you want to commit the update to keep
> the author initials right? If not I'll commit it and credit you in
> the commit comment.

I'll commit those two methods.

>
> As an aside, this also makes me notice that #upToEnd does not work
> for file streams that do not know their size, such as FileStream
> stdin, or any file stream on an OS pipe.

It depends on what the expected behavior is in that case. If it is to read 
all available data, then I think it works. If it is to read until the 
stream is #atEnd, then no, it doesn't work.
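
To make the two readings concrete, here is a rough workspace sketch, not a
proposed implementation. It assumes aStream is an already opened file stream
attached to a pipe (aStream is a placeholder, not something defined in this
thread), and that #next: answers fewer elements than requested, possibly
none, when the pipe has nothing more to offer at the moment. The 10 ms
polling delay is an arbitrary value chosen only for illustration.

```smalltalk
| available everything chunk |

"ASSUMPTION: aStream is a placeholder for a file stream on an open pipe."

"Reading 1: take whatever can be read right now and stop."
available := aStream upToEnd.

"Reading 2: keep going until the stream itself reports #atEnd,
 waiting briefly whenever the pipe is open but momentarily empty."
everything := WriteStream on: (available species new: 100).
everything nextPutAll: available.
[aStream atEnd] whileFalse: [
	chunk := aStream next: 1000.
	chunk isEmpty
		ifTrue: [(Delay forMilliseconds: 10) wait]	"arbitrary polling interval"
		ifFalse: [everything nextPutAll: chunk]].
everything contents
```

Under the first reading the answer is just the value of available; under
the second it is everything contents, which only returns once the writer
has closed its end of the pipe.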


Levente

>
> Here is the variation that I did (back in 2006) for OSProcess, which
> uses AttachableFileStream subclassed from StandardFileStream:
>
> AttachableFileStream>>upToEnd
> 	"Answer a subcollection from the current access position through the last element
> 	of the receiver. This is slower than the method in StandardFileStream, but it
> 	works with pipes which answer false to #atEnd when no further input is
> 	currently available, but the pipe is not yet closed."
>
> 	| newStream buffer nextBytes |
> 	buffer := buffer1 species new: 1000.
> 	newStream := WriteStream on: (buffer1 species new: 100).
> 	[self atEnd or: [(nextBytes := self nextInto: buffer) isEmpty]]
> 		whileFalse: [newStream nextPutAll: nextBytes].
> 	^ newStream contents
>
> The AttachableFileStream hack in OSProcess really needs to go away,
> so I think I'd like to find a way to make these things work with good
> performance on any kind of file stream. But that is a topic for another
> thread.
>
> Thanks,
> Dave
>
> On Sat, Jan 23, 2021 at 08:58:07PM +0100, Levente Uzonyi wrote:
>> Hi Dave,
>> 
>> Good catch. That method of StandardFileStream, unlike #upTo: and 
>> #upToAnyOf:do:, has not been optimized.
>> 
>> IMO, StandardFileStream >> #upToEnd should simply be
>>
>> 	^self next: self size - self position
>> 
>> And I suggest your implementation be added to MultiByteFileStream with the 
>> following modifications:
>> 
>> upToEnd
>> 	"Answer a subcollection from the current access position through the
>> 	last element of the receiver."
>>
>> 	| remainingEstimate |
>> 	self isBinary ifTrue: [ ^super upToEnd ].
>> 	remainingEstimate := self size - self position.
>> 	^self collectionSpecies
>> 		new: remainingEstimate
>> 		streamContents: [ :stream |
>> 			| elements chunkSize |
>> 			"It's not worth allocating larger chunks"
>> 			chunkSize := remainingEstimate min: 2000.
>> 			[ (elements := self next: chunkSize) isEmpty ]
>> 			whileFalse: [
>> 				stream nextPutAll: elements ] ]
>> 
>> 
>> Levente
>> 
>> On Fri, 22 Jan 2021, David T. Lewis wrote:
>> 
>> >Attached is a small change that gives a big performance boost for reading
>> >a file upToEnd. My use case (where I noticed this) is reading an image
>> >file, where the file is opened, the header is read, and the remainder of
>> >the file upToEnd is read in as the object memory:
>> >
>> > fs := FileStream readOnlyFileNamed: Smalltalk imageName. fs binary.
>> > t := Time millisecondsToRun: [ImageSnapshot fromStream: fs].
>> > fs close.
>> > t
>> > ==> 12428 "original implementation"
>> > ==> 645 "new version"
>> >
>> >Overall speedup is about 20X for MultiByteFileStream and over 100X for
>> >StandardFileStream (difference due to Levente's earlier improvements
>> >in MultiByteFileStream).
>> >
>> >This small change touches two packages so I'm posting it as a change set
>> >for comment. I'll put it in trunk if there are no issues.
>> >
>> >Dave
>> >
>> >
>>

