[squeak-dev] 20X speedup for read file upToEnd
Levente Uzonyi
leves at caesar.elte.hu
Sun Jan 24 18:39:01 UTC 2021
Hi Dave,
On Sat, 23 Jan 2021, David T. Lewis wrote:
> Hi Levente,
>
> Thanks, that looks better. Do you want to commit the update to keep
> the author initials right? If not I'll commit it and credit you in
> the commit comment.
I'll commit those two methods.
>
> As an aside, this also makes me notice that #upToEnd does not work
> for file streams that do not know their size, such as FileStream
> stdin, on any file stream on an OS Pipe.
It depends on what the expected behavior is in that case. If it is to read
all available data, then I think it works. If it is to read until the
stream is #atEnd, then no, it doesn't work.
Levente
>
> Here is the variation that I did (back in 2006) for OSProcess, which
> uses AttachableFileStream subclassed from StandardFileStream:
>
> AttachableFileStream>>upToEnd
> "Answer a subcollection from the current access position through the last element
> of the receiver. This is slower than the method in StandardFileStream, but it
> works with pipes which answer false to #atEnd when no further input is
> currently available, but the pipe is not yet closed."
>
> | newStream buffer nextBytes |
> buffer := buffer1 species new: 1000.
> newStream := WriteStream on: (buffer1 species new: 100).
> [self atEnd or: [(nextBytes := self nextInto: buffer) isEmpty]]
> whileFalse: [newStream nextPutAll: nextBytes].
> ^ newStream contents
>
> The AttachableFileStream hack in OSProcess really needs to go away,
> so I think I'd to find a way to make these things work with good
> performance on any kind of file stream. But that is a topic for another
> thread.
>
> Thanks,
> Dave
>
> On Sat, Jan 23, 2021 at 08:58:07PM +0100, Levente Uzonyi wrote:
>> Hi Dave,
>>
>> Good catch. That method of StandardFileStream, unlike #upTo: and
>> #upToAnyOf:do:, has not been optimized.
>>
>> IMO, StandardFileStream >> #upToEnd should simply be
>>
>> ^self next: self size - self position
>>
>> And I suggest your implementation be added to MultiByteFileStream with the
>> following modifications:
>>
>> upToEnd
>> "Answer a subcollection from the current access position through the
>> last element of the receiver."
>>
>> | remainingEstimate |
>> self isBinary ifTrue: [ ^super upToEnd ].
>> remainingEstimate := self size - self position.
>> ^self collectionSpecies
>> new: remainingEstimate
>> streamContents: [ :stream |
>> | elements chunkSize |
>> chunkSize := remainingEstimate min: 2000. "It's not
>> worth allocating larger chunks"
>> [ (elements := self next: chunkSize) isEmpty ]
>> whileFalse: [
>> stream nextPutAll: elements ] ]
>>
>>
>> Levente
>>
>> On Fri, 22 Jan 2021, David T. Lewis wrote:
>>
>> >Attached is a small change that gives a big performance boost for reading
>> >a file upToEnd. My use case (where I noticed this) is reading an image
>> >file, where the file is opened, the header is read, and the remainder of
>> >the file upToEnd is read in as the object memory:
>> >
>> > fs := FileStream readOnlyFileNamed: Smalltalk imageName. fs binary.
>> > t := Time millisecondsToRun: [ImageSnapshot fromStream: fs].
>> > fs close.
>> > t
>> > ==> 12428 "original implementation"
>> > ==> 645 "new version"
>> >
>> >Overall speedup is about 20X for MultiByteFileStream and over 100x for
>> >StandardFileStream (difference due to Levente's earlier improvements
>> >in MultiByteFileStream).
>> >
>> >This small change touches two packages so I'm posting it as a change set
>> >for comment. I'll put it in trunk if there are no issues.
>> >
>> >Dave
>> >
>> >
>>
More information about the Squeak-dev
mailing list
|