[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)
Levente Uzonyi
leves at caesar.elte.hu
Fri Oct 22 07:43:28 UTC 2021
Hi Tim,
On Thu, 21 Oct 2021, gettimothy wrote:
>
> Hey!
>
>
> This appears to work now,
>
>
>
>
> ping: zero elements. Time: 0:00:00:10.941282
> 8717587920
> ping: one hundred thousand elements. Time: 0:00:00:21.888107
> 8787084464
>
>
> this is on StandardFileStream...
>
> |ios|
> Transcript clear.
> ios := StandardFileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
> [(DocDemoSaxHandler on: ios) pingevery: 100000; optimizeForLargeDocuments; parseDocument] forkAt: Processor userBackgroundPriority named: 'SAX'
>
>
> those are lightning fast.
>
> gonna run the full thing now.
Unless you know that your input file only contains characters with
code points below 128 (plain ASCII), you should use FileStream instead of
StandardFileStream. The latter just returns the raw bytes as characters
without decoding, while the former does proper character conversion (like
the Pharo code or FSReadStream).

I suspect that the extra speed you reported in another email is just
a side effect of skipping the character conversion.
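
A rough workspace sketch of the difference, not tested against your setup. It assumes the dump is UTF-8 and that FileStream in your image decodes UTF-8 by default. The variable names sample and rawBytes are just for illustration; the in-memory string stands in for what each stream class would hand to the parser.

```smalltalk
| ios sample rawBytes |

"Same file as in the quoted snippet, opened with character decoding."
ios := FileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
ios close.

"One non-ASCII character: e with acute accent, code point 233."
sample := String with: (Character value: 233).

"Its UTF-8 encoding, with every byte shown as a separate character.
 This is roughly what a stream that skips decoding returns."
rawBytes := sample squeakToUtf8.

Transcript
	show: sample size printString; cr;		"1"
	show: rawBytes size printString; cr;	"2"
	show: (rawBytes utf8ToSqueak = sample) printString; cr.	"true"
```

So for any text outside ASCII the undecoded stream yields more, and different, characters than the file actually contains, which would also explain why it runs faster.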
Levente