[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)
leves at caesar.elte.hu
Fri Oct 22 07:43:28 UTC 2021
On Thu, 21 Oct 2021, gettimothy wrote:
> This appears to work now,
> ping: zero elements. Time: 0:00:00:10.941282
> ping: one hundred thousand elements. Time: 0:00:00:21.888107
> this is on StandardFileStream...
> Transcript clear.
> ios := (StandardFileStream readOnlyFileNamed:('/bulkstorage/enwiki-20200501-pages-articles-multistream.xml' )).
> [(DocDemoSaxHandler on:ios) pingevery:100000; optimizeForLargeDocuments;parseDocument] forkAt: Processor userBackgroundPriority named:'SAX'
> those are lightning fast.
> gonna run the full thing now.
Unless you know that your input file only contains characters with
codepoint < 128, you should use FileStream instead of StandardFileStream.
The latter just returns the raw bytes as characters without decoding, while
the former does proper character conversion (like the Pharo code or …).
I suspect that the extra speed you reported in another email is just
the side effect of skipping the character conversion.
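A minimal workspace sketch of the difference, assuming UTF-8 input (the two bytes below are simply the UTF-8 encoding of é, U+00E9). It compares what you get without decoding, which is what StandardFileStream hands back, with an explicitly decoded string. `utf8ToSqueak` is used here only to stand in for the conversion FileStream performs; it is not necessarily the code path FileStream itself uses. In the quoted snippet, the fix is to replace `StandardFileStream readOnlyFileNamed:` with `FileStream readOnlyFileNamed:`.

```smalltalk
| bytes raw decoded |
"Two bytes that encode the single character é (U+00E9) in UTF-8."
bytes := #[195 169].
"No decoding: every byte becomes its own Character, giving two characters."
raw := bytes asString.
"UTF-8 decoding: the two bytes collapse into one Character with code point 233."
decoded := raw utf8ToSqueak.
Transcript
	show: 'raw size: ', raw size printString; cr;
	show: 'decoded size: ', decoded size printString; cr.
```

The raw string has the wrong length and wrong characters for any non-ASCII text, but producing it costs no decoding work. That is consistent with the suspicion above about where the extra speed comes from.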