[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)

Levente Uzonyi leves at caesar.elte.hu
Fri Oct 22 07:43:28 UTC 2021


Hi Tim,

On Thu, 21 Oct 2021, gettimothy wrote:

> Hey!
>
> This appears to work now:
>
> ping: zero elements.  Time: 0:00:00:10.941282
> 8717587920
> ping: one hundred thousand elements.  Time: 0:00:00:21.888107
> 8787084464
> 
> 
> this is on StandardFileStream...
>
> | ios |
> Transcript clear.
> ios := StandardFileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
> [(DocDemoSaxHandler on: ios) pingevery: 100000; optimizeForLargeDocuments; parseDocument]
>     forkAt: Processor userBackgroundPriority named: 'SAX'
> 
> 
> those are lightning fast.
> 
> gonna run the full thing now.

Unless you know that your input file contains only characters with code
points below 128 (plain ASCII), you should use FileStream instead of
StandardFileStream. The latter just returns the raw bytes as characters
without decoding, while the former does proper character conversion (like
the Pharo code or FSReadStream).

I suspect that the extra speed you reported in another email is just
a side effect of skipping the character conversion.
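
To illustrate the difference, here is a small sketch in Python rather than
Squeak, so it only mimics the two behaviours and does not call the stream
classes themselves. It assumes the file is UTF-8 encoded; the sample string
is made up for the example. Mapping each byte to one character stands in
for what StandardFileStream does, and decoding stands in for FileStream.

```python
# Illustration only: a Python stand-in for the two stream behaviours.
# The sample text is hypothetical; UTF-8 is assumed as the file encoding.
text = "na\u00efve caf\u00e9"          # contains two non-ASCII characters
raw = text.encode("utf-8")             # the bytes as they would sit on disk

# "Raw bytes as characters": every byte becomes one character, no decoding.
undecoded = raw.decode("latin-1")

# "Proper character conversion": multi-byte sequences become one character.
decoded = raw.decode("utf-8")

print(len(raw), len(undecoded), len(decoded))
print(undecoded == text, decoded == text)

# For pure ASCII input (code points < 128) both approaches agree.
ascii_raw = "ping".encode("utf-8")
print(ascii_raw.decode("latin-1") == ascii_raw.decode("utf-8"))
```

Without decoding, each non-ASCII character shows up as two or more wrong
characters, which is why the undecoded string is longer than the original
and no longer equal to it.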


Levente

