[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)

gettimothy gettimothy at zoho.com
Thu Oct 21 18:07:48 UTC 2021


Hey!

This appears to work now:

ping: zero elements.  Time: 0:00:00:10.941282

8717587920

ping: one hundred thousand elements.  Time: 0:00:00:21.888107

8787084464



this is on StandardFileStream...

| ios |
Transcript clear.
"Open the Wikipedia dump read-only."
ios := StandardFileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
"Parse in a background process named 'SAX', pinging every 100000 elements."
[(DocDemoSaxHandler on: ios) pingevery: 100000; optimizeForLargeDocuments; parseDocument] forkAt: Processor userBackgroundPriority named: 'SAX'


Those are lightning fast.

Gonna run the full thing now.

Cheers.

---- On Thu, 21 Oct 2021 13:46:52 -0400 gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org> wrote ----

Also...

Those snippets are modelled on stuff from the SAXHandler class comment.

I recall trying to create a ReadStream on the file, but I kept getting those FS errors... maybe I should retry?

---- On Thu, 21 Oct 2021 13:13:44 -0400 Levente Uzonyi <leves at caesar.elte.hu> wrote ----

Hi Tim, 
 
On Thu, 21 Oct 2021, gettimothy wrote: 
 
> Thx Levente. 
> 
> 
> Should I attempt to fix this? How should it be approached? 
> 
> I have only a dim idea of what "read buffering" is (file access is slow, so fetch a lot of data at once and, at a certain threshold, asynchronously refill the buffer?).
>
> Is there an existing Stream that implements it?
> 
> Should I take the guts of that and put it in FSReadStream?  
 
What is the snippet you execute to parse the documents? 
 
(I loaded Monty's XML parser, and checking the code makes me think
that you create the FSReadStream, not Monty's code.)
 
 
Levente