[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)

gettimothy gettimothy at zoho.com
Fri Oct 22 11:15:25 UTC 2021


Hi Levente



I will test FileStream next, since the data is Unicode and could span multiple languages.



Here are the latest timeProfile stats with StandardFileStream:





  99.6% {4458781ms} [] UndefinedObject>>DoIt
    99.6% {4458781ms} DocDemoSaxHandler(SAXHandler)>>parseDocument
      99.6% {4458781ms} XMLParser>>parseDocument
        99.6% {4458781ms} FullBlockClosure(BlockClosure)>>on:do:
          99.6% {4458781ms} [] XMLParser>>parseDocument
            99.0% {4434107ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextToken
              98.6% {4414400ms} XMLContentState>>nextTokenFrom:
                98.4% {4406601ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextContentToken
                  85.8% {3841387ms} XMLWellFormedParserTokenizer>>nextPCDataToken
                    |23.8% {1067452ms} XMLNestedStreamReader>>next
                    |17.0% {760985ms} XMLNestedStreamReader>>peek
                    |  |9.1% {408603ms} StandardFileStream>>next
                    |  |  |6.5% {290224ms} primitives
                    |  |  |2.6% {118379ms} StandardFileStream>>basicNext
                    |  |5.5% {246642ms} StandardFileStream>>atEnd
                    |  |2.4% {105741ms} primitives
                    |15.2% {680118ms} primitives
                    |11.1% {497634ms} WriteStream>>nextPut:
                    |8.1% {363401ms} XMLWellFormedParserTokenizer>>nextGeneralEntityOrCharacterReferenceOnCharacterStream
                    |  |7.6% {341827ms} XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
                    |  |  4.0% {181168ms} Dictionary>>at:ifPresent:
                    |  |    |2.5% {110467ms} Dictionary>>scanFor:
                    |  |    |  |2.2% {96275ms} ByteString(String)>>=
                    |  |    |  |  1.5% {68062ms} primitives
                    |  |    |1.3% {56789ms} [] XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
                    |  |  2.9% {130416ms} XMLWellFormedParserTokenizer>>nextEntityName
                    |7.4% {330967ms} Character>>isXMLChar
                    |1.1% {50607ms} SAXParserDriver>>handlePCData:
                    |  1.0% {46659ms} primitives
                  12.3% {549227ms} XMLWellFormedParserTokenizer>>nextContentMarkupToken
                    11.7% {521727ms} XMLWellFormedParserTokenizer>>nextTag
                      4.8% {212861ms} XMLWellFormedParserTokenizer>>nextEndTag
                        |2.2% {99884ms} SAXParserDriver>>handleEndTag:
                        |  1.8% {80565ms} DocDemoSaxHandler>>endElement:prefix:uri:localName:
                        |    1.7% {77945ms} DocDemoSaxHandler>>ping
                        |      1.7% {77912ms} TranscriptStream>>show:
                        |        1.7% {77905ms} FullBlockClosure(BlockClosure)>>on:do:
                        |          1.7% {77905ms} [] TranscriptStream>>show:
                        |            1.7% {77901ms} TranscriptStream>>endEntry
                        |              1.7% {77901ms} Mutex>>critical:
                        |                1.7% {77899ms} FullBlockClosure(BlockClosure)>>ensure:






Much better.



The FileStream timeProfile is running as I type this; it should be done in a bit over an hour.



Cordially,



t











---- On Fri, 22 Oct 2021 03:43:28 -0400 Levente Uzonyi <leves at caesar.elte.hu> wrote ----


Hi Tim, 
 
On Thu, 21 Oct 2021, gettimothy wrote: 
 
> 
> Hey! 
> 
> 
> This appears to work now:
> 
> 
> 
> 
> ping: zero elements.  Time: 0:00:00:10.941282
> 8717587920 
> ping: one hundred thousand elements.  Time: 0:00:00:21.888107 
> 8787084464 
> 
> 
> this is on StandardFileStream... 
> 
> | ios |
> Transcript clear.
> ios := StandardFileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
> [(DocDemoSaxHandler on: ios) pingevery: 100000; optimizeForLargeDocuments; parseDocument] forkAt: Processor userBackgroundPriority named: 'SAX'
> 
> 
> Those are lightning fast.
> 
> gonna run the full thing now. 
 
Unless you know that your input file only contains characters with 
codepoint < 128, you should use FileStream instead of StandardFileStream. 
The latter just returns the raw bytes as characters without decoding while 
the former does proper character conversion (like the Pharo code or 
FSReadStream). 
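
To make the difference concrete, here is a minimal workspace sketch rather than a definitive recipe. It assumes a hypothetical file at '/tmp/utf8-sample.txt' whose only content is the character é, stored in UTF-8 as the two bytes 16rC3 16rA9. It also assumes that FileStream hands back a stream that accepts converter:, and it sets a UTF8TextConverter explicitly instead of relying on the image default.

```smalltalk
"Sketch only: compare what the two stream classes return for the same bytes.
 The path is hypothetical; the file is assumed to hold the single character é,
 encoded in UTF-8 as the two bytes 16rC3 16rA9."
| path raw decoded |
path := '/tmp/utf8-sample.txt'.

"StandardFileStream: each byte comes back as its own Character, undecoded."
raw := StandardFileStream readOnlyFileNamed: path.
[Transcript show: 'raw: ', raw next asInteger printString; cr]
	ensure: [raw close].

"FileStream: bytes pass through a text converter first. The converter is set
 explicitly here as an assumption, rather than relying on the image default."
decoded := FileStream readOnlyFileNamed: path.
decoded converter: UTF8TextConverter new.
[Transcript show: 'decoded: ', decoded next asInteger printString; cr]
	ensure: [decoded close].
```

Under those assumptions one would expect the raw stream to report 195 (the first UTF-8 byte, 16rC3) and the decoded stream to report 233 (the codepoint of é). For input that is pure ASCII the two should agree, which is why the distinction only matters once codepoints of 128 and above appear.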
 
I suspect that the extra speed you reported in another email is just 
the side effect of skipping the character conversion. 
 
 
Levente