[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)

gettimothy gettimothy at zoho.com
Thu Oct 21 12:36:16 UTC 2021


Thx Levente.





Should I attempt to fix this? How should it be approached? 



I have only a dim idea of what "read buffering" is (file access is slow, so fetch a lot of data at once and, at a certain threshold, asynchronously refill the buffer?).



Is there an existing Stream that implements it?



Should I take the guts of that and put it in FSReadStream? 
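
To make the question concrete, here is a minimal sketch of read buffering as I understand it. It is written in Python rather than Smalltalk and is not how FSReadStream or ZnBufferedReadStream is actually implemented; the names BufferedReader and CountingRaw are hypothetical. It also assumes the simplest variant, where the buffer is refilled synchronously once it runs empty rather than asynchronously at a threshold.

```python
import io


class BufferedReader:
    """Hypothetical wrapper: serves next()/peek() from an in-memory chunk."""

    def __init__(self, raw, buffer_size=65536):
        self.raw = raw                  # any object with read(n) -> bytes
        self.buffer_size = buffer_size
        self.buffer = b""
        self.position = 0               # index of the next unread byte in buffer

    def _fill(self):
        # One (slow) call to the underlying stream fetches many bytes at once.
        self.buffer = self.raw.read(self.buffer_size)
        self.position = 0

    def at_end(self):
        # Refill only when every buffered byte has been consumed.
        if self.position >= len(self.buffer):
            self._fill()
        return len(self.buffer) == 0

    def peek(self):
        # Return the next byte without consuming it, or None at end of input.
        if self.at_end():
            return None
        return self.buffer[self.position]

    def next(self):
        # Return and consume the next byte, or None at end of input.
        byte = self.peek()
        if byte is not None:
            self.position += 1
        return byte


class CountingRaw:
    """Stand-in for a slow file: counts how often read() is called."""

    def __init__(self, data):
        self.stream = io.BytesIO(data)
        self.calls = 0

    def read(self, n):
        self.calls += 1
        return self.stream.read(n)
```

With this sketch, reading ten bytes one at a time through a four-byte buffer makes three calls to the underlying read instead of ten, and repeated peeks cost no extra calls. If the idea carries over, wrapping the existing stream this way might be less invasive than moving the guts into FSReadStream itself.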





Thank you for your time.




Below are the relevant sections of the squeak and pharo runs:



squeak (~1 million elements in ~1 hour):



98.2% {2363012ms} XMLWellFormedParserTokenizer>>nextPCDataToken
  |67.6% {1626652ms} XMLNestedStreamReader>>peek
  |  |67.6% {1626281ms} FSReadStream>>next
  |  |  66.4% {1597152ms} primitives
  |  |  1.2% {28325ms} UTF8TextConverter>>nextFromStream:
  |26.5% {637549ms} XMLNestedStreamReader>>next
  |3.6% {87248ms} XMLWellFormedParserTokenizer>>nextGeneralEntityOrCharacterReferenceOnCharacterStream
  |  3.0% {72138ms} XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
  |    2.5% {59508ms} XMLWellFormedParserTokenizer>>nextEntityName
  |      1.8% {42464ms} XMLNestedStreamReader>>peek
  |        1.8% {42307ms} FSReadStream>>next
  |          1.7% {41744ms} primitives
1.1% {27156ms} XMLWellFormedParserTokenizer>>nextContentMarkupToken


pharo (~332 million elements in 0:02:04:23.88027):



98.4% {7344009ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextContentToken
   88.3% {6593766ms} XMLWellFormedParserTokenizer>>nextPCDataToken
     |33.6% {2507147ms} XMLNestedStreamReader>>peek
     |  |30.9% {2305352ms} ZnCharacterReadStream(ZnEncodedReadStream)>>next
     |  |  |27.8% {2072836ms} ZnCharacterReadStream>>nextElement
     |  |  |  |25.8% {1929523ms} ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
     |  |  |  |  |22.9% {1706511ms} ZnUTF8Encoder>>nextCodePointFromStream:
     |  |  |  |  |  |21.6% {1610840ms} ZnBufferedReadStream>>next
     |  |  |  |  |  |  |21.6% {1610834ms} primitives
     |  |  |  |  |  |1.2% {86079ms} primitives
     |  |  |  |  |3.0% {223012ms} primitives
     |  |  |  |1.0% {77498ms} primitives
     |  |  |1.6% {122458ms} primitives
     |  |  |1.5% {110057ms} ZnBufferedReadStream>>atEnd
     |  |  |  1.5% {110055ms} primitives
     |  |2.2% {163569ms} ZnCharacterReadStream(ZnEncodedReadStream)>>atEnd
     |  |  1.2% {92568ms} primitives
     |16.2% {1207262ms} XMLNestedStreamReader>>next
     |12.7% {949179ms} WriteStream>>nextPut:
     |  |12.7% {949177ms} primitives
     |8.4% {627634ms} Character>>isXMLChar
     |6.6% {489916ms} XMLWellFormedParserTokenizer>>nextGeneralEntityOrCharacterReferenceOnCharacterStream
     |  |6.1% {452072ms} XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
     |  |  2.9% {216938ms} Dictionary>>at:ifPresent:
     |  |    |1.3% {96858ms} Dictionary(HashedCollection)>>findElementOrNil:
     |  |    |  |1.3% {95924ms} Dictionary>>scanFor:
     |  |    |1.2% {90639ms} BlockClosure>>cull:
     |  |  2.6% {190530ms} XMLWellFormedParserTokenizer>>nextEntityName
     |  |    1.1% {79370ms} XMLNestedStreamReader>>peek
     |6.2% {462153ms} WriteStream>>contents
     |  |5.9% {441631ms} WideString>>copyFrom:to:
     |  |  3.8% {285489ms} WideString(String)>>isOctetString
     |  |  1.9% {144904ms} WideString(String)>>asOctetString
     |  |    1.9% {139761ms} primitives
     |4.0% {301249ms} primitives
   9.3% {695725ms} XMLWellFormedParserTokenizer>>nextContentMarkupToken
     8.3% {621663ms} XMLWellFormedParserTokenizer>>nextTag
       2.6% {197358ms} XMLWellFormedParserTokenizer>>nextEndTag
         |1.2% {91522ms} XMLNestedStreamReader>>next
       2.2% {160759ms} XMLWellFormedParserTokenizer>>nextElementName








---- On Thu, 21 Oct 2021 04:19:11 -0400 Levente Uzonyi <leves at caesar.elte.hu> wrote ----


Hi Tim, 
 
On Wed, 20 Oct 2021, gettimothy via Squeak-dev wrote: 
 
> First, thanks to all for the advice. 
> 
> I parsed 1 Million elements, if you need more, let me know. 
> It takes about 11 hours to parse 20 million elements (out of 300million+). 
 
Sounds really slow. 
 
> Levente: Regarding #timeProfile is my friend. 
> 
> I am not sure how to read this, but it may be that "peek" is the hog here. 
 
It looks as if FSReadStream does not implement read buffering, so 
just reading the file without doing anything with its content takes ages. 
 
 
Levente