[squeak-dev] Can I create a 75 Gb Image? (about those post-build stats)
gettimothy
gettimothy at zoho.com
Fri Oct 22 11:15:25 UTC 2021
Hi Levente
I will test FileStream next as the data is Unicode and could span multiple languages.
Here are the latest timeProfile stats with StandardFileStream:
99.6% {4458781ms} [] UndefinedObject>>DoIt
99.6% {4458781ms} DocDemoSaxHandler(SAXHandler)>>parseDocument
99.6% {4458781ms} XMLParser>>parseDocument
99.6% {4458781ms} FullBlockClosure(BlockClosure)>>on:do:
99.6% {4458781ms} [] XMLParser>>parseDocument
99.0% {4434107ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextToken
98.6% {4414400ms} XMLContentState>>nextTokenFrom:
98.4% {4406601ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextContentToken
85.8% {3841387ms} XMLWellFormedParserTokenizer>>nextPCDataToken
|23.8% {1067452ms} XMLNestedStreamReader>>next
|17.0% {760985ms} XMLNestedStreamReader>>peek
| |9.1% {408603ms} StandardFileStream>>next
| | |6.5% {290224ms} primitives
| | |2.6% {118379ms} StandardFileStream>>basicNext
| |5.5% {246642ms} StandardFileStream>>atEnd
| |2.4% {105741ms} primitives
|15.2% {680118ms} primitives
|11.1% {497634ms} WriteStream>>nextPut:
|8.1% {363401ms} XMLWellFormedParserTokenizer>>nextGeneralEntityOrCharacterReferenceOnCharacterStream
| |7.6% {341827ms} XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
| | 4.0% {181168ms} Dictionary>>at:ifPresent:
| | |2.5% {110467ms} Dictionary>>scanFor:
| | | |2.2% {96275ms} ByteString(String)>>=
| | | | 1.5% {68062ms} primitives
| | |1.3% {56789ms} [] XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
| | 2.9% {130416ms} XMLWellFormedParserTokenizer>>nextEntityName
|7.4% {330967ms} Character>>isXMLChar
|1.1% {50607ms} SAXParserDriver>>handlePCData:
| 1.0% {46659ms} primitives
12.3% {549227ms} XMLWellFormedParserTokenizer>>nextContentMarkupToken
11.7% {521727ms} XMLWellFormedParserTokenizer>>nextTag
4.8% {212861ms} XMLWellFormedParserTokenizer>>nextEndTag
|2.2% {99884ms} SAXParserDriver>>handleEndTag:
| 1.8% {80565ms} DocDemoSaxHandler>>endElement:prefix:uri:localName:
| 1.7% {77945ms} DocDemoSaxHandler>>ping
| 1.7% {77912ms} TranscriptStream>>show:
| 1.7% {77905ms} FullBlockClosure(BlockClosure)>>on:do:
| 1.7% {77905ms} [] TranscriptStream>>show:
| 1.7% {77901ms} TranscriptStream>>endEntry
| 1.7% {77901ms} Mutex>>critical:
| 1.7% {77899ms} FullBlockClosure(BlockClosure)>>ensure:
Much better.
The FileStream timeProfile is running as I type this; it should be done in a bit over an hour.
Cordially,
t
---- On Fri, 22 Oct 2021 03:43:28 -0400 Levente Uzonyi <leves at caesar.elte.hu> wrote ----
Hi Tim,
On Thu, 21 Oct 2021, gettimothy wrote:
>
> Hey!
>
>
> This appears to work now,
>
> ping: zero elements. Time: 0:00:00:10.941282
> 8717587920
> ping: one hundred thousand elements. Time: 0:00:00:21.888107
> 8787084464
>
>
> this is on StandardFileStream...
>
> |ios|
> Transcript clear.
> ios := (StandardFileStream readOnlyFileNamed:('/bulkstorage/enwiki-20200501-pages-articles-multistream.xml' )).
> [(DocDemoSaxHandler on:ios) pingevery:100000; optimizeForLargeDocuments;parseDocument] forkAt: Processor userBackgroundPriority named:'SAX'
>
>
> those are lightning fast.
>
> gonna run the full thing now.
Unless you know that your input file only contains characters with
codepoint < 128, you should use FileStream instead of StandardFileStream.
The latter just returns the raw bytes as characters without decoding while
the former does proper character conversion (like the Pharo code or
FSReadStream).
I suspect that the extra speed you reported in another email is just
the side effect of skipping the character conversion.
Levente
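
For reference, a minimal sketch of the change suggested above: the same invocation as in the quoted message, with FileStream swapped in for StandardFileStream so that the bytes read from the file are decoded into characters. The file path, DocDemoSaxHandler and pingevery: are taken unchanged from the quoted message. That FileStream accepts readOnlyFileNamed: in the same way is an assumption based on the advice above, so treat this as a sketch rather than a tested snippet.

```smalltalk
"Sketch only: FileStream replaces StandardFileStream, everything else is
 as in the quoted message. FileStream is assumed to decode the file's
 bytes into characters instead of returning each raw byte as a character."
| ios |
Transcript clear.
ios := FileStream readOnlyFileNamed: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml'.
[(DocDemoSaxHandler on: ios)
	pingevery: 100000;
	optimizeForLargeDocuments;
	parseDocument] forkAt: Processor userBackgroundPriority named: 'SAX'
```

Because decoding happens per character, a run with this stream would be expected to spend more time in the stream's next and peek than the StandardFileStream profile shows, in line with the remark above about the skipped character conversion.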