<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta content="text/html;charset=UTF-8" http-equiv="Content-Type"></head><body ><div style="font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt;"><div>Hi Levente<br></div><div><br></div><div>I will test FileStream next, since the data is Unicode and could span multiple languages.<br></div><div><br></div><div>Here are the latest timeProfile stats with StandardFileStream:<br></div><div><br></div><div><br></div><div><blockquote style="border: 1px solid rgb(204, 204, 204); padding: 7px; background-color: rgb(245, 245, 245);"><pre>99.6% {4458781ms} [] UndefinedObject>>DoIt
  99.6% {4458781ms} DocDemoSaxHandler(SAXHandler)>>parseDocument
    99.6% {4458781ms} XMLParser>>parseDocument
      99.6% {4458781ms} FullBlockClosure(BlockClosure)>>on:do:
        99.6% {4458781ms} [] XMLParser>>parseDocument
          99.0% {4434107ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextToken
            98.6% {4414400ms} XMLContentState>>nextTokenFrom:
              98.4% {4406601ms} XMLWellFormedParserTokenizer(XMLParserTokenizer)>>nextContentToken
                85.8% {3841387ms} XMLWellFormedParserTokenizer>>nextPCDataToken
                  |23.8% {1067452ms} XMLNestedStreamReader>>next
                  |17.0% {760985ms} XMLNestedStreamReader>>peek
                  |  |9.1% {408603ms} StandardFileStream>>next
                  |  |  |6.5% {290224ms} primitives
                  |  |  |2.6% {118379ms} StandardFileStream>>basicNext
                  |  |5.5% {246642ms} StandardFileStream>>atEnd
                  |  |2.4% {105741ms} primitives
                  |15.2% {680118ms} primitives
                  |11.1% {497634ms} WriteStream>>nextPut:
                  |8.1% {363401ms} XMLWellFormedParserTokenizer>>nextGeneralEntityOrCharacterReferenceOnCharacterStream
                  |  |7.6% {341827ms} XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
                  |  |  4.0% {181168ms} Dictionary>>at:ifPresent:
                  |  |    |2.5% {110467ms} Dictionary>>scanFor:
                  |  |    |  |2.2% {96275ms} ByteString(String)>>=
                  |  |    |  |  1.5% {68062ms} primitives
                  |  |    |1.3% {56789ms} [] XMLWellFormedParserTokenizer>>nextGeneralEntityReferenceOnCharacterStream
                  |  |  2.9% {130416ms} XMLWellFormedParserTokenizer>>nextEntityName
                  |7.4% {330967ms} Character>>isXMLChar
                  |1.1% {50607ms} SAXParserDriver>>handlePCData:
                  |  1.0% {46659ms} primitives
                12.3% {549227ms} XMLWellFormedParserTokenizer>>nextContentMarkupToken
                  11.7% {521727ms} XMLWellFormedParserTokenizer>>nextTag
                    4.8% {212861ms} XMLWellFormedParserTokenizer>>nextEndTag
                      |2.2% {99884ms} SAXParserDriver>>handleEndTag:
                      |  1.8% {80565ms} DocDemoSaxHandler>>endElement:prefix:uri:localName:
                      |    1.7% {77945ms} DocDemoSaxHandler>>ping
                      |      1.7% {77912ms} TranscriptStream>>show:
                      |        1.7% {77905ms} FullBlockClosure(BlockClosure)>>on:do:
                      |          1.7% {77905ms} [] TranscriptStream>>show:
                      |            1.7% {77901ms} TranscriptStream>>endEntry
                      |              1.7% {77901ms} Mutex>>critical:
                      |                1.7% {77899ms} FullBlockClosure(BlockClosure)>>ensure:</pre></blockquote><br></div><div><br></div><div>Much better.<br></div><div><br></div><div>The FileStream timeProfile is running as I type this; it should be done in a little over an hour.<br></div><div><br></div><div>Cordially,<br></div><div><br></div><div>t</div><div><br></div><div><br></div><div><br></div><div><br></div><div class="zmail_extra_hr" style="border-top: 1px solid rgb(204, 204, 204); height: 0px; margin-top: 10px; margin-bottom: 10px; line-height: 0px;"><br></div><div class="zmail_extra" data-zbluepencil-ignore="true"><br><div id="Zm-_Id_-Sgn1">---- On Fri, 22 Oct 2021 03:43:28 -0400 <b>Levente Uzonyi &lt;leves@caesar.elte.hu&gt;</b> wrote ----<br></div><br><blockquote style="margin: 0px;"><div>Hi Tim, <br> <br>On Thu, 21 Oct 2021, gettimothy wrote: <br> <br>> <br>> Hey! <br>> <br>> <br>> This appears to work now,  <br>> <br>> <br>> <br>> <br>>       ping: zero elements.  Time: 0:00:00:10.941282 <br>> 8717587920 <br>> ping: one hundred thousand elements.  Time: 0:00:00:21.888107 <br>> 8787084464 <br>> <br>> <br>> this is on StandardFileStream... <br>> <br>>       |ios| <br>> Transcript clear. <br>> ios := (StandardFileStream readOnlyFileNamed:('/bulkstorage/enwiki-20200501-pages-articles-multistream.xml' )). <br>> [(DocDemoSaxHandler on:ios) pingevery:100000;  optimizeForLargeDocuments;parseDocument] forkAt: Processor userBackgroundPriority named:'SAX' <br>> <br>> <br>> those are lightning fast. <br>> <br>> gonna run the full thing now. 
<br> <br>Unless you know that your input file contains only characters with <br>code points &lt; 128, you should use FileStream instead of StandardFileStream. <br>The latter just returns the raw bytes as characters without decoding, while <br>the former does proper character conversion (like the Pharo code or <br>FSReadStream). <br> <br>I suspect that the extra speed you reported in another email is just <br>a side effect of skipping the character conversion. <br> <br> <br>Levente</div></blockquote></div><div><br></div></div><br></body></html>