[squeak-dev] Advice on using multiple processes on large jobs

gettimothy gettimothy at zoho.com
Mon Nov 8 14:40:22 UTC 2021

Hi Folks,

I will be parsing a subset of a large XML file using the monty SAXParser, creating objects from the subsets and storing them in-image.

That is about a 2-hour job.

On those objects, I will then run a different kind of parse, using a PEG grammar and the Xtreams-parsing package, to turn the wikitext they contain into XHTML, storing that output (or the XMLDocument that contains it) on the object itself.
The goal here is to "tabulate" every failure to PEG-parse an object correctly. I am "automating" the process of capturing bugs in my PEG grammar.

Now, here is the question:

Should I run the two tasks sequentially, or in parallel using separate processes?

The SAX process will probably run faster than the PEG process over the long run, so if I run them in parallel, I will put the SAX process at a slightly lower priority than the PEG process.
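A minimal workspace sketch of that parallel arrangement, under stated assumptions: a SharedQueue is the handoff between the two processes, the strings stand in for the objects the SAX pass would create, and the pegParse block is a hypothetical placeholder for the real PEG/Xtreams parse (here it simply fails on empty text). None of these names come from the actual project.

```smalltalk
"Sketch only: queue, failures, endMarker and pegParse are illustrative names."
| queue failures finished endMarker pegParse saxProcess pegProcess |
queue := SharedQueue new.
failures := OrderedCollection new.
finished := Semaphore new.
endMarker := Object new.

"Placeholder for the PEG parse: signals an Error on empty wikitext."
pegParse := [:wikitext |
	wikitext isEmpty ifTrue: [Error signal: 'parse failed'].
	'<p>', wikitext, '</p>'].

"Consumer (PEG side): blocks on the queue and records every failure."
pegProcess := [
	| item |
	[(item := queue next) == endMarker] whileFalse: [
		[pegParse value: item]
			on: Error
			do: [:ex | failures add: item -> ex messageText]].
	finished signal] forkAt: Processor userBackgroundPriority + 1.

"Producer (SAX side): runs one priority level lower and feeds the queue."
saxProcess := [
	{'first page'. String new. 'third page'}
		do: [:each | queue nextPut: each].
	queue nextPut: endMarker] forkAt: Processor userBackgroundPriority.

"Wait for the consumer to drain the queue, then inspect the failures."
finished wait.
failures
```

With these three inputs, failures should end up holding a single entry, for the empty string. Because the consumer runs at the higher priority and blocks in `queue next`, it preempts the producer whenever a new item arrives, so the queue stays short. For a job that runs for hours, the final `finished wait` would presumably not be done from the UI process.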

I am also considering using Announcements to announce that the SAX process has output a new object and that the PEG process should start on it.

I find this idea attractive. Another option is separate images with AMQP communication between them, but that is a bit more work.