Advice on using multiple processes on large jobs - Squeak-dev

8 Nov 2021


Hi Folks,
I will be parsing a subset of a large XML file using the monty SAXParser, creating objects from the subsets and storing them in-image.
That is about a two-hour job.
On those objects, I will then run a different kind of parse, using a PEG grammar and the Xtreams-parsing package, to turn the wikitext they contain into XHTML, storing that output (or the XMLDocument that contains it) on the object itself.
The goal here is to tabulate every failure to PEG-parse an object correctly; in effect, I am automating the process of capturing bugs in my PEG grammar.
Now, here is the question:
Should I run the two tasks sequentially, or in parallel using separate processes?
The SAX process will probably run faster than the PEG process over the long run, so if I run them in parallel, I will put the SAX process at a slightly lower priority than the PEG process.
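A rough workspace sketch of that parallel arrangement, assuming a SharedQueue hands objects from one process to the other. The two blocks are hypothetical stand-ins (strings instead of parsed objects, asUppercase instead of the PEG parse), not the real SAX or PEG code, and the #endOfInput sentinel is my own convention:

```smalltalk
"Hypothetical sketch: producer and consumer stand in for the SAX and PEG passes."
| queue done results |
queue := SharedQueue new.
done := Semaphore new.
results := OrderedCollection new.

"Producer (stand-in for the SAX pass), at the slightly lower priority.
 It hands each new object to the queue, then a sentinel when finished."
[
	1 to: 5 do: [:i | queue nextPut: 'page ', i printString].
	queue nextPut: #endOfInput
] forkAt: Processor userBackgroundPriority.

"Consumer (stand-in for the PEG pass). 'queue next' blocks until an
 object is available, so no polling is needed."
[
	| item |
	[(item := queue next) ~~ #endOfInput] whileTrue: [
		results add: item asUppercase].
	done signal
] forkAt: Processor userSchedulingPriority.

"Wait for the consumer to finish, then inspect the results."
done wait.
results
```

Because the consumer blocks on the queue whenever it is empty, the lower-priority producer only gets to run when the consumer has nothing to do, which is the behaviour I am after.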
I am also considering using Announcements to announce that the SAX process has output a new object and that the PEG process should start on it.
I find this idea attractive. Another option is separate images with AMQP communication between them, but that is a bit more work.
Thoughts?
cordially,
t