[squeak-dev] PrimitiveFailed: #growMemoryByAtLeast: in SmalltalkImage failed.
gettimothy at zoho.com
Sat Oct 16 19:16:23 UTC 2021
Thank you for your reply.
Looks like it finished.
ping: 332000000 elements. Time: 0:02:12:06.877788
332000000 asWords 'three hundred thirty-two million'
Those are XMLElements, not documents per se. Now that I have a rough guesstimate of the time involved, I will start tweaking to collect metadata.
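For a rough sense of throughput, the two figures from the ping line can be turned into an average rate in a workspace. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```smalltalk
| elements seconds |
"Figures taken from the ping line: 332000000 elements in 0:02:12:06.877788."
elements := 332000000.
seconds := (2 * 3600) + (12 * 60) + 6.877788.
"Average rate over the whole run, roughly 42 thousand elements per second."
elements / seconds.
```

The rate is an average over the whole file, so it says nothing about how evenly the elements are spread through the parse.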
" how much memory do you have on your machine?"
bash-4.3$ grep MemTotal /proc/meminfo
MemTotal: 32814924 kB
As for how big the image footprint is... let's see...
20,978,056 bytes (internal)
26,689,136 bytes (physical)
26,689,136 bytes (total)
An interesting phenomenon occurred after the crash.
I killed the process, then saved the image.
That save took a long time compared to the original image.
I presume that the image grew.
Another suspicion is that the image tried to grow as the application needed more memory, but then fell behind, and that caused the process to barf.
But this is just a guess.
"I would be trying to parse it in Squeak. Fixing the FileSystem incompatibilities shouldn’t be as difficult as trying to exchange data between images (and is very useful)."
I will attempt that. A quick heads up, though.
I had to change from Pharo 9 to Pharo 8, as the jump from 8 to 9 is introducing filesystem incompatibilities within Pharo (:
Also, I presume that means modifying https://github.com/squeak-smalltalk/squeak-filesystem. Is this true?
Or do you have another suggestion?
Hi, how much memory do you have on your machine? If you parse a smaller, but still substantial, XML file with the same schema, what is the ratio between the document size and its in-image version? I.e., to know if you can process the full 73 GB you need a good estimate of how big the in-image footprint is.
I would be trying to parse it in Squeak. Fixing the FileSystem incompatibilities shouldn’t be as difficult as trying to exchange data between images (and is very useful).
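The estimate described above can be sketched as workspace arithmetic. The two sample measurements below are hypothetical placeholders, not results from this thread; only the 73 GB file size and the MemTotal figure (32814924 kB) come from the messages here.

```smalltalk
| sampleFileBytes sampleInImageBytes ratio fullFileBytes projectedBytes memTotalBytes |
"Hypothetical sample measurements; replace both with real numbers."
sampleFileBytes := 1 * (1024 raisedTo: 3).
sampleInImageBytes := 3 * (1024 raisedTo: 3).
"In-image bytes per byte of XML on disk."
ratio := sampleInImageBytes / sampleFileBytes.
"Size of the full file, taking 73 GB as 73 * 1024^3 bytes."
fullFileBytes := 73 * (1024 raisedTo: 3).
projectedBytes := fullFileBytes * ratio.
"MemTotal from /proc/meminfo is reported in kB."
memTotalBytes := 32814924 * 1024.
"Answers true when the projected footprint exceeds physical memory."
projectedBytes > memTotalBytes.
```

Since the file itself is larger than physical memory, the whole document could only fit in-image if the measured ratio turned out well below 1, which is one more argument for a streaming parse that keeps nothing.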
On Oct 16, 2021, at 3:50 AM, gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org> wrote:
Because of FileSystem incompatibilities, I am attempting a SAXParser parse of a 73 GB file by running this on Pharo:
[t := [(DocDemoSaxHandler on: '/bulkstorage/enwiki-20200501-pages-articles-multistream.xml' asFileReference) optimizeForLargeDocuments; parseDocument] timeToRun] forkAt: Processor lowIOPriority named: 'SAX'. "timeToRun sits inside the forked block so that it times the parse rather than the fork"
I am posting here as I am betting that this sort of thing can be common to any platform.
My goal is to see how long this parse will take.
I do not need the data in-image.
During the SAX parse, when I hit a certain element (or two), I will be taking those element contents and sending them via a network connection to a PEGParser running on Squeak with Xtreams.
So, on the SAXParser side, I just need...
stream from 1 to X
send a portion of 1 to X to Squeak on another image.
dispose of 1 to X.
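The three steps above can be sketched as a SAX handler that buffers only the element of interest, hands it on, and then drops it. This is a minimal sketch, not a tested implementation: it assumes the Pharo XMLParser callbacks `startElement:attributes:`, `characters:` and `endElement:`; the class name `ForwardingSaxHandler`, the package name, the `sink` object and the element name 'text' are all hypothetical. Methods are shown in the usual `Class >> selector` notation and would be entered in a browser.

```smalltalk
"Hypothetical handler; on Squeak the last keyword would be category: rather than package:."
SAXHandler subclass: #ForwardingSaxHandler
	instanceVariableNames: 'buffer sink'
	classVariableNames: ''
	package: 'DocDemo-Sketch'.

ForwardingSaxHandler >> sink: aWriteStream
	"Any object that understands nextPutAll:, for example a socket stream."
	sink := aWriteStream

ForwardingSaxHandler >> startElement: aQualifiedName attributes: aDictionary
	"Start collecting only when the element of interest opens."
	aQualifiedName = 'text'
		ifTrue: [ buffer := WriteStream on: (String new: 1024) ]

ForwardingSaxHandler >> characters: aString
	"Characters outside the element of interest are ignored."
	buffer ifNotNil: [ buffer nextPutAll: aString ]

ForwardingSaxHandler >> endElement: aQualifiedName
	"Hand the collected contents on, then drop the buffer so it can be garbage collected."
	(aQualifiedName = 'text' and: [ buffer notNil ])
		ifTrue: [
			sink nextPutAll: buffer contents.
			buffer := nil ]
```

Because nothing is retained between elements, the image only ever holds one element's contents at a time, whatever the size of the file.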
Any pointers on how to approach this sort of problem are greatly appreciated.
Linux has the concept of routing to /dev/null to make stuff disappear.
I have never seen that concept in Smalltalk.
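One way to get the same effect in Smalltalk is an object that understands the stream-writing protocol and simply discards whatever it is given. A minimal sketch follows; the class name `NullSink` and the package name are hypothetical, and the methods are shown in the usual `Class >> selector` notation.

```smalltalk
"Hypothetical class; on Squeak the last keyword would be category: rather than package:."
Object subclass: #NullSink
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'DocDemo-Sketch'.

NullSink >> nextPutAll: aCollection
	"Accept the data and do nothing with it, like writing to /dev/null."
	^ aCollection

NullSink >> nextPut: anObject
	"Same idea for a single element."
	^ anObject

NullSink >> flush
	"Nothing is buffered, so there is nothing to flush."

"Usage in a workspace: the string is accepted and never stored."
NullSink new nextPutAll: 'discarded'.
```

Anything written this way becomes garbage as soon as the caller lets go of it, so the usual garbage collector does the disposing.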