I can get the data, thanks for the pointer.
|ios path filename|
Transcript clear.
path := '/home/wm/usr/src/smalltalk/XML/' .
filename := 'bookstore.xml.bz2'.
ios := UnixProcess bzcatAFileToPipe: path filename: filename.
(DocDemoSaxHandler on: ios) debug: true; pingevery: 1000; optimizeForLargeDocuments; parseDocument.
where the stream that the SaxHandler depends on comes from David's work:
bzcatAFileToPipe: pathString filename: filestring
	"Pipe bzcat output to some AttachableFileStream...whatever the heck that is...."
	"UnixProcess bzcatAFile"
	| filename in pipe2 output dest child path |
	path := pathString.
	filename := filestring.
	in := OSProcess readOnlyFileNamed: path, filename.
	pipe2 := OSPipe nonBlockingPipe.
	output := pipe2 writer.
	dest := pipe2 reader.
	child := UnixProcess
		forkJob: '/bin/bzcat'
		arguments: nil
		environment: nil
		descriptors: (Array with: in with: output with: nil).
	in close.
	(Delay forSeconds: 1) wait.
	child sigterm.
	^ dest "be sure to close it on inspection"
It's a great tool. We do not have to write a tar file reader or a bzip reader; we can work directly with some great existing tools.
thanks again.
If I inspect the ios (an AttachableFileStream?) and do a Transcript show: (self next: 10000000), where 10000000 is much bigger than the contents of the stream, the Squeak system freezes.
Similarly, when I run
(DocDemoSaxHandler on: ios) debug: true; pingevery: 1000; optimizeForLargeDocuments; parseDocument.
the document data prints out nicely, but then Squeak goes into a tight loop and freezes.
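One guess at the cause, offered as an untested sketch rather than a confirmed fix: in the method above the parent image keeps its own copy of the pipe's writer end open, so the reader may never see end-of-file once bzcat has finished, and the sigterm after one second would cut off a large file early. The variant below closes the parent's writer end after the fork and lets bzcat exit by itself. The selector bzcatToReader:filename: is a hypothetical name; every message sent is one already used in the method above.

```smalltalk
bzcatToReader: pathString filename: fileString
	"Hypothetical variant of bzcatAFileToPipe:filename: (untested sketch).
	Answer the reader end of a pipe fed by /bin/bzcat."
	| in pipe |
	in := OSProcess readOnlyFileNamed: pathString, fileString.
	pipe := OSPipe nonBlockingPipe.
	UnixProcess
		forkJob: '/bin/bzcat'
		arguments: nil
		environment: nil
		descriptors: (Array with: in with: pipe writer with: nil).
	"The child now holds its own copies of both descriptors."
	in close.
	"Assumption: closing the parent's writer end lets end-of-file reach the reader."
	pipe writer close.
	"No Delay or sigterm here: bzcat exits by itself at end of input."
	^ pipe reader
```

The caller still has to close the returned stream when done. Whether a non-blocking reader then reports atEnd reliably after bzcat exits is something I have not verified.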
thanks to all again, very, very helpful.
tty
---- On Mon, 10 Jul 2023 18:00:53 -0400 Eliot Miranda eliot.miranda@gmail.com wrote ---
Hi,
yes, it should be doable with David Lewis’s OSProcess package.
_,,,^..^,,,_ (phone)
On Jul 10, 2023, at 12:03 PM, gettimothy via Squeak-dev <squeak-dev@lists.squeakfoundation.org> wrote:
Hi Folks.
I have a 21 GB bzip2 file that I would rather not decompress, as disk space is at a semi-premium.
The bzcat command allows me to "extract" the contents to stdout:
bzcat humungousfile.bz2 | less
gives me the output I want to process
For running XMLSax on a file, I have some existing code to use...
| ios |
ios := FileStream readOnlyFileNamed: '/your/path/to/the/big/xml/file.xml'.
[(DocDemoSaxHandler on: ios) pingevery: 100000; optimizeForLargeDocuments; parseDocument] timeProfile.
What I would like is access, from within Squeak, to that bzcat output via some sort of ReadStream.
Doable?
thx in advance.
tty