[squeak-dev] OSProcess streaming and early termination

Sat Dec 8 02:10:13 UTC 2012

Hi Bert,

On Fri, Dec 07, 2012 at 04:54:31PM +0100, Bert Freudenberg wrote:
> Hi David, folks,
> 
> I need to read output from an external command which is potentially too large to fit in memory. So I want to read from the pipe, and possibly have to terminate early.
> 
> Here is what I have so far - "od" is an example only of course, but I need to be able to use arg arrays and a working dir and it illustrates the problem:
> 
> | process1 process2 |
> process1 := PipeableOSProcess new: '/usr/bin/od'
> 	arguments: {'-v'. '-t'. 'x1'. (Smalltalk imageName copyAfterLast: $/) asVmPathName}
> 	environment: nil descriptors: nil
> 	workingDir: Smalltalk imagePath asVmPathName
> 	errorPipelineStream: nil.
> process2 := ExpressionEvaluator block: [:stdin | stdin next: 1000].
> process2 pipeToInput: process1 pipeFromOutput.
> process1 value.
> process2 value.
> process2 succeeded
> 	ifFalse: [process2 errorUpToEnd]
> 	ifTrue: [process2 output]
> 
> This does get me the first 1000 bytes of the "od" output.
> 
> However, this seems like more hoops than necessary to jump through - I have to set up the processes first, then pipe them, then execute them, only then can I access the output. Finding the right sequence required reading a lot of code and guessing. Is there a more convenient way? I tried "|" but it only wants a string argument, not an ExpressionEvaluator object.
> 
> Secondly, even though the external process should be gone after reading 1000 chars, it appears that it is still running. Do I manually have to kill it? I tried #closePipes but that does an upToEnd which in this case is counterproductive because it churns through a Gigabyte of data.
> 

I think this will work:

  cmd := 'od -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
  pipeline := ProxyPipeline command: cmd.
  data := pipeline next: 1000.
  pipeline closePipes.
  data inspect
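
If anything between creating the pipeline and closing it can raise an error,
the same calls can be wrapped in #ensure: so that the pipes are always closed.
This is an untested sketch that uses only the selectors above plus the
standard BlockClosure>>ensure:

  cmd := 'od -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
  pipeline := ProxyPipeline command: cmd.
  "Read at most 1000 elements, then close the pipes even if the read fails."
  [data := pipeline next: 1000]
    ensure: [pipeline closePipes].
  data inspect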

Your example exposed a problem in BufferedAsyncFileReadStream: it was reading
all available data from a stream regardless of whether anybody was consuming
that data, so eventually the system would get a low memory warning. I added a
check to prevent this, so please update your OSProcess from SqueakMap again.
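
The general idea of such a check, reading ahead only while the buffer is below
some limit, can be sketched in a workspace with ordinary collections. This is
not the actual OSProcess code; the names source, buffer, limit and fill are
made up for illustration:

  | source buffer limit fill |
  source := ReadStream on: (1 to: 100) asArray.
  buffer := OrderedCollection new.
  limit := 10.
  "Read ahead only while the buffer has room and the source has data."
  fill := [[buffer size < limit and: [source atEnd not]]
    whileTrue: [buffer addLast: source next]].
  fill value.
  buffer size. "10, the other 90 elements stay unread in the source"
  "A consumer takes 4 elements, which makes room for 4 more."
  buffer removeFirst: 4.
  fill value.
  buffer size. "10 again"

The producer never holds more than limit elements, however large the source
is, because it only reads when a consumer has made room.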

ProxyPipeline should have a better name. It made sense when I wrote it as a
support class for CommandShell, but it turns out that nobody uses CommandShell
and lots of people want to be able to evaluate a command line with some shell
syntax support. So maybe it should be a CommandPipeline or a ShellCommandLine
or something like that.

> Thirdly, how do I find out about errors in the external process? E.g. if I misspell the command there is nothing in its stderr, it all seems to fail silently.
> 

A PipeJunction has a pipeToInput, a pipeFromOutput, and an errorPipelineStream.
A command pipeline works by connecting the pipeFromOutput (aka stdout) of one
process proxy to the pipeToInput (aka stdin) of the next. The error output
(aka stderr) of each proxy is accumulated in a shared errorPipelineStream.
A ProxyPipeline as a whole behaves like a PipeJunction, so that shared stream
holds the stderr of all its proxies.

The stderr output of a command pipeline is in the errorPipelineStream, and is
accessed with #errorUpToEnd or #errorUpToEndOfFile.

Exit status of the external processes can be tested from the process proxies,
and testing methods such as ProxyPipeline>>succeeded give overall status.

Thus:

  cmd := 'foo -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
  pipeline := ProxyPipeline command: cmd.
  pipeline succeeded ==> false
  pipeline first exitStatus ==> #fail
  pipeline errorUpToEndOfFile ==> 'sqsh: foo: command not found
'

Dave

> Or maybe I'm going about this in a completely wrong way? I could not find an example anywhere in OSProcess that would pipe command output into Smalltalk code.
> 
> Help appreciated :)
> 
> - Bert -
> 
>