[squeak-dev] OSProcess streaming and early termination

Sat Dec 8 14:56:20 UTC 2012

Hmmm, I will need to think about this some more. I'm not sure that I
have a good answer.

Dave

On Sat, Dec 08, 2012 at 09:16:46AM +0100, Bert Freudenberg wrote:
> On 08.12.2012, at 03:10, "David T. Lewis" <lewis at mail.msen.com> wrote:
> 
> > Hi Bert,
> > 
> > On Fri, Dec 07, 2012 at 04:54:31PM +0100, Bert Freudenberg wrote:
> >> Hi David, folks,
> >> 
> >> I need to read output from an external command that is potentially too large to fit in memory. So I want to read from the pipe, and I may have to terminate early.
> >> 
> >> Here is what I have so far - "od" is an example only of course, but I need to be able to use arg arrays and a working dir and it illustrates the problem:
> >> 
> >> | process1 process2 |
> >> process1 := PipeableOSProcess new: '/usr/bin/od'
> >>    arguments: {'-v'. '-t'. 'x1'. (Smalltalk imageName copyAfterLast: $/) asVmPathName}
> >>    environment: nil descriptors: nil
> >>    workingDir: Smalltalk imagePath asVmPathName
> >>    errorPipelineStream: nil.
> >> process2 := ExpressionEvaluator block: [:stdin | stdin next: 1000].
> >> process2 pipeToInput: process1 pipeFromOutput.
> >> process1 value.
> >> process2 value.
> >> process2 succeeded
> >>    ifFalse: [process2 errorUpToEnd]
> >>    ifTrue: [process2 output]
> >> 
> >> This does get me the first 1000 bytes of the "od" output.
> >> 
> >> However, this seems like more hoops than necessary to jump through - I have to set up the processes first, then pipe them, then execute them, only then can I access the output. Finding the right sequence required reading a lot of code and guessing. Is there a more convenient way? I tried "|" but it only wants a string argument, not an ExpressionEvaluator object.
> >> 
> >> Secondly, even though the external process should be gone after reading 1000 chars, it appears that it is still running. Do I manually have to kill it? I tried #closePipes but that does an upToEnd which in this case is counterproductive because it churns through a Gigabyte of data.
> >> 
> > 
> > I think this will work:
> > 
> >  cmd := 'od -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
> >  pipeline := ProxyPipeline command: cmd.
> >  data := pipeline next: 1000.
> >  pipeline closePipes.
> >  data inspect
> 
> Okay, that looks a lot simpler. But with the string interface I have to worry about argument escaping. That's why I wanted to use the array interface, and avoid a shell. Constructing a sanitized string from user data is very hard, and made unnecessary by using a non-interpreted interface.
> 
> Also, as I wrote I need to be able to set the working directory (and potentially the environment). Your example only works accidentally because you launched squeak from the image directory.
> 
> > Your example exposed a problem in BufferedAsyncFileReadStream which was
> > reading all available data from a stream regardless of whether anybody was
> > consuming the data, so eventually the system gets a low memory warning. I added
> > a check to prevent this, so please update your OSProcess again from
> > SqueakMap.
> 
> Ah, thanks. I thought that should have worked :)
> 
> > ProxyPipeline should have a better name. It made sense when I wrote it as a
> > support class for CommandShell, but it turns out that nobody uses CommandShell
> > and lots of people want to be able to evaluate a command line with some shell
> > syntax support. So maybe it should be a CommandPipeline or a ShellCommandLine
> > or something like that.
> > 
> > 
> >> Thirdly, how do I find out about errors in the external process? E.g. if I misspell the command there is nothing in its stderr, it all seems to fail silently.
> >> 
> > 
> > A PipeJunction has a pipeToInput, a pipeFromOutput, and an errorPipelineStream.
> > A command pipeline works by connecting the pipeFromOutput (aka stdout) from one
> > process proxy to the pipeToInput (aka stdin) of the next. The error output
> > (aka stderr) of a proxy is accumulated in the shared errorPipelineStream.
> > A ProxyPipeline behaves like a PipeJunction, with the stderr of all proxies
> > accumulated in a shared errorPipelineStream.
> > 
> > The stderr output of a command pipeline is in the errorPipelineStream, and is
> > accessed with #errorUpToEnd or #errorUpToEndOfFile.
> > 
> > Exit status of the external processes can be tested from the process proxies,
> > and testing methods such as ProxyPipeline>>succeeded give overall status.
> > 
> > Thus:
> > 
> >  cmd := 'foo -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
> >  pipeline := ProxyPipeline command: cmd.
> >  pipeline succeeded ==> false
> >  pipeline first exitStatus ==> #fail
> >  pipeline errorUpToEndOfFile ==> 'sqsh: foo: command not found
> > '
> 
> Makes sense. But what if I want to avoid a shell, how do I get a readable error?
> 
> - Bert -
> 
> > Dave
> > 
> >> Or maybe I'm going about this in a completely wrong way? I could not find an example anywhere in OSProcess that would pipe command output into Smalltalk code.
> >> 
> >> Help appreciated :)
> >> 
> >> - Bert -
> >> 
> >> 
> >