[squeak-dev] OSProcess streaming and early termination

Sat Dec 8 08:16:46 UTC 2012

On 08.12.2012, at 03:10, "David T. Lewis" <lewis at mail.msen.com> wrote:

> Hi Bert,
> 
> On Fri, Dec 07, 2012 at 04:54:31PM +0100, Bert Freudenberg wrote:
>> Hi David, folks,
>> 
>> I need to read output from an external command which is potentially too large to fit in memory. So I want to read from the pipe, and possibly have to terminate early.
>> 
>> Here is what I have so far - "od" is an example only of course, but I need to be able to use arg arrays and a working dir and it illustrates the problem:
>> 
>> | process1 process2 |
>> process1 := PipeableOSProcess new: '/usr/bin/od'
>>    arguments: {'-v'. '-t'. 'x1'. (Smalltalk imageName copyAfterLast: $/) asVmPathName}
>>    environment: nil descriptors: nil
>>    workingDir: Smalltalk imagePath asVmPathName
>>    errorPipelineStream: nil.
>> process2 := ExpressionEvaluator block: [:stdin | stdin next: 1000].
>> process2 pipeToInput: process1 pipeFromOutput.
>> process1 value.
>> process2 value.
>> process2 succeeded
>>    ifFalse: [process2 errorUpToEnd]
>>    ifTrue: [process2 output]
>> 
>> This does get me the first 1000 bytes of the "od" output.
>> 
>> However, this seems like more hoops than necessary to jump through - I have to set up the processes first, then pipe them, then execute them, only then can I access the output. Finding the right sequence required reading a lot of code and guessing. Is there a more convenient way? I tried "|" but it only wants a string argument, not an ExpressionEvaluator object.
>> 
>> Secondly, even though the external process should be gone after reading 1000 chars, it appears that it is still running. Do I manually have to kill it? I tried #closePipes but that does an upToEnd which in this case is counterproductive because it churns through a Gigabyte of data.
>> 
> 
> I think this will work:
> 
>  cmd := 'od -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
>  pipeline := ProxyPipeline command: cmd.
>  data := pipeline next: 1000.
>  pipeline closePipes.
>  data inspect

Okay, that looks a lot simpler. But with the string interface I have to worry about argument escaping. That's why I wanted to use the array interface and avoid a shell. Constructing a sanitized string from user data is very hard, and it is made unnecessary by using a non-interpreted interface.

Also, as I wrote, I need to be able to set the working directory (and potentially the environment). Your example only works by accident, because you launched squeak from the image directory.
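
To make that concrete, here is a rough, untested sketch of the shape I am after: the array interface from my first mail, read directly instead of through an ExpressionEvaluator. It assumes that a PipeableOSProcess answers #next: the way the ProxyPipeline above does, and that its process proxy can be reached with #processProxy and stopped with #terminate. Neither assumption is confirmed in this thread.

```smalltalk
| proc data |
"Same array-style constructor as in my first mail: no shell, so no argument escaping."
proc := PipeableOSProcess new: '/usr/bin/od'
   arguments: {'-v'. '-t'. 'x1'. (Smalltalk imageName copyAfterLast: $/) asVmPathName}
   environment: nil descriptors: nil
   workingDir: Smalltalk imagePath asVmPathName
   errorPipelineStream: nil.
"Start the external process."
proc value.
"ASSUMPTION: #next: reads from the stdout pipe of the external process."
data := proc next: 1000.
"ASSUMPTION: #processProxy and #terminate exist and stop od without draining its output."
proc processProxy terminate.
data inspect
```

The point is to stop the external process directly rather than via #closePipes, which reads everything that is left.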

> Your example exposed a problem in BufferedAsyncFileReadStream which was
> reading all available data from a stream regardless of whether anybody was
> consuming the data, so eventually the system gets a low memory warning. I added
> a check to prevent this, so please update your OSProcess from SqueakMap
> again.

Ah, thanks. I thought that should have worked :)

> ProxyPipeline should have a better name. It made sense when I wrote it as a
> support class for CommandShell, but it turns out that nobody uses CommandShell
> and lots of people want to be able to evaluate a command line with some shell
> syntax support. So maybe it should be a CommandPipeline or a ShellCommandLine
> or something like that.
> 
> 
>> Thirdly, how do I find out about errors in the external process? E.g. if I misspell the command there is nothing in its stderr, it all seems to fail silently.
>> 
> 
> A PipeJunction has a pipeToInput, a pipeFromOutput, and an errorPipelineStream.
> A command pipeline works by connecting the pipeFromOutput (aka stdout) from one
> process proxy to the pipeToInput (aka stdin) of the next. The error output
> (aka stderr) of a proxy is accumulated in the shared errorPipelineStream.
> A ProxyPipeline behaves like a PipeJunction, with the stderr of all proxies
> accumulated in a shared errorPipelineStream.
> 
> The stderr output of a command pipeline is in the errorPipelineStream, and is
> accessed with #errorUpToEnd or #errorUpToEndOfFile.
> 
> Exit status of the external processes can be tested from the process proxies,
> and testing methods such as ProxyPipeline>>succeeded give overall status.
> 
> Thus:
> 
>  cmd := 'foo -v -t x1 ', (Smalltalk imageName copyAfterLast: $/) asVmPathName.
>  pipeline := ProxyPipeline command: cmd.
>  pipeline succeeded ==> false
>  pipeline first exitStatus ==> #fail
>  pipeline errorUpToEndOfFile ==> 'sqsh: foo: command not found
> '

Makes sense. But if I want to avoid a shell, how do I get a readable error?

- Bert -

> Dave
> 
>> Or maybe I'm going about this in a completely wrong way? I could not find an example anywhere in OSProcess that would pipe command output into Smalltalk code.
>> 
>> Help appreciated :)
>> 
>> - Bert -
>> 
>> 
>