FilePlugin currently splits files in to two groups: 1) Stdio streams and 2) everything else.
To test for the end of file, FilePlugin>>primitiveFileAtEnd:
1) Uses feof() for stdio streams. 2) Compares the current position to the file size for everything else.
This returns the expected results for regular files, but fails for non-regular files, e.g.:
(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
returns an empty array.
On Unix, the proper way to check is to read past the end of the file and then call feof() to confirm that it is in fact at the end.
Pharo has plenty of code that assumes that a file position >= the file size means that the file is at the end, and as stated above this generally works for regular files.
This patch modifies FilePlugin>>primitiveFileAtEnd to:
a) Keep the current behaviour of using the file position test for regular files. b) Keep the current behaviour of using feof() for stdio streams. c) Use feof() for non-regular files, e.g. /dev/urandom.
This allows existing code to continue to function, and allows non-regular files to be tested correctly.
After applying the patch, the example code above answers the expected result, e.g.:
(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8. " #[179 136 227 226 28 147 197 125]"
On Windows, as far as I can tell, all files are regular, and the position test is used regularly, so no change is requried.
Fogbugz: https://pharo.fogbugz.com/f/cases/21643/ You can view, comment on, or merge this pull request online at:
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
-- Commit Summary --
* FilePlugin>>primitiveFileAtEnd for non-regular files
-- File Changes --
M platforms/Cross/plugins/FilePlugin/FilePlugin.h (2) M platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c (30)
-- Patch Links --
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232.patch https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232.diff
The discussion on vm-dev shows that this tries to solve a problem which rather is at image side in latest Pharo. See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027341.html and that we should aim at simplifying the implementation rather than complexifying See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027343.html
About the simplification, wasn't there another effort to remove the cached file size? And excuse the naive question, but why feof() would not work for all cases?
On Sat, Mar 31, 2018 at 09:01:49AM -0700, Nicolas Cellier wrote:
The discussion on vm-dev shows that this tries to solve a problem which rather is at image side in latest Pharo. See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027341.html and that we should aim at simplifying the implementation rather than complexifying See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027343.html
About the simplification, wasn't there another effort to remove the cached file size? And excuse the naive question, but why feof() would not work for all cases?
It is not naive, and it is definitely not obvious.
The answer I would give is that the feof() test has a different meaning from "at end of file". The feof() test determines if an end of file flag was set in some previous operation. If the flag is set, then you know that you are at the end of file. But if it is not set, you could still be at the end of file position, in which case the end of file flag will be set the next time you attempt to do a read.
Thus if we want atEnd to mean "at the end now, and the next read will produce an error", then the feof() cannot answer that.
Dave
Hi Nicolas,
There are actually 3 or 4 issues (depending on what you want to include at the moment):
1. ZnBufferedReadStream was doing the wrong thing (the image side issue you mentioned). That has been fixed.
2. #atEnd returns the wrong information for non-regular files on Unix. That's what this PR will fix. http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027342.html demonstrates the issue.
3. Stdio streams are treated as a special case on Unix. Now that we can open streams by file descriptor we're almost at the point where we can remove special handling for stdio streams on Unix. I haven't gone through all the code to be sure though. That would obviously simplify things.
4. Stdio streams are treated as a special case on Windows. I haven't looked at this at all.
And excuse the naive question, but why feof() would not work for all cases?
It might, but currently there are tests in the Pharo automated test suite which explicitly check that #atEnd returns true when the stream has returned the last character (but not past it). I don't know how important these tests are.
Cheers, Alistair
Hi Both, thanks for the feof() explanation, I should have known this. So, it shows that we are not on the right track. The right way to do it on Unix is to try to read and check if atEnd in post-condition rather than trying to test atEnd in pre-condition. This is exactly as suggested by Levente. Or even better, get some end-of-file error condition directly in response to read primitive, and handle that gracefully at image side.
IMO primitiveFileAtEnd should rather be deprecated.
2018-03-31 18:35 GMT+02:00 OpenSmalltalk-Bot notifications@github.com:
On Sat, Mar 31, 2018 at 09:01:49AM -0700, Nicolas Cellier wrote:
The discussion on vm-dev shows that this tries to solve a problem which
rather is at image side in latest Pharo.
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-
March/027341.html
and that we should aim at simplifying the implementation rather than
complexifying
See http://lists.squeakfoundation.org/pipermail/vm-dev/2018-
March/027343.html
About the simplification, wasn't there another effort to remove the
cached file size?
And excuse the naive question, but why feof() would not work for all
cases?
It is not naive, and it is definitely not obvious.
The answer I would give is that the feof() test has a different meaning from "at end of file". The feof() test determines if an end of file flag was set in some previous operation. If the flag is set, then you know that you are at the end of file. But if it is not set, you could still be at the end of file position, in which case the end of file flag will be set the next time you attempt to do a read.
Thus if we want atEnd to mean "at the end now, and the next read will produce an error", then the feof() cannot answer that.
Dave
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232#issuecomment-377705650, or mute the thread https://github.com/notifications/unsubscribe-auth/AAscIrim1_6lcCunTBfwxNA7jtNToY5Kks5tj7BugaJpZM4S6rMa .
For example in Squeak, we can rewrite atEnd like this:
StandardFileStream>>atEnd "Answer whether the receiver is at its end. " collection ifNotNil: [ position < readLimit ifTrue: [ ^false ] ]. self basicNext ifNil: [ ^true ]. self skip: -1. ^false
then entirely remove `primAtEnd:` which has no more sender. I presume it should be more or less equivalent in Pharo. Then tell about limitation and deprecation in `primitiveFileAtEnd` comment.
Hi Nicolas,
So, it shows that we are not on the right track. The right way to do it on Unix is to try to read and check if atEnd in post-condition rather than trying to test atEnd in pre-condition.
But that is exactly what this PR is about - it provides the ability for a post-condition check to be performed.
This is exactly as suggested by Levente. Or even better, get some end-of-file error condition directly in response to read primitive, and handle that gracefully at image side.
The correct usage, as you say above, is to read until there is no more data, and then check to see if the end of file has been reached. Without this (modified) primitive there's no way to perform that check.
We could change the image to handle no-more-data and end-of-file differently, but it would still require the check to be made, and I wouldn't combine the read and eof checks in to one primitive.
IMO primitiveFileAtEnd should rather be deprecated.
Based on my understanding, I disagree. Please let me know if I'm missing something.
Cheers, Alistair
For example in Squeak, we can rewrite atEnd like this: ...
I think this will return incorrect results. No-more-data is not the same is end-of-file. No-more-data just says that there isn't any data available now, but that may change.
Don't forget we are talking about non-regular files, e.g. /dev/urandom, that isn't a real stream, it is calculated and may be waiting on external resources.
I'll never learn to not reply when I'm in a hurry...
Don't forget we are talking about non-regular files, e.g. /dev/urandom, that isn't a real stream, it is calculated and may be waiting on external resources.
It is, of course, a real stream. Just not a regular file.
Hi Alistair, OK, I see your point, my proposal would work for /dev/random but maybe not all possible kind of stream But then primitiveFileAtEnd is trying to be too smart: - being able to test atEnd in pre-condition for regular files - being able to test atEnd in post-condition for the rest of the world (pipe socket special devices etc...). It's kind of brainfuck. I say we should simplify and keep only one, post-condition with a simple feof and leave smartness at image side. What if we add
primitiveEncounteredEndOfFile "Answer true if the end of file was encountered at last read operation".
And leave current primitive only for backward compatibility with deprecation warning in the comment?
On Sun, Apr 01, 2018 at 12:26:56PM -0700, Nicolas Cellier wrote:
Hi Alistair, OK, I see your point, my proposal would work for /dev/random but maybe not all possible kind of stream But then primitiveFileAtEnd is trying to be too smart:
- being able to test atEnd in pre-condition for regular files
- being able to test atEnd in post-condition for the rest of the world (pipe socket special devices etc...).
It's kind of brainfuck. I say we should simplify and keep only one, post-condition with a simple feof and leave smartness at image side. What if we add
primitiveEncounteredEndOfFile "Answer true if the end of file was encountered at last read operation".
And leave current primitive only for backward compatibility with deprecation warning in the comment?
As a point of reference, in OSProcess it is implemented as StandardFileStream>>atEndOfFile as an extension in *OSProcess-Base, as distinct from StandardFileStream>>atEnd. The EOF test is done in the OSProcessPlugin (not FilePlugin), and is implemented as #primitiveTestEndOfFileFlag.
The most important use case is for a file stream on an OS pipe (e.g. OSPipe new reader).
Similar to /dev/random, the stream on an OS pipe is represented as a kind of PositionableStream, but it is a positionable stream that does not know its position, and for which the position is not settable.
I do not know of a perfect solution to this, but if you want to experiment on the image side with the different semantics of "at end now" versus "last operation hit the EOF condition so do not bother asking for more" then the primitiveTestEndOfFileFlag that is present in any VM with the UnixOSProcessPlugin will probably give you what is needed to try out some new ideas.
Dave
Hi David, I will have a look. We still have ungetc to just skip: -1 even on a pipe, as long as called only once. This could be used to test atEndNow in precondition, Am I wrong?
Hi Nicolas and Dave,
We still have ungetc to just skip: -1 even on a pipe, as long as called only once. This could be used to test atEndNow in precondition, Am I wrong?
You're correct (as I understand it). I think it is more a matter of what do we want to try and move to in the long term.
Right now the behaviour of #atEnd is:
- True after moving past the end for stdio files - True when at the end for regular (disk) files - Broken for non-regular files
So we already have both behaviours.
My natural inclination is to minimise the primitive code and let the image handle it. In this case, that means keeping the feof() behaviour and eventually changing the definition for regular files.
The primitive should also be checking ferror(), so should be something like (untested):
sqInt sqFileAtEnd(SQFile *f) { sqInt status; /* Return true if the file's read/write head is at the end of the file. */
if (!sqFileValid(f)) return interpreterProxy->success(false); pentry(sqFileAtEnd); /* Regular files test based on current position vs size, * all other files use feof() */ if (f->isStdioStream || !S_ISREG(f->st_mode)) status = feof(getFile(f)) else status = ftell(getFile(f)) >= getSize(f); if (ferror(getFile(f))) primitiveFailForOSError(errno); return pexit(status); }
Cheers, Alistair
Alistair, you are perfectly right. The dual pre/post condition did already exist, and you are proposing an improvement. My only concern was about complexifying a piece that we will keep forever for compatibility reasons, while we may better maje deeper changes at the image side. But 'the best is the enemy of the good', if you think that this change is vital for Pharo, i have no further objection.
Hi Nicolas,
Thanks. I agree that this isn't simplifying the existing code, although I also think it isn't really adding complexity. It does result in an arguably correct response where before it was clearly incorrect.
Just for the record, the same incorrect behaviour currently exists in Squeak (#atEnd returns the wrong result for a non-regular file).
Cheers, Alistair
Hi Alistair,
On Apr 3, 2018, at 4:19 AM, akgrant43 notifications@github.com wrote:
Hi Nicolas,
Thanks. I agree that this isn't simplifying the existing code, although I also think it isn't really adding complexity. It does result in an arguably correct response where before it was clearly incorrect.
Just for the record, the same incorrect behaviour currently exists in Squeak (#atEnd returns the wrong result for a non-regular file).
Please add the gist of David's description of feof() to the code when you fix this. That's extremely important information. TIA
Cheers, Alistair
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
@akgrant43 pushed 1 commit.
40ab0f5 21643-FilePlugin-primitiveFileAtEnd
Please add the gist of David's description of feof() to the code when you fix this. That's extremely important information. TIA
Done.
wow, if it works, it looks good!
nicolas-cellier-aka-nice commented on this pull request.
@@ -30,6 +31,7 @@ typedef int mode_t;
typedef struct { int sessionID; /* ikp: must be first */ void *file; + mode_t st_mode; /* from stat() */
By the way, do we still need st_mode and setMode now?
akgrant43 commented on this pull request.
@@ -30,6 +31,7 @@ typedef int mode_t;
typedef struct { int sessionID; /* ikp: must be first */ void *file; + mode_t st_mode; /* from stat() */
Yeah, I just saw that. I don't know how I managed to lose part of the commit. There's also code left behind in sqFileBasicPrims.c.
I'm fixing it now. Sorry about that.
nicolas-cellier-aka-nice commented on this pull request.
@@ -30,6 +31,7 @@ typedef int mode_t;
typedef struct { int sessionID; /* ikp: must be first */ void *file; + mode_t st_mode; /* from stat() */
No problem, there should be a special prime to whoever remove code!
@akgrant43 pushed 1 commit.
bcc644b 21643-FilePlugin-primitiveFileAtEnd
akgrant43 commented on this pull request.
@@ -30,6 +31,7 @@ typedef int mode_t;
typedef struct { int sessionID; /* ikp: must be first */ void *file; + mode_t st_mode; /* from stat() */
Done. Hopefully I've got them all this time.
wow, if it works, it looks good!
I've run the full automated test suite on Ubuntu 64 with Squeak5.1-16549-64bit.image. There were 4 failures and 3 errors, but none looked remotely related.
Pharo 7 I filtered the packages in Test Runner with 'Zinc|File' and there were no failures.
Merged #232.
@akgrant43 Hi,
Since this PR was merged OSProcess is broken if I use it with the latest Pharo 61 and the latest VM.
Here is an example of error:
``` BufferedAsyncFileReadStream(Object)>>primitiveFailed: BufferedAsyncFileReadStream(Object)>>primitiveFailed BufferedAsyncFileReadStream(StandardFileStream)>>primAtEnd: BufferedAsyncFileReadStream(StandardFileStream)>>atEnd [ self readBuffer atEnd and: [ super atEnd ] ] in BufferedAsyncFileReadStream>>atEnd [ caught := true. self wait. blockValue := mutuallyExcludedBlock value ] in Semaphore>>critical: BlockClosure>>ensure: Semaphore>>critical: BufferedAsyncFileReadStream>>atEnd BufferedAsyncFileReadStream(AttachableFileStream)>>upToEnd ExternalPipe>>upToEnd PipeableOSProcess(PipeJunction)>>upToEnd [ super upToEnd ] in PipeableOSProcess>>upToEnd [ caught := true. self wait. blockValue := mutuallyExcludedBlock value ] in Semaphore>>critical: BlockClosure>>ensure: Semaphore>>critical: PipeableOSProcess>>upToEnd PipeableOSProcess(PipeJunction)>>outputOn: PipeableOSProcess(PipeJunction)>>output PipeableOSProcess(PipeJunction)>>outputAndError ```
Hi Cyril,
Can you provide some sample code that triggers the problem? I've never used BufferedAsyncFileReadStream directly, and it's quite a while since I've used OSProcess.
Also:
32 or 64 bit? OS?
Thanks, Alistair
Yes sorry I was a little in a hurry :)
OS:
``` ~> lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 16.04.3 LTS Release: 16.04 Codename: xenial ```
Pharo version: Pharo 61 60540
VM: Latest VM 64bits (we do not test the latest 32bits vm so it's maybe the same)
We use OSProcess this way:
``` compileFile: aFile | thisProcess currentPath result command | thisProcess := OSProcess thisOSProcess. currentPath := thisProcess environment at: #PATH. thisProcess environment at: #PATH put: self binDirectory , ':' , currentPath. command := PipeableOSProcess command: ( '"' , self binPath , '" "' , aFile fullName , '"'). [ result := command outputAndError ] ensure: [ command close ]. thisProcess environment at: #PATH put: currentPath. ^ result first ```
Hi Cyril,
I wasn't able to reproduce it, although I'm obviously missing the definition of "self binPath".
Are you able to adapt the script below to trigger the failure?
``` | thisProcess currentPath result command aFile |
aFile := 'PharoDebug.log' asFileReference. aFile exists ifFalse: [ self error: 'I want a reasonable size file (several kB)' ]. thisProcess := OSProcess thisOSProcess. currentPath := thisProcess environment at: #PATH. command := PipeableOSProcess command: ( 'cat ' , aFile fullName). [ result := command outputAndError ] ensure: [ command close ]. thisProcess environment at: #PATH put: currentPath. result
" #('THERE_BE_DRAGONS_HERE ... " ```
System Report:
``` Image ----- /home/alistair/pharo7/IssueOSProcess6/Pharo.image Pharo6.0 Latest update: #60540 Unnamed
Virtual Machine --------------- /home/alistair/pharo7/IssueOSProcess6/vm/lib/pharo/5.0-201804051426/pharo CoInterpreter VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr 5 2018 StackToRegisterMappingCogit VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr 5 2018 VM: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Thu Apr 5 16:26:17 2018 +0200 $ CommitHash: 3421494 $ Plugins: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ ```
I could not include everything because it's a proprietary code (I already had to remove some part in this example )
It'll be a little hard to get a reproducible case since I only have a windows :( I saw the failure on a Jenkins build. I'll try to find a way to reproduce it.
I added to our Jenkins a command to print the version and I see that we used the same VM.
``` 5.0-201804051426 Thu Apr 5 14:34:45 UTC 2018 gcc 4.8 [Production Spur 64-bit VM] CoInterpreter VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr 5 2018 StackToRegisterMappingCogit VMMaker.oscog-eem.2361 uuid: 7ca2f89a-de70-422f-b92b-54f91ac4e47b Apr 5 2018 VM: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Thu Apr 5 16:26:17 2018 +0200 $ CommitHash: 3421494 $ Plugins: 201804051426 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Linux travis-job-8fe7d8cc-2ee3-4c21-8897-143e18d7a979 4.4.0-101-generic #124~14.04.1-Ubuntu SMP Fri Nov 10 19:05:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux plugin path: /var/lib/jenkins/workspace/Ada-Generator/ARCHITECTURE/64/PHARO/61/VERSION/development/pharo-vm/lib/pharo/5.0-201804051426 [default: /var/lib/jenkins/workspace/Ada-Generator/ARCHITECTURE/64/PHARO/61/VERSION/development/pharo-vm/lib/pharo/5.0-201804051426/] ```
vm-dev@lists.squeakfoundation.org