[Vm-dev] [OpenSmalltalk/opensmalltalk-vm] FileStreams are limited and too slow (Issue #613)

unique75m notifications at github.com
Thu Feb 17 18:06:13 UTC 2022


I am copying the email conversation here. The short form is: I am running Squeak 5.3 on macOS 12.0.1 (2017 MacBook Pro) and there are two problems. The file stream primitives (open/close) are too slow, and because they are slow I need to hold 4000 file streams open, but then I run into the external semaphore limit and Squeak freezes and consumes 100% CPU time.

##############################################

Is there a limit on file streams? I cannot open more than 238 on my macOS machine. The 239th fails and Squeak runs at 100% CPU. I then changed the method StandardFileStream>>open:forWrite: and removed the #retryWithGC:until:forFileNamed:, because it did not look like it does much. Now I get nil back from the primitive.

| directory streams |
directory := FileDirectory default directoryNamed: 'Test'.
directory assureExistence.
streams := (1 to: 239)
	collect: [:index | directory fileNamed: 'test', index printString].
streams do: #close

Jörg

#############################################

On 2/17/22 08:20, Jörg Belger wrote:
Is there a limit on file streams? I cannot open more than 238 on my macOS machine. The 239th fails and Squeak runs at 100% CPU. I then changed the method StandardFileStream>>open:forWrite: and removed the #retryWithGC:until:forFileNamed:, because it did not look like it does much. Now I get nil back from the primitive.

There may be a per-process limit on the number of file descriptors. When Squeak runs out, it is (I suspect) assuming that some unused file handles exist on the heap, so it tries to GC, hoping that a collection will close some unwanted file descriptors. Unfortunately if it really is running up against the limit, this strategy won't help! (We should do something better... but what?)

You might be able to see what the current limit is with `ulimit -a`.

If that's the cause of the problem, you could try to raise the limit at the OS level; I'm afraid I don't have a Mac to hand to try it myself, though.

Regards,
 Tony

##############################################

Hi Jörg,

On Thu, 17 Feb 2022, Jörg Belger wrote:

Is there a limit on file streams? I cannot open more than 238 on my macOS machine. The 239th fails and Squeak runs at 100% CPU. I then changed the method StandardFileStream>>open:forWrite: and removed the #retryWithGC:until:forFileNamed:, because it did not look like it does much. Now I get nil back from the primitive.

| directory streams |
directory := FileDirectory default directoryNamed: 'Test'.
directory assureExistence.
streams := (1 to: 239)
	collect: [:index | directory fileNamed: 'test', index printString].
streams do: #close

There is a limit on the maximum number of external semaphores provided by the VM. Each socket uses 3 external semaphores and each file uses one.
The default limit is 256, which is very conservative; I hope the next release will ship with at least 4096 as the default.

Anyway, you can check the current value with

	Smalltalk maxExternalSemaphores

and set it with

	Smalltalk maxExternalSemaphores: 4096


Some care must be taken not to change the limit while signals are expected to be processed (e.g. when you have activity on those sockets/files), because there is a short period during the resize when the VM may fail to signal the semaphores.
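
A minimal sketch combining the two messages shown above; it raises the limit only when it is below what is needed, done at a quiet moment before opening the files (4096 is an arbitrary target):

	"Bump the external semaphore limit before opening many files."
	Smalltalk maxExternalSemaphores < 4096
		ifTrue: [Smalltalk maxExternalSemaphores: 4096]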


Levente

###############################################

My problem is simply that I need to leave the streams open because reopening for every write is too slow. I have realtime data coming through a socket with nanosecond precision and the file handling must be very fast. Currently I have 120 nanosecond realtime streams and 2645 minute-based streams.

As I now use a binary format instead of the previous CSV format, I cannot read the data files as plain text anyway, so maybe I will give Magma a try. It does not matter whether it is binary files or Magma files that I can't read with a text editor :-)

###############################################

It seems not to work. I set the max semaphores to 8192 and opened only 4000 files, but the Squeak VM only created 242 files on the hard disk and freezes with 100% CPU consumption.

###############################################

Hi Jörg,

On Thu, 17 Feb 2022, Jörg Belger wrote:

It seems not to work. I set the max semaphores to 8192 and opened only 4000 files, but the Squeak VM only created 242 files on the hard disk and freezes with 100% CPU consumption.

As others wrote, there are also limits on the maximum number of open file descriptors.
One is set by the OS per process, typically 1024 on Unix machines. On Linux, for example, you can see the limit with the ulimit -n command, and it can typically be changed with superuser privileges.
And there's another limit if the file descriptor sets are processed with the select() function, which currently happens on Windows and Mac.
Linux (and IIRC OpenIndiana too) uses epoll(), so there is no upper limit there.
If select() is used, the value of FD_SETSIZE decides the maximum number of file descriptors that can be opened. Its value is typically 1024, and you'd need to compile your own VM to be able to change that if the OS permits it (Windows and Mac do).


Levente

################################################

The least I expect is to get a debugger in Smalltalk when the limit is reached, not a Squeak that is unresponsive and consumes 100% CPU time. But in any case I get the hanging Squeak already with 242 file streams, regardless of whether I set #maxExternalSemaphores to 8192.

If I do the same in VisualWorks and open 4000 write streams I have no problems with limits, so I guess it is not an operating system limit but a Squeak VM limit. There is no reason why everything needs to be implemented in C code; the goal should be to implement as much as we can in Smalltalk.

The reason why I opened so many files is that the Squeak file API is very slow for my realtime streaming, so I simply leave the files open.
Here is an example:

VisualWorks needs 700ms
Time millisecondsToRun:
	[| filename |
	filename := 'test.txt' asFilename.
	10000 timesRepeat:
		[filename readWriteStream close]]

Squeak needs 4000ms
Time millisecondsToRun:
	[10000 timesRepeat: [(FileStream fileNamed: 'test.txt') close]]

If you can tell me a faster way to open/write/close a file it would be nice.
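
For reference, a minimal sketch of the "leave the streams open" approach described above; the file name and the write calls (#nextPutAll:, #flush) are placeholders, not taken from the actual application:

	"Open once, keep the stream, and reuse it for every write instead of reopening per write."
	| stream |
	stream := FileStream fileNamed: 'test.txt'.
	Time millisecondsToRun:
		[10000 timesRepeat: [stream nextPutAll: 'some data'; flush]].
	stream close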

Jörg

###############################################

On Thu, Feb 17, 2022 at 05:05:43PM +0100, Jörg Belger wrote:
The least I expect is to get a debugger in Smalltalk when the limit is reached, not a Squeak that is unresponsive and consumes 100% CPU time. But in any case I get the hanging Squeak already with 242 file streams, regardless of whether I set #maxExternalSemaphores to 8192.

If I do the same in VisualWorks and open 4000 write streams I have no problems with limits, so I guess it is not an operating system limit but a Squeak VM limit. There is no reason why everything needs to be implemented in C code; the goal should be to implement as much as we can in Smalltalk.

The reason why I opened so many files is that the Squeak file API is very slow for my realtime streaming, so I simply leave the files open.
Here is an example:

VisualWorks needs 700ms
Time millisecondsToRun:
	[| filename |
	filename := 'test.txt' asFilename.
	10000 timesRepeat:
		[filename readWriteStream close]]

Squeak needs 4000ms
Time millisecondsToRun:
	[10000 timesRepeat: [(FileStream fileNamed: 'test.txt') close]]

On my Linux box it only takes about 100ms.



If you can tell me a faster way to open/write/close a file it would be nice.

Jörg


You may want to try profiling it to see where the time is going:

 TimeProfileBrowser onBlock: [10000 timesRepeat: [(FileStream fileNamed: 'test.txt') close]].

Dave

################################################

Hi Jörg,

On Thu, 17 Feb 2022, Jörg Belger wrote:

The least I expect is to get a debugger in Smalltalk when the limit is reached, not a Squeak that is unresponsive and consumes 100% CPU time. But in any case I get the hanging Squeak already with 242 file streams, regardless of whether I set #maxExternalSemaphores to 8192.

Squeak tries to be optimistic and assumes that the files could be opened, but that the descriptors are consumed by abandoned, unclosed files. Doing a GC triggers the automatic closing of such files, freeing up the file descriptors.
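
A rough way to check this from a workspace, assuming StandardFileStream>>#closed answers whether a stream's handle has already been released:

	"Force finalization of abandoned streams, then count what is still open."
	Smalltalk garbageCollect.
	(StandardFileStream allSubInstances reject: [:each | each closed]) size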


If I do the same in VisualWorks and open 4000 write streams I have no problems with limits, so I guess it is not an operating system limit but a Squeak VM limit. There is no reason why everything needs to be implemented in C code; the goal should be to implement as much as we can in Smalltalk.

The reason why I opened so many files is that the Squeak file API is very slow for my realtime streaming, so I simply leave the files open.
Here is an example:

VisualWorks needs 700ms
Time millisecondsToRun:
	[| filename |
	filename := 'test.txt' asFilename.
	10000 timesRepeat:
		[filename readWriteStream close]]

Squeak needs 4000ms
Time millisecondsToRun:
	[10000 timesRepeat: [(FileStream fileNamed: 'test.txt') close]]

If you can tell me a faster way to open/write/close a file it would be nice.

On my machine (Linux) I get 145ms. Are you on Windows?


Levente

####################################################

MultiByteFileStream >> open:forWrite: takes 96.8% of the time.

Can I increase the tallying depth? It stops at this method and I do not see the calls inside it.

####################################################

Funny, I commented out some things in #open:forWrite: (shown as comments in the method below). Now the profiler is telling me that "String new: 1" takes 97.4%... Really? Or is this a profiler bug?

open: fileName forWrite: writeMode 
	"Open the file with the given name. If writeMode is true, allow writing, otherwise open the file in read-only mode."
	"Changed to do a GC and retry before failing ar 3/21/98 17:25"
	| f |
	f := fileName asVmPathName.

	fileID := "StandardFileStream retryWithGC:[" self primOpen: f writable: writeMode "] 
					until:[:id| id notNil] 
					forFileNamed: fileName".
	fileID ifNil: [^ nil].  "allows sender to detect failure"
	name := fileName.
	"self register."
	rwmode := writeMode.
	buffer1 := String new: 1.
	self enableReadBuffering

######################################################

On Thu, 17 Feb 2022, Jörg Belger wrote:

Funny, I commented out some things in #open:forWrite:. Now the profiler is telling me that "String new: 1" takes 97.4%... Really? Or is this a profiler bug?

It's a limitation of the profiler. That profiler uses sampling and is implemented entirely in Smalltalk, so it can only measure message sends.

What happens with your code is that after the #asVmPathName send a primitive is invoked (#primOpen:writable:), and that is not a real message send.
Then inlined code comes (#ifNil:), which is also not a send.
Then assignments come which are also not sends.
The first actual send is String >> #new:.
So, the profiler will count everything that happened before that send and after #asVmPathName towards String >> #new:, which is why you see such a high number there.

Since you know that String >> #new: with argument 1 is not expected to take long, and the same applies to #ifNil: and the assignments, you can be sure that the time is spent in #primOpen:writable:, which is primitiveFileOpen in FilePlugin. And that is known to be quite slow on Windows, presumably due to anti-virus measures.
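
A VM-level profiler, if one is present in the image, can attribute time to the primitives themselves; the sketch below assumes AndreasSystemProfiler with a MessageTally-style #spyOn:, which may not be available in every image:

	"Profile the open/close loop at the VM level to see time spent in primitiveFileOpen."
	AndreasSystemProfiler spyOn:
		[10000 timesRepeat: [(FileStream fileNamed: 'test.txt') close]]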


Levente

#######################################################

I am on a Mac :-)
And the 4-times-faster VisualWorks code is running on the same Mac :-)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/613

