(Resending with proper subject…)
On 08 Jan 2015, at 20:48, squeak-dev-request@lists.squeakfoundation.org wrote:
Date: Thu, 8 Jan 2015 16:56:30 +0100
From: Henrik Johansen <henrik.s.johansen@veloxit.no>
Subject: [squeak-dev] Re: [Vm-dev] [OSProcess] forking and file descriptors
On 08 Jan 2015, at 11:37 , Max Leske <maxleske@gmail.com> wrote:
Hi
We currently use ImageSegment to create snapshots of our object graphs. To ensure consistency (and for performance reasons) we create a fork of the image and then run the segment creation in the fork. We’ve always had minor issues with TCP sockets but they are pretty rare and have never corrupted any data (we close the TCP connections in the child).
Recently, however, we created a new application which also makes heavy use of a database, and now it seems that forking creates a real problem. In anticipation of possible problems I opted to destroy all sockets (with Socket>>destroy) in the fork, thinking that, since all file descriptors are copies of the ones in the parent process, the sockets in the parent process should be unaffected [1], [2]. With that mechanism in place, however, we are seeing very weird things, such as multiple sockets in the parent (!) having the same file handle (which leads to the wrong data being read from the database and, in turn, corrupt objects).
AFAICT, the OSProcess plugin doesn’t offer any way of dealing with such problems so I was wondering if anybody has had any experience with these kinds of issues and whether there is some kind of best practice.
I am aware that the simplest option is to close the sockets in the parent before forking, but that would mean waiting for all database connections to finish executing and then blocking them to prevent new connections to the database. Depending on how long a query takes (which may well be a couple of seconds in our case), clients would have to wait quite a long time before their request could be answered (and this scenario of course assumes that we only close the database sockets and leave the TCP sockets open…).
So under the condition that I need to fork that image, what is the best way to deal with open file descriptors?
Thanks for your time. Max
[1] http://man7.org/linux/man-pages/man2/fork.2.html
[2] http://man7.org/linux/man-pages/man2/clone.2.html
Well... If I understand the source correctly (at least on Unix, https://github.com/pharo-project/pharo-vm/blob/master/platforms/unix/plugins/SocketPlugin/sqUnixSocket.c), the socketHandle in a Socket instance is a pointer to a private (platform-specific) struct. That struct in turn has a handle to the native socket, which I assume is what gets copied when you fork a process?
Socket >> primDestroySocket frees the memory pointed to by socketHandle.
Hm… that gives me an idea: assume that everything works as advertised and that the child process gets copies of the socket descriptors (which can be closed safely without interfering with the parent). If I’m right, the Socket instances in the image hold on to the address of the *parent* handle (in an inst var). So now, when I close a socket with #primSocketDestroy:, the handle passed to the plugin will be the handle of the parent socket (although it sounds strange that the child should be able to close a file descriptor of its parent…).
That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem, if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside) then sockets will be destroyed during termination.
Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).
So, are you using clone or fork to create a fork of the image?
OSProcess uses plain fork() (in forkSqueak()). That’s what I use from the image.
If their memory is shared (clone) instead of copied (fork), you might be kicking the feet out from under the parent image as well, so to speak…
From my understanding the file descriptors should be copied (fork). So that shouldn’t happen (but see above…).
Thanks Henry.
Cheers, Henry
On 09 Jan 2015, at 9:32 , Max Leske maxleske@gmail.com wrote:
That would mean that I must not close any sockets in the child. One option, it seems to me, is to suspend all processes that use sockets. Terminating them might pose another problem, if socket destruction is part of an unwind block in one of the processes (e.g. TCP connections in Seaside) then sockets will be destroyed during termination.
Another option: set all the socket handles to nil, then terminate the processes (yes ugly, but it might just work…).
Just beware that you might run into the issue that resuming a process waiting on a semaphore will proceed as if the semaphore had been signalled. I can't tell offhand whether that would actually be a problem in this case, or whether the affected processes would promptly resume waiting after attempting socket reads/writes with no data available.
Another (probably non-portable, which would be painful if not consistent across platforms) option: forget about image-side handling, and alter the SocketPlugin to set FD_CLOEXEC, if available, when opening sockets. (It's in http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/fcntl.h.html, but that's rather new.) On newer Linuxen, you also have SOCK_CLOEXEC for socket(), which opens/sets in an atomic operation, but the race condition avoided by that is hardly relevant in our case.
Cheers, Henry
On Fri, Jan 09, 2015 at 09:32:53AM +0100, Max Leske wrote:
OSProcess uses plain fork() (in forkSqueak()). That’s what I use from the image.
If their memory is shared (clone) instead of copied (fork), you might be kicking the feet out from under the parent image as well, so to speak…
From my understanding the file descriptors should be copied (fork). So that shouldn’t happen (but see above…).
The method comment in UnixOSProcessPlugin>>forkSqueak may be helpful, so I will copy it here:
forkSqueak "Fork a child process, and continue running squeak in the child process. Answer the result of the fork() call, either the child pid or zero.
After calling fork(), two OS processes exist, one of which is the child of the other. On systems which implement copy-on-write memory management, and which support the fork() system call, both processes will be running Smalltalk images, and will be sharing the same memory space. In the original OS process, the resulting value of pid is the process id of the child process (a non-zero integer). In the child process, the value of pid is zero.
The child recreates sufficient external resources to continue running. This is done by attaching to a new X session. The child is otherwise a copy of the parent process, and will continue executing the Smalltalk image at the same point as its parent. The return value of this primitive may be used by the two running Smalltalk images to determine which is the parent and which is the child.
The child should not depend on using existing connections to external resources. For example, the child may lose its connections to stdin, stdout, and stderr after its parent exits.
The new child image does not start itself from the image in the file system; rather it is a clone of the parent image as it existed at the time of primitiveForkSqueak. For this reason, the parent and child should agree in advance as to whom is allowed to save the image to the file system, otherwise one Smalltalk may overwrite the image of the other.
This is a simple call to fork(), rather than the more common idiom of vfork() followed by exec(). The vfork() call cannot be used here because it is designed to be followed by an exec(), and its semantics require the parent process to wait for the child to exit. See the BSD programmers documentation for details."
    | pid intervalTimer saveIntervalTimer |
    <export: true>
    <returnTypeC: 'pid_t'>
    <var: 'pid' type: 'pid_t'>
    <var: 'intervalTimer' type: 'struct itimerval'>
    <var: 'saveIntervalTimer' type: 'struct itimerval'>

    "Turn off the interval timer. If this is not done, then the program which we exec in the child process will receive a timer interrupt, and will not know how to handle it."
    self cCode: 'intervalTimer.it_interval.tv_sec = 0'.
    self cCode: 'intervalTimer.it_interval.tv_usec = 0'.
    self cCode: 'intervalTimer.it_value.tv_sec = 0'.
    self cCode: 'intervalTimer.it_value.tv_usec = 0'.
    self cCode: 'setitimer (ITIMER_REAL, &intervalTimer, &saveIntervalTimer)'.
    pid := self fork.

    "Enable the timer again before resuming Smalltalk."
    self cCode: 'setitimer (ITIMER_REAL, &saveIntervalTimer, 0L)'.
    ^ pid
vm-dev@lists.squeakfoundation.org