3 new mac VMs + pending Mac VM...

Tue Oct 10 17:38:04 UTC 2000

John M McIntosh <johnmci at smalltalkconsulting.com> wrote:
> These problems are all triggered if your code fails to close and/or 
> destroy the socket after use, if you leave the socket to be finalized 
> then issues arise (they shouldn't, perhaps < 1% of the time we run 
> into a unique situation which triggers the two problems). The most 
> common way of doing this is doing a listen/accept and getting a 
> socket which is closed or is in some sort of error state, then you 
> just let the socket get GCed, versus doing an explicit destroy.
> 

FWIW, not all cases of abandoned sockets cause trouble.  I
just tried a quick test on Ian's most recent Unix VM and with Squeak
2.9ap2447, and finalization cleaned up the loose sockets just fine.

The test is described below, but let me get to the beef.  When you
consider that #finalize for sockets just calls an equivalent to #destroy,
the implications are:

	1. In the common cases, GC-based finalization works fine for sockets.

	2. If there is a bug in the GC+finalization mechanism in general, it's
not an obvious one.  (As an aside, I've tested this part for files on a
Mac, and it worked there as well.)

	3. There is probably no difference between closing a socket via
finalization and closing it via #destroy.  
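
To make point 3 concrete, here is a minimal workspace sketch of the two
paths.  It assumes only Socket new, #destroy and #ensure: as found in a
2.9-era image; the variable names are made up for illustration, and the
comment inside the first block stands in for whatever the caller
actually does with the socket.

```smalltalk
| explicit abandoned |

"Path 1: explicit cleanup.  #ensure: runs the destroy even if the
 block in front of it fails or is unwound."
explicit := Socket new.
[ "... connect, send, receive ..." ] ensure: [ explicit destroy ].

"Path 2: abandonment.  Nothing is closed here; the descriptor stays
 open until the GC notices the socket is unreachable and
 finalization runs."
abandoned := Socket new.
abandoned := nil.
```

The second path is the one the tests further down exercise.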

What I start to wonder is, could there be a problem in *regular* closing
of sockets in error states?  Is there reason to believe that #destroy
isn't being called at all, as opposed to #destroy having a problem?  An
easy way to answer this would be to generate a log file from the VM
listing all calls to socket creation and socket destruction.  Ideally,
the create function should log whether it succeeded, and the destroy
function should log what state the socket was in.

Anyway, that's the beef.  The rest of this message just describes the
tests I tried.  I used Ian's most recent Unix VM and a Squeak 2.9ap2447
image.

First, inspect the following:

	| b |
	b _ Bag new.
	1300 timesRepeat: [ b add: Socket new ].
	b

(Actually, I interrupted the process before it allocated all 1300,
because it was doing GC's like mad and thus was getting really slow). 
After running this a while, Socket new would return a Socket in state
destroyed, suggesting that so many sockets were open that a new one
couldn't be opened.  As an additional test, I checked in /proc/###/fd
and saw that >1000 file descriptors were open.

I closed the debugger and tried Socket new again.  It gave a socket in
state unconnected, which means that it successfully opened a socket and
thus at least some of the other sockets must have been closed.  Also,
/proc/###/fd only had about 7-10 files in it now.  So, GC+finalization
works on Unix in this case.
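
The same check can be scripted instead of driven from an inspector.
This is only a sketch: it assumes Socket>>statusString is available to
print the state, it uses "Smalltalk garbageCollect" to ask for a full GC
rather than waiting for one, and it picks 1300 simply because that is
the count used above.  Finalization runs in its own process, so what the
last line prints may depend on timing.

```smalltalk
| held probe |

"Hold on to many sockets so their descriptors cannot be reclaimed."
held := OrderedCollection new.
1300 timesRepeat: [ held add: Socket new ].

"In the run described above, a fresh socket came back in state
 destroyed at this point, because the references were still live."
probe := Socket new.
Transcript show: 'while held: ', probe statusString; cr.
probe destroy.

"Drop the references and let GC+finalization do the closing."
held := nil.
Smalltalk garbageCollect.

probe := Socket new.
Transcript show: 'after release: ', probe statusString; cr.
probe destroy.
```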

But maybe that's because the sockets were in new space.  What if they
have been tenured into old space?  To get at this, I repeated the above
but forced a tenure operation after creating the roughly 1000 sockets.
I forced a tenure by doing 

	(Array new: 1000000) collect: [ :x | Object new ]

and then interrupting it after a while.  :)  The "vm statistics" showed
that the number of tenures had increased, which I believe would mean
that the 1000 sockets had been tenured as well.

Indeed, "Socket basicNew initialize: 0" -- the version of socket
creation that will not run a GC if it fails -- failed.  Even after
clicking on a lot of windows and generating lots of incremental GC's,
this version failed.  So, okay, the sockets got tenured.

Yet, when I closed the inspector holding the 1000 sockets, the next
"Socket new" operation successfully opened a socket (although it took a
while to do so).

These tests show that if there are errors in the general GC+finalization
mechanism, they are subtle ones.

	-Lex