[Vm-dev] Strange socket behavior

Andreas Raab andreas.raab at gmx.de
Tue Oct 3 03:01:10 UTC 2006


Hi Guys -

I debugged a *really* interesting problem today. For some reason, our
Croquet sessions failed, seemingly at random, with socket timeouts in
strange places. The main clue we had was that it was somehow related to
a rather large space being replicated over a rather slow line (a DSL
uplink as the source for replication).

Tracking this down into its gory details, I ended up with a test case
like this:

   data := ByteArray new: 10000000.
   socket := Socket newTCP.
   socket connectTo: 'myHost' port: myPort.
   socket sendData: data count: data size.
   socket sendData: 'Hello' count: 5.

When I did this over a slow uplink, it would *reliably* time out on the
second sendData:count: call. But why? Simply put, because the Windows
sockets interface doesn't quite function like I *thought* it would. I
had expected the Windows send() call to accept only a "TCP packet size"
worth of data, but it turns out it takes *everything*, right down to the
last byte, in the first call. This means the first sendData:count: call
returns immediately, but afterwards the system is still chugging along
trying to get the data out to the interface. The next sendData:count:
call then wants a response within the default ConnectionTimeOut, which
is less than the time needed to complete the previous send.

Why is this relevant? I believe pretty much all code we currently have 
is written under the assumption that the primitive will only accept 
"reasonable" amounts of data. Any code that pushes large amounts of data 
and expects the socket interface to handle it will be affected by this 
problem. I also suspect that other platforms may show similar behavior 
so some testing is in order. If you have had random unexplained timeouts
when sending large data buffers over slow lines, splitting them up into
smaller ones as a workaround may be just the ticket until I fix this
problem in the VM, e.g., by making the VM take only "reasonable" amounts
of data in each call, so that the caller can rest assured that the
timeout values are meaningful.
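
The workaround might look roughly like this. It is only a sketch: the
8192-byte chunk size is an arbitrary assumption rather than a tuned
value, and data and socket are the variables from the test case above.

   chunkSize := 8192.   "assumed value; adjust for your link"
   offset := 1.
   [offset <= data size] whileTrue: [
      "Never hand the primitive more than chunkSize bytes at once"
      count := chunkSize min: data size - offset + 1.
      socket
         sendData: (data copyFrom: offset to: offset + count - 1)
         count: count.
      offset := offset + count].

Each call then only has to wait for, at most, one chunk to drain, which
should keep the wait well inside the default timeout even on a slow
line.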

I would also be interested in what other platforms do. Basically, the
question is whether the primitive consumes all the data in a single call
and returns immediately, or whether it accepts only part of the data so
that Socket>>sendData:count: has to loop. If you have evidence either
way, please post your results to VM-dev (incl. the precise version of
your OS).
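
One possible way to check, sketched under the assumption that your image
provides Socket>>sendSomeData:startIndex:count: and that it answers the
number of bytes taken by a single send attempt ('myHost' and myPort are
placeholders, as in the test case above):

   data := ByteArray new: 10000000.
   socket := Socket newTCP.
   socket connectTo: 'myHost' port: myPort.
   "Time one send attempt and record how many bytes it accepted"
   ms := Time millisecondsToRun: [
      sent := socket sendSomeData: data startIndex: 1 count: data size].
   Transcript
      show: sent printString, ' of ', data size printString,
         ' bytes accepted in ', ms printString, ' ms';
      cr.

If sent equals data size and the call comes back quickly, the primitive
presumably swallowed everything in one go; a much smaller number would
suggest that the loop in the image is doing the work.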

Cheers,
   - Andreas

