[Vm-dev] Strange socket behavior
Andreas Raab
andreas.raab at gmx.de
Tue Oct 3 03:01:10 UTC 2006
Hi Guys -
I debugged a *really* interesting problem today. For some reason, our
Croquet sessions failed, seemingly at random, with socket timeouts in strange
places. The main clue we had was that it was somehow related to a rather
large space being replicated over a rather slow line (a DSL uplink as
the source for replication).
Tracking this down into its gory details I ended up with a test case
like this:
data := ByteArray new: 10000000.
socket := Socket newTCP.
socket connectTo: 'myHost' port: myPort.
socket sendData: data count: data size.
socket sendData: 'Hello' count: 5.
When I did this over a slow uplink this would *reliably* time out on the
second sendData:count: call. But why? Simply put, because the Windows
sockets interface doesn't quite function like I *thought* it would. I
had expected the Windows send() call to accept only a "TCP packet size"
full of data but it turns out it takes *everything* right down to the
last byte in the first call. This means that the first sendData: call
returns immediately, but afterwards the socket is still chugging along
trying to get the data out to the interface. The next sendData: call
then wants a response within the default ConnectionTimeOut (which is
less than the time needed to complete the previous send).
Why is this relevant? I believe pretty much all code we currently have
is written under the assumption that the primitive will only accept
"reasonable" amounts of data. Any code that pushes large amounts of data
and expects the socket interface to handle it will be affected by this
problem. I also suspect that other platforms may show similar behavior
so some testing is in order. If you have had random unexplained timeouts
when sending large data buffers over slow lines, splitting them up into
smaller ones as a workaround may just be your ticket until I fix this
problem in the VM, e.g., by making the VM only take "reasonable" amounts
of data in each call so that the caller can rest assured that the
timeout values are meaningful.
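A minimal sketch of that workaround, reusing the test case above. The
chunk size of 16384 bytes is purely an assumption (I have not measured
what is "reasonable" for any given line), and 'myHost' and myPort are
placeholders as before. It relies on Socket>>sendSomeData:startIndex:count:
answering the number of bytes actually accepted:

```smalltalk
"Sketch only: push a large buffer through the socket in bounded chunks,
so that each send (and the timeout that applies to it) covers only a
small amount of data. chunkSize is an assumed value, not a tuned one."
| data socket chunkSize start sent |
data := ByteArray new: 10000000.
chunkSize := 16384.
socket := Socket newTCP.
socket connectTo: 'myHost' port: myPort.
start := 1.
[start <= data size] whileTrue: [
	"Never ask for more than chunkSize bytes, nor more than what is left."
	sent := socket
		sendSomeData: data
		startIndex: start
		count: (chunkSize min: data size - start + 1).
	start := start + sent].
socket sendData: 'Hello' count: 5.
```

Whether this is enough depends on how much the platform is willing to
buffer per call, so treat it as a stopgap rather than a fix.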
I would also be interested in what other platforms do. Basically, the
question is whether the primitive returns immediately in a single call,
consuming all the data, or whether it will loop in
Socket>>sendData:count:. If you have evidence either way, please
post your results to VM-dev (incl. the precise version of your OS).
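One rough way to check, assuming a slow line and a host that accepts the
connection ('myHost' and myPort are placeholders again), is to time the
large send:

```smalltalk
"Sketch only: measure how long the large send takes to return.
A time far below what the line could possibly transfer suggests that
the primitive consumed all the data in one call."
| data socket elapsed |
data := ByteArray new: 10000000.
socket := Socket newTCP.
socket connectTo: 'myHost' port: myPort.
elapsed := Time millisecondsToRun: [
	socket sendData: data count: data size].
Transcript
	show: data size printString, ' bytes handed over in ';
	show: elapsed printString, ' ms';
	cr.
```

If the send takes roughly as long as the transfer itself, the image is
most likely looping in Socket>>sendData:count: instead.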
Cheers,
- Andreas