Lots of concurrent IO.

Lex Spoon lex at cc.gatech.edu
Mon Aug 13 21:24:02 UTC 2001


> One of the requirements of it is to handle large numbers of concurrent
> socket connections and also handle incoming data securely to be immune
> from DoS. By large number, I mean 50-500. Thus, busywaiting will be
> deadly.
> 


This isn't as much of a problem as you might expect, so give it a try
before you do something complicated!  Many people disagree with me, so
let me spell out this argument once more, with numbers this time.
Incidentally, it applies to user interfaces just as much as to network
code.

Before getting to the numbers, consider these two basic arguments.

First, it is significant that a user is inside the critical loop for
most network systems and for all user interfaces.  When a user is in the
loop, anything more than 100 updates per second is wasted, and thus
you have 10 milliseconds available for each update.  10 ms is a very
long time for most interactive systems!

Second, if every socket is busy, then polling has no inefficiency. 
Every poll will be a hit, and you can start processing the socket
immediately.  In fact, polling can even be a little *more* efficient
in this extreme case, because you don't need any event-notification
infrastructure.  Thus, the wasted time due to polling is proportional to
the number of *idle* sockets you have; the percentage of wasted time
in fact goes down as the server gets more heavily loaded.
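To reduce that argument to arithmetic: if one polling sweep services b
busy sockets at a cost of Tbusy each and checks i idle sockets at a
cost of Tidle each, the sweep takes roughly

        (b * Tbusy) + (i * Tidle)

and only the i * Tidle term is polling overhead.  As sockets go from
idle to busy, that term shrinks while the useful work grows.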

Those arguments aside, here are the numbers.  I just checked, and my
modest 366 MHz computer can do 95,000 #dataAvailable checks per second
on a socket that is open but idle.  That's 95 checks per millisecond,
or 950 checks within the 10 millisecond budget mentioned above.  Thus,
on a completely idle system, you can check 950 sockets 100 times per
second.
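If you want to check this figure on your own machine, a workspace
snippet along these lines should do it.  (This is my quick sketch, not
code from the image; it assumes sock is an already-connected Socket.)

        | n ms |
        n := 100000.
        ms := Time millisecondsToRun:
                [n timesRepeat: [sock dataAvailable]].
        Transcript show: (n * 1000 // ms) printString,
                ' checks per second'; cr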

But what about a busy system?  Assume for the moment that servicing a
busy socket takes about 1 millisecond (a guesstimate I will return to
below).  Here are some possible maximum loads, using these numbers:

        0 busy sockets and 950 idle sockets  (totally idle)
        10 busy sockets and 0 idle sockets   (totally slammed)
        9 busy sockets and 95 idle sockets   (almost totally slammed)
        8 busy sockets and 190 idle sockets  (still pretty heavy...)
        ... etc ...

As you can see, with my numbers idle sockets are really cheap -- just
shy of 100 times cheaper than a busy socket.  For each busy socket
below the maximum possible load, you can handle 95 more idle sockets.

Now, some might argue with my 1 millisecond guesstimate for servicing
an active socket.  Fine -- that timing will indeed depend on your
specific application.  However, whatever number is accurate for your
program, my numbers show that either idle sockets will be relatively
cheap, or performance will already be fine.  If you need more than 1
millisecond, then idle sockets cost even less than 1/100th of a busy
socket.  If you need less than 1 ms, then your system is quite fast
already -- stop fiddling with it.

> 
> My problem:
>    I can't get the reader process to block. If I use
>   Socket>>waitForDataUntil: (Socket deadlineSecs: 60)

This will wait for 60 seconds, but it really should return early if data
arrives.  Can you make a small example that causes this problem?


Anyway, I don't think you'll need code like this in the
single-threaded, polling-based architecture.  Instead, you can have a
sort of Client object ("client" being from the server's perspective),
give each Client a #processIO method, and send it when you are ready
to poll.  #processIO will check the client's socket with
#dataAvailable, and handle any new data directly.
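
Here is a sketch of what I mean.  Server, Client, clients, and
#handleData: are made-up names for illustration -- the real example to
crib from is IRCConnection, mentioned below:

        Server>>serveLoop
                "Hypothetical main loop: poll every client, then
                 wait 10 ms before the next sweep."
                [true] whileTrue:
                        [clients do: [:each | each processIO].
                         (Delay forMilliseconds: 10) wait]

        Client>>processIO
                "If data has arrived on my socket, handle it now."
                socket dataAvailable ifTrue:
                        [self handleData: socket receiveData]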

You don't even need the message queue, really -- just go ahead and
handle the message!

For a thorough example, check out IRCConnection's #processIO method.

The one tricky bit of this whole approach is that your #processIO method
must deal with partially-arrived messages.  This can indeed be handled
elegantly if you switch to a thread-per-socket design.  However, this
problem isn't *that* hard, and it's the only one you will have--it's a
relatively small portion of the overall code, and the rest of the code
will run with single-threaded simplicity.  For examples of this code,
see IRCConnection, StringSocket, and ArbitraryObjectSocket.  
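
To make the partial-message problem concrete, here is one made-up way
a line-oriented Client might buffer.  The names buffer and
#handleMessage: are hypothetical; StringSocket does the real work
along these lines:

        Client>>processIO
                "Append whatever has arrived to my buffer, then
                 handle each complete (CR-terminated) message and
                 keep the leftover fragment for next time."
                | idx |
                socket dataAvailable ifFalse: [^self].
                buffer := buffer, socket receiveData.
                [(idx := buffer indexOf: Character cr) > 0] whileTrue:
                        [self handleMessage: (buffer copyFrom: 1 to: idx - 1).
                         buffer := buffer copyFrom: idx + 1 to: buffer size]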

In fact, after reading all of this, maybe just using StringSocket
sounds like a reasonable approach.


Overall, putting a user in the loop changes the performance game
significantly, and so a user's presence should not be ignored.  More
generally, it's good to *try* a simple solution before assuming that it
will obviously be too slow....  People -- myself included -- are really
bad at predicting performance characteristics of a program that they
haven't already written and experimented with.


Regards,

Lex Spoon




