[Seaside] Squeak async i/o performance,
was Re: HTTP Performance
Stephen Pair
stephen at pairhome.net
Thu Nov 20 16:01:37 UTC 2003
Andreas Raab wrote:
>>This test really only measures socket connection performance.
>>
>>
>
>Which is 99% bounded by the interface between VM and image. Precisely the
>thing that I wanted to measure. So if there's latency between VM and image
>then this latency should become visible here (note that both connect as well
>as accept require state changes on the socket).
>
>
>
>>My guess is that something is causing an excessive latency between
>>the time a socket semaphore is marked for signaling in the VM and
>>when the waiting squeak process actually wakes up.
>>
>>
>
>That's possible but not likely according to the results I got. First of all,
>the results show that no matter how hard you try you can't get above 1k
>connections per second (bounded by the time it takes to create and destroy a
>socket). Secondly, if there were latency involved, then it should be visible
>by reduced CPU usage - however, the CPU maxes out during the test which is a
>clear indication that Squeak is doing "something".
>
1000 connections per second may indeed be an upper bound...however, it
may not be an upper bound for http requests per second (remember 1 http
request ~= 1 socket connection). And, in my tests, I do show
significant idle time.
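The distinction matters because with HTTP keep-alive many requests ride a single TCP connection. A standalone Python sketch (using Python's http.server, not Comanche; the hello-world handler here is illustrative) shows five requests served by one accept:

```python
# Standalone illustration (Python's http.server, not Comanche): with
# HTTP/1.1 keep-alive, several requests share one TCP connection, so
# requests/sec is not capped by connections/sec.
import http.client
import http.server
import threading

accepts = []                                  # one entry per accepted TCP connection

class CountingServer(http.server.ThreadingHTTPServer):
    def get_request(self):
        sock, addr = super().get_request()
        accepts.append(addr)                  # count each accept()
        return sock, addr

class Hello(http.server.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'             # enables keep-alive
    def do_GET(self):
        body = b'hello world'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):             # keep the demo quiet
        pass

server = CountingServer(('127.0.0.1', 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
for _ in range(5):                            # five HTTP requests...
    conn.request('GET', '/')
    assert conn.getresponse().read() == b'hello world'
conn.close()
server.shutdown()

print(len(accepts))                           # ...over a single accepted connection
```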
I bumped the comanche server priority to 80 and observed one 'ab' test
get 1477 rps (using: ab -n 1024 -c 16 http://10.0.0.181:8080/)...this
test should only open 16 socket connections (my comanche server is set
to allow up to 100 http requests on a single socket). Also, this test
is the "hello world" test that only sends back the in-memory "hello
world" string (so no file I/O involved). However, this result is not
consistent; running it over and over, I sometimes get as few as 100 rps and
usually get in the neighborhood of 200-300 rps. This is running ab from
a 1GHz PIII running Linux to a 1.8GHz Windows machine.
But, the cool thing is that about 1 in every 10 or 11 attempts I get in
the range of 1000 to 1500 http RPS! So, the question is, how come I
can't get that on every attempt? Why the erratic numbers? Another
interesting data point is that when I get a high RPS, I'm also getting a
high number of failed requests. In one case where I achieved 1154 rps, I
had 1024 completed requests (of course) and 1943 failed requests. Also,
adjusting the socket buffer sizes and backlog size seemed to have
little (if any) impact.
>Considering that during
>this test, the script is run in the main process there is no UI
>interference. Given that there were no other "interesting" processes active
>it is reasonable to assume that all of the (maxed out CPU) time is spent
>within Squeak. Which means within the primitives observed by the tally which
>means that latency is probably not the limiting factor. Q.E.D. ;-)
>
>
Which I'm not doubting (regarding the connection tests)...I think the
problem is more related to a latency associated with the read/write
semaphores...and perhaps this is a latency in the socket interface code
to mark these semaphores for signaling. Who knows. I'm baffled.
- Stephen
>And of course, all of this is on Windows. YMMV.
>
>Cheers,
> - Andreas
>
>
>
>>-----Original Message-----
>>From: seaside-bounces at lists.squeakfoundation.org
>>[mailto:seaside-bounces at lists.squeakfoundation.org] On Behalf
>>Of Stephen Pair
>>Sent: Thursday, November 20, 2003 3:53 PM
>>To: The general-purpose Squeak developers list
>>Cc: 'The Squeak Enterprise Aubergines Server - general discussion.'
>>Subject: Re: [Seaside] Squeak async i/o performance, was Re:
>>HTTP Performance
>>
>>
>>This test really only measures socket connection performance. RPS as
>>measured by ab (apache benchmark tool) is a full HttpRequest and
>>HttpResponse. So they are quite different things than what this test
>>measures. The ab tests we were running only open a relatively few
>>socket connections (I was using 4, 8, 16, 32, etc). The number of
>>simultaneous socket connections seems to have little effect
>>on the HTTP
>>RPS throughput. Measuring HTTP rps on both inbound connections (to
>>comanche) and outbound (to an apache server) yield similar results in
>>terms of RPS...even though ab can hit apache with over 1000
>>RPS, squeak
>>can only hit apache with ~50 RPS...and, we seem to spend a
>>lot of time
>>(~70%) during these benchmarks waiting on socket semaphores.
>>My guess
>>is that something is causing an excessive latency between the time a
>>socket semaphore is marked for signaling in the VM and when
>>the waiting
>>squeak process actually wakes up.
>>
>>- Stephen
>>
>>Andreas Raab wrote:
>>
>>
>>
>>>Hi Bruce,
>>>
>>>
>>>
>>>
>>>
>>>>I poked at this today and I don't quite get the math to work.
>>>>
>>>>
>>>>
>>>>
>>>Interesting you should say this. My impression after running a bunch of
>>>tests with Avi and Stephen yesterday was very similar. I can say for sure
>>>that there's something wrong here but it's not quite clear what. I've been
>>>running an interesting little benchmark which might be good to run on other
>>>machines as well. Here it is:
>>>
>>>"--- the server (run in one image) -----"
>>>server := Socket newTCP.
>>>server listenOn: 12345 backlogSize: 4.
>>>
>>>"wait for the first connection (this allows us to profile both ends)"
>>>[client := server waitForAcceptFor: 10 ifTimedOut:[nil].
>>>client == nil] whileTrue.
>>>client destroy.
>>>
>>>"Now do 1k connections"
>>>MessageTally spyOn:[
>>> 1 to: 1000 do:[:i|
>>> client := server waitForAcceptFor: 10 ifTimedOut:[nil].
>>> client destroy.
>>> ].
>>>].
>>>"clean up"
>>>server destroy.
>>>
>>>
>>>"--- the client (run in another one) ---"
>>>addr := NetNameResolver addressForName: '127.0.0.1'.
>>>
>>>"wait for the first connection (to sync with server)"
>>>[client := Socket newTCP.
>>>[client connectTo: addr port: 12345] on: ConnectionTimedOut do:[:ex| ex return].
>>>client isConnected] whileFalse:[client destroy].
>>>
>>>"Now do 1k connections"
>>>MessageTally spyOn:[
>>> client := Socket newTCP.
>>> 1 to: 1000 do:[:i|
>>> client := Socket newTCP.
>>> client connectTo: addr port: 12345.
>>> client destroy.
>>> ].
>>>].
>>>
>>>Running this shows profiles on both server and client side as well as
>>>(indirectly) the number of connections established. It turns out that this
>>>runs in ~4secs on my machine which makes for about 250 connections per
>>>second. Looking closer, it looked as if much of the time is spent in socket
>>>creation and destroy (which admittedly is a heavy-weight operation on
>>>Windows) and measuring it shows this clearly:
>>>
>>>[1 to: 1000 do:[:i| Socket newTCP destroy]] timeToRun
>>>
>>>This takes about a second (after some warmup time ;) which means that we
>>>need about 1ms for creating and destroying a single socket. Since the above
>>>was run on a single machine it means that 50% of the time in the benchmark
>>>is spent in socket creation/destroy (2k sockets used which means 2secs
>>>overhead out of the 4secs the benchmark took).
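A quick check of that 50% figure, spelled out (using the ~1ms-per-socket measurement quoted above):

```python
# Back-of-the-envelope check of the 50% overhead figure. Both images run
# on one machine, so the benchmark pays for the client's socket and the
# server's accepted socket: 2000 create/destroy cycles in total.
connections = 1000
sockets_per_connection = 2            # one client socket + one accepted socket
create_destroy_ms = 1.0               # measured: ~1ms per socket on Windows
total_ms = 4000.0                     # the benchmark's ~4secs

overhead_fraction = (connections * sockets_per_connection * create_destroy_ms) / total_ms
print(overhead_fraction)              # 0.5, i.e. half the run
```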
>>>
>>>Definitely something to improve but it still doesn't say why Comanche
>>>doesn't get more than 50rps.
>>>
>>>Cheers,
>>> - Andreas
>>>
>>>
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: seaside-bounces at lists.squeakfoundation.org
>>>>[mailto:seaside-bounces at lists.squeakfoundation.org] On Behalf
>>>>Of Bruce ONeel
>>>>Sent: Thursday, November 20, 2003 1:58 PM
>>>>To: The Squeak Enterprise Aubergines Server - general discussion.
>>>>Cc: squeak-dev at lists.squeakfoundation.org
>>>>Subject: [Seaside] Squeak async i/o performance, was Re:
>>>>HTTP Performance
>>>>
>>>>
>>>>Avi Bryant <avi at beta4.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Like Stephen, I'm intrigued by your 50rps figure for Comanche - that's
>>>>>the number that I always seem to come up with as well. There *must* be
>>>>>something throttling that, and we need to get to the bottom of it.
>>>>
>>>>Hi,
>>>>
>>>>I poked at this today and I don't quite get the math to work.
>>>>I looked at unix Squeak 3.4-1 so that might distort things.
>>>>
>>>>The basis of the clock in interp.c seems to be in milliseconds. I'm
>>>>getting this idea from comments and the definition of ioMSecs in
>>>>sqXWindow.c. So, given that:
>>>>
>>>>checkForInterrupts (interp.c) is called approx every 1000 bytecodes
>>>>executed. If it hasn't been 3 milliseconds since the last call, this
>>>>1000 is scaled up. If it's been more, it is scaled down. We seem
>>>>to be trying to check for interrupts every 3 milliseconds. This is
>>>>controlled by interruptChecksEveryNms.
>>>>
>>>>Once this happens, if it has been 500 milliseconds (this seems high)
>>>>since the last poll, we call ioProcessEvents().
>>>>This comes from these bits in checkForInterrupts():
>>>>if (now >= nextPollTick) {
>>>> ioProcessEvents();
>>>> nextPollTick = now + 500;
>>>>}
>>>>
>>>>ioProcessEvents (sqXWindow.c) calls aioPoll.
>>>>
>>>>aioPoll (aio.c) calls select and then calls the aio handler for
>>>>each descriptor which has pending i/o waiting.
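For illustration, the select-based dispatch Bruce describes can be sketched in Python (the names here are illustrative, not the VM's actual API):

```python
# Illustrative sketch (not the VM's actual code): a select()-based poll
# loop in the style of aio.c's aioPoll -- register a handler per file
# descriptor, then invoke the handler for each descriptor that select()
# reports as having pending input.
import select
import socket

handlers = {}                     # fd -> callable, like aio.c's handler table

def aio_handle(sock, handler):
    """Register a handler to run when `sock` has data pending."""
    handlers[sock.fileno()] = handler

def aio_poll(timeout=0):
    """Dispatch handlers for every ready descriptor; return how many fired."""
    if not handlers:
        return 0
    ready, _, _ = select.select(list(handlers), [], [], timeout)
    for fd in ready:
        handlers[fd]()
    return len(ready)

# Demonstration with a socket pair: nothing fires until data is pending.
a, b = socket.socketpair()
fired = []
aio_handle(b, lambda: fired.append(b.recv(16)))
n0 = aio_poll()                   # no data yet -> 0 handlers fire
a.sendall(b'ping')
n1 = aio_poll()                   # data pending -> handler runs
a.close(); b.close()
print(n0, n1, fired)              # 0 1 [b'ping']
```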
>>>>
>>>>sqSocketListenOnPortBacklogSize has set up the socket with a listen
>>>>and then enabled an aio handler.
>>>>
>>>>TcpListener has created a socket with a backlog default of 10.
>>>>
>>>>This implies that every 500 milliseconds (i.e., 2 times per sec) we can
>>>>get 10 accepts, for 20 connections per sec when processing
>>>>is happening.
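Spelling out that arithmetic (assuming each poll can drain at most one full backlog of pending connections):

```python
# Bruce's estimate above, spelled out: ioProcessEvents runs at most
# every 500ms, and each poll can accept at most one backlog's worth of
# pending connections.
poll_interval_ms = 500            # nextPollTick = now + 500
backlog = 10                      # TcpListener's default backlog
polls_per_sec = 1000 / poll_interval_ms
accepts_per_sec = polls_per_sec * backlog
print(accepts_per_sec)            # 20.0 connections per sec
```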
>>>>
>>>>On top of that, events are polled when we call
>>>>relinquishProcessorForMicroseconds, which is called by the idle
>>>>process. This would be called with a wait of 1 millisecond
>>>>(1000 microseconds). I'm not sure how often this is called, and it
>>>>depends on the load in Squeak as well.
>>>>
>>>>I guess then that the additional 30 connections per sec
>>>>are accepted while the idle process is running.
>>>>
>>>>It would seem then that a higher backlog in TcpListener might
>>>>give higher connections per second. If someone has a test bed
>>>>set up which shows the 50 connections per sec, what happens if
>>>>you doIt
>>>>
>>>>TcpListener backLogSize: 1000
>>>>
>>>>do things work? Do you get more connections per sec? Do things
>>>>hang?
>>>>
>>>>I'm obviously not sure the above is correct. With luck someone
>>>>with a chance of knowing will speak up :-)
>>>>
>>>>cheers
>>>>
>>>>bruce
>>>>_______________________________________________
>>>>Seaside mailing list
>>>>Seaside at lists.squeakfoundation.org
>>>>http://lists.squeakfoundation.org/listinfo/seaside
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>_______________________________________________
>>Seaside mailing list
>>Seaside at lists.squeakfoundation.org
>>http://lists.squeakfoundation.org/listinfo/seaside
>>
>>
>>
>
>_______________________________________________
>Seaside mailing list
>Seaside at lists.squeakfoundation.org
>http://lists.squeakfoundation.org/listinfo/seaside
>
>