[Seaside] Squeak async i/o performance, was Re: HTTP Performance

Stephen Pair stephen at pairhome.net
Thu Nov 20 16:01:37 UTC 2003


Andreas Raab wrote:

>>This test really only measures socket connection performance.
>
>Which is 99% bounded by the interface between VM and image. Precisely the
>thing that I wanted to measure. So if there's latency between VM and image
>then this latency should become visible here (note that both connect as well
>as accept require state changes on the socket).
>
>>My guess is that something is causing an excessive latency between
>>the time a socket semaphore is marked for signaling in the VM and
>>when the waiting Squeak process actually wakes up.
>
>That's possible but not likely according to the results I got. First of all,
>the results show that no matter how hard you try you can't get above 1k
>connections per second (bounded by the time it takes to create and destroy a
>socket). Secondly, if there were latency involved, then it should be visible
>by reduced CPU usage - however, the CPU maxes out during the test which is a
>clear indication that Squeak is doing "something". 
>

1000 connections per second may indeed be an upper bound...however, it 
may not be an upper bound for HTTP requests per second (remember, 1 HTTP 
request ~= 1 socket connection once keep-alive is in play).  And, in my 
tests, I do show significant idle time.

I bumped the Comanche server priority to 80 and observed one 'ab' test 
get 1477 rps (using: ab -n 1024 -c 16 http://10.0.0.181:8080/)...this 
test should only open 16 socket connections (my Comanche server is set 
to allow up to 100 HTTP requests on a single socket).  Also, this is 
the "hello world" test that only sends back the in-memory "hello 
world" string (so no file I/O is involved).  However, this result is not 
consistent; running it over and over, I sometimes get as few as 100 rps 
and usually land in the neighborhood of 200-300 rps.  This is running ab 
from a 1GHz PIII running Linux to a 1.8GHz Windows machine.
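For anyone who wants to reproduce this without my Comanche setup, here is 
a rough sketch of the workload using only the plain Socket API from 
Andreas' benchmark below (this is *not* the actual Comanche code: it 
ignores the request bytes entirely and does one response per connection, 
no keep-alive):

	| crlf response server client |
	crlf := String with: (Character value: 13) with: (Character value: 10).
	response := 'HTTP/1.0 200 OK', crlf,
		'Content-Type: text/plain', crlf, crlf,
		'hello world'.
	server := Socket newTCP.
	server listenOn: 8080 backlogSize: 10.
	[[client := server waitForAcceptFor: 30 ifTimedOut: [nil].
	client == nil ifFalse: [
		client sendData: response.
		client destroy]] repeat] fork.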

But the cool thing is that about 1 in every 10 or 11 attempts I get in 
the range of 1000 to 1500 HTTP RPS!  So the question is, how come I 
can't get that on every attempt?  Why the erratic numbers?  Another 
interesting data point is that when I get a high RPS, I'm also getting a 
high number of failed requests.  In one case where I achieved 1154 rps, I 
had 1024 completed requests (of course) and 1943 failed requests.  Also, 
adjusting the socket buffer sizes and backlog size seemed to have 
little (if any) impact.
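One thing I haven't ruled out is whether individual connects are 
uniformly ~1ms or whether a few of them occasionally stall for tens or 
hundreds of ms.  A quick probe I'd try (my own sketch, using the same 
Socket calls as Andreas' client below, pointed at my test server) is to 
time each connect/destroy pair individually and look at the spread rather 
than the average:

	| addr times client t |
	addr := NetNameResolver addressForName: '10.0.0.181'.
	times := (1 to: 100) collect: [:i |
		t := Time millisecondClockValue.
		client := Socket newTCP.
		client connectTo: addr port: 8080.
		client destroy.
		Time millisecondClockValue - t].
	Transcript show: 'min/max connect ms: ', times min printString,
		'/', times max printString; cr.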

>Considering that during this test the script is run in the main process,
>there is no UI interference. Given that there were no other "interesting"
>processes active, it is reasonable to assume that all of the (maxed out
>CPU) time is spent within Squeak. Which means within the primitives
>observed by the tally, which means that latency is probably not the
>limiting factor. Q.E.D. ;-)

Which I'm not doubting (regarding the connection tests)...I think the 
problem is more related to a latency associated with the read/write 
semaphores...perhaps a latency in the socket interface code that marks 
these semaphores for signaling.  Who knows.  I'm baffled.
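If I get time, I want to measure that wakeup latency directly.  Something 
like this sketch (my own, plain Semaphore/Delay, nothing socket-specific) 
should show whether signal-to-wakeup ever takes more than a millisecond 
or two:

	| sem stamp delays |
	sem := Semaphore new.
	delays := OrderedCollection new.
	1 to: 100 do: [:i |
		[(Delay forMilliseconds: 5) wait.
		stamp := Time millisecondClockValue.
		sem signal] fork.
		"we are already blocked in #wait when the signal arrives"
		sem wait.
		delays add: Time millisecondClockValue - stamp].
	Transcript show: 'worst wakeup delay: ', delays max printString, ' ms'; cr.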

- Stephen

>And of course, all of this is on Windows. YMMV.
>
>Cheers,
>  - Andreas
>
>>-----Original Message-----
>>From: seaside-bounces at lists.squeakfoundation.org 
>>[mailto:seaside-bounces at lists.squeakfoundation.org] On Behalf 
>>Of Stephen Pair
>>Sent: Thursday, November 20, 2003 3:53 PM
>>To: The general-purpose Squeak developers list
>>Cc: 'The Squeak Enterprise Aubergines Server - general discussion.'
>>Subject: Re: [Seaside] Squeak async i/o performance, was Re:
>>HTTP Performance
>>
>>
>>This test really only measures socket connection performance.  RPS as
>>measured by ab (the Apache benchmark tool) is a full HttpRequest and
>>HttpResponse.  So they are quite different things than what this test
>>measures.  The ab tests we were running only open relatively few
>>socket connections (I was using 4, 8, 16, 32, etc).  The number of
>>simultaneous socket connections seems to have little effect on the HTTP
>>RPS throughput.  Measuring HTTP rps on both inbound connections (to
>>Comanche) and outbound (to an Apache server) yields similar results in
>>terms of RPS...even though ab can hit Apache with over 1000 RPS, Squeak
>>can only hit Apache with ~50 RPS...and we seem to spend a lot of time
>>(~70%) during these benchmarks waiting on socket semaphores.  My guess
>>is that something is causing an excessive latency between the time a
>>socket semaphore is marked for signaling in the VM and when the waiting
>>Squeak process actually wakes up.
>>
>>- Stephen
>>
>>Andreas Raab wrote:
>>
>>>Hi Bruce,
>>>
>>>>I poked at this today and I don't quite get the math to work.
>>>
>>>Interesting you should say this. My impression after running a bunch of
>>>tests with Avi and Stephen yesterday was very similar. I can say for sure
>>>that there's something wrong here but it's not quite clear what. I've been
>>>running an interesting little benchmark which might be good to run on
>>>other machines as well. Here it is:
>>>
>>>"--- the server (run in one image) -----"
>>>server := Socket newTCP.
>>>server listenOn: 12345 backlogSize: 4.
>>>
>>>"wait for the first connection (this allows us to profile both ends)"
>>>[client := server waitForAcceptFor: 10 ifTimedOut:[nil].
>>>client == nil] whileTrue.
>>>client destroy.
>>>
>>>"Now do 1k connections"
>>>MessageTally spyOn:[
>>>	1 to: 1000 do:[:i|
>>>		client := server waitForAcceptFor: 10 ifTimedOut:[nil].
>>>		client destroy.
>>>	].
>>>].
>>>"clean up"
>>>server destroy.
>>>
>>>
>>>"--- the client (run in another one) ---"
>>>addr := NetNameResolver addressForName: '127.0.0.1'.
>>>
>>>"wait for the first connection (to sync with server)"
>>>[client := Socket newTCP.
>>>[client connectTo: addr port: 12345] on: ConnectionTimedOut do:[:ex| ex
>>>return].
>>>client isConnected] whileFalse:[client destroy].
>>>
>>>"Now do 1k connections"
>>>MessageTally spyOn:[
>>>	1 to: 1000 do:[:i|
>>>		client := Socket newTCP.
>>>		client connectTo: addr port: 12345.
>>>		client destroy.
>>>	].
>>>].
>>>
>>>Running this shows profiles on both server and client side as well as
>>>(indirectly) the number of connections established. It turns out that this
>>>runs in ~4secs on my machine which makes for about 250 connections per
>>>second. Looking closer, it looked as if much of the time is spent in
>>>socket creation and destroy (which admittedly is a heavy-weight operation
>>>on Windows) and measuring it shows this clearly:
>>>
>>>[1 to: 1000 do:[:i| Socket newTCP destroy]] timeToRun
>>>
>>>This takes about a second (after some warmup time ;) which means that we
>>>need about 1ms for creating and destroying a single socket. Since the
>>>above was run on a single machine it means that 50% of the time in the
>>>benchmark is spent in socket creation/destroy (2k sockets used which means
>>>2secs overhead out of the 4secs the benchmark took).
>>>
>>>Definitely something to improve but it still doesn't say why Comanche
>>>doesn't get more than 50rps.
>>>
>>>Cheers,
>>> - Andreas
>>>
>>>>-----Original Message-----
>>>>From: seaside-bounces at lists.squeakfoundation.org 
>>>>[mailto:seaside-bounces at lists.squeakfoundation.org] On Behalf 
>>>>Of Bruce ONeel
>>>>Sent: Thursday, November 20, 2003 1:58 PM
>>>>To: The Squeak Enterprise Aubergines Server - general discussion.
>>>>Cc: squeak-dev at lists.squeakfoundation.org
>>>>Subject: [Seaside] Squeak async i/o performance, was Re:
>>>>HTTP Performance
>>>>
>>>>
>>>>Avi Bryant <avi at beta4.com> wrote:
>>>>
>>>>>Like Stephen, I'm intrigued by your 50rps figure for Comanche - that's
>>>>>the number that I always seem to come up with as well.  There *must* be
>>>>>something throttling that, and we need to get to the bottom of it.
>>>>
>>>>Hi,
>>>>
>>>>I poked at this today and I don't quite get the math to work.
>>>>I looked at unix Squeak 3.4-1 so that might distort things.
>>>>
>>>>The basis of the clock in interp.c seems to be in millisecs.  I'm
>>>>getting this idea from the comments and the definition of ioMSecs in
>>>>sqXWindow.c.  So, given that:
>>>>
>>>>checkForInterrupts (interp.c) is called approximately every 1000 byte
>>>>codes executed.  If it hasn't been 3 millisecs since the last call, this
>>>>1000 is scaled up; if it's been more, it is scaled down.  We seem
>>>>to be trying to check for interrupts every 3 millisecs.  This is
>>>>controlled by interruptChecksEveryNms.
>>>>
>>>>Once this happens, if it has been 500 millisecs (this seems high)
>>>>since the last time we polled, we call ioProcessEvents().
>>>>This comes from these bits in checkForInterrupts():
>>>>if (now >= nextPollTick) {
>>>>	ioProcessEvents();
>>>>	nextPollTick = now + 500;
>>>>}
>>>>
>>>>ioProcessEvents (sqXWindow.c)  calls aioPoll.
>>>>
>>>>aioPoll (aio.c) calls select and then calls the aio handler for 
>>>>each descriptor which has pending i/o waiting.
>>>>
>>>>sqSocketListenOnPortBacklogSize has set up the socket with a listen
>>>>and then enabled an aio handler.
>>>>
>>>>TcpListener has created a socket with a backlog default of 10.
>>>>
>>>>This implies that every 500 millisecs (i.e., 2 times per sec) we can
>>>>get 10 accepts, for 20 connections per sec, when processing
>>>>is happening.
>>>>
>>>>On top of that, events are polled when we call
>>>>relinquishProcessorForMicroseconds, which is called by the idle
>>>>process.  This would be called with a wait of 1 millisec
>>>>(1000 microsecs).  I'm not sure how often this is called and it
>>>>depends on the load in Squeak as well.
>>>>
>>>>I guess then that the additional 30 connections per sec
>>>>would be accepted while the idle process is running.
>>>>
>>>>It would seem then that a higher backlog in TcpListener might
>>>>give higher connections per second.  If someone has a test bed
>>>>set up which shows the 50 connections per sec, what happens if
>>>>you doIt
>>>>
>>>>TcpListener backLogSize: 1000
>>>>
>>>>do things work?  Do you get more connections per sec?  Do things
>>>>hang?
>>>>
>>>>I'm obviously not sure the above is correct.  With luck someone
>>>>with a chance of knowing will speak up :-)
>>>>
>>>>cheers
>>>>
>>>>bruce
>>>>_______________________________________________
>>>>Seaside mailing list
>>>>Seaside at lists.squeakfoundation.org
>>>>http://lists.squeakfoundation.org/listinfo/seaside
>>
>>_______________________________________________
>>Seaside mailing list
>>Seaside at lists.squeakfoundation.org
>>http://lists.squeakfoundation.org/listinfo/seaside
>>
>
>_______________________________________________
>Seaside mailing list
>Seaside at lists.squeakfoundation.org
>http://lists.squeakfoundation.org/listinfo/seaside
