[Seaside] Squeak async i/o performance,
was Re: HTTP Performance
Bruce ONeel
edoneel at sdf.lonestar.org
Thu Nov 20 16:12:25 UTC 2003
Hmm, I poked for a while but I didn't find enlightenment yet.
I'll poke some more.
Thanks!
cheers
bruce
Stephen Pair <stephen at pairhome.net> wrote:
> Date: Thu, 20 Nov 2003 09:53:23 -0500
> From: Stephen Pair <stephen at pairhome.net>
> Subject: Re: [Seaside] Squeak async i/o performance, was Re: HTTP Performance
> To: The general-purpose Squeak developers list <squeak-dev at lists.squeakfoundation.org>
> Cc: "'The Squeak Enterprise Aubergines Server - general discussion.'" <seaside at lists.squeakfoundation.org>
> reply-to: "The Squeak Enterprise Aubergines Server - general discussion." <seaside at lists.squeakfoundation.org>
>
> This test really only measures socket connection performance. RPS as
> measured by ab (the Apache benchmark tool) covers a full HttpRequest and
> HttpResponse, so it measures something quite different from this test.
> The ab tests we were running open relatively few socket
> connections (I was using 4, 8, 16, 32, etc.). The number of
> simultaneous socket connections seems to have little effect on HTTP
> RPS throughput. Measuring HTTP RPS on both inbound connections (to
> Comanche) and outbound (to an Apache server) yields similar results in
> terms of RPS: even though ab can hit Apache with over 1000 RPS, Squeak
> can only hit Apache with ~50 RPS. And we seem to spend a lot of time
> (~70%) during these benchmarks waiting on socket semaphores. My guess
> is that something is causing excessive latency between the time a
> socket semaphore is marked for signaling in the VM and when the waiting
> Squeak process actually wakes up.
>
> - Stephen
>
> Andreas Raab wrote:
>
> >Hi Bruce,
> >
> >
> >
> >>I poked at this today and I don't quite get the math to work.
> >>
> >>
> >
> >Interesting you should say this. My impression after running a bunch of
> >tests with Avi and Stephen yesterday was very similar. I can say for sure
> >that there's something wrong here but it's not quite clear what. I've been
> >running an interesting little benchmark which might be good to run on other
> >machines as well. Here it is:
> >
> >"--- the server (run in one image) -----"
> >server := Socket newTCP.
> >server listenOn: 12345 backlogSize: 4.
> >
> >"wait for the first connection (this allows us to profile both ends)"
> >[client := server waitForAcceptFor: 10 ifTimedOut:[nil].
> >client == nil] whileTrue.
> >client destroy.
> >
> >"Now do 1k connections"
> >MessageTally spyOn:[
> > 1 to: 1000 do:[:i|
> > client := server waitForAcceptFor: 10 ifTimedOut:[nil].
> > client destroy.
> > ].
> >].
> >"clean up"
> >server destroy.
> >
> >
> >"--- the client (run in another one) ---"
> >addr := NetNameResolver addressForName: '127.0.0.1'.
> >
> >"wait for the first connection (to sync with server)"
> >[client := Socket newTCP.
> >[client connectTo: addr port: 12345] on: ConnectionTimedOut do:[:ex| ex
> >return].
> >client isConnected] whileFalse:[client destroy].
> >
> >"Now do 1k connections"
> >MessageTally spyOn:[
> > 1 to: 1000 do:[:i|
> > client := Socket newTCP.
> > client connectTo: addr port: 12345.
> > client destroy.
> > ].
> >].
> >
> >Running this shows profiles on both server and client side as well as
> >(indirectly) the number of connections established. It turns out that this
> >runs in ~4secs on my machine which makes for about 250 connections per
> >second. Looking closer, it looked as if much of the time is spent in socket
> >creation and destroy (which admittedly is a heavy-weight operation on
> >Windows) and measuring it shows this clearly:
> >
> >[1 to: 1000 do:[:i| Socket newTCP destroy]] timeToRun
> >
> >This takes about a second (after some warmup time ;) which means that we
> >need about 1ms for creating and destroying a single socket. Since the above
> >was run on a single machine it means that 50% of the time in the benchmark
> >is spent in socket creation/destroy (2k sockets used which means 2secs
> >overhead out of the 4secs the benchmark took).
> >
> >Definitely something to improve but it still doesn't say why Comanche
> >doesn't get more than 50rps.
> >
> >Cheers,
> > - Andreas
> >
> >
> >
> >>-----Original Message-----
> >>From: seaside-bounces at lists.squeakfoundation.org
> >>[mailto:seaside-bounces at lists.squeakfoundation.org] On Behalf
> >>Of Bruce ONeel
> >>Sent: Thursday, November 20, 2003 1:58 PM
> >>To: The Squeak Enterprise Aubergines Server - general discussion.
> >>Cc: squeak-dev at lists.squeakfoundation.org
> >>Subject: [Seaside] Squeak async i/o performance, was Re:
> >>HTTP Performance
> >>
> >>
> >>Avi Bryant <avi at beta4.com> wrote:
> >>
> >>
> >>
> >>>Like Stephen, I'm intrigued by your 50rps figure for Comanche - that's
> >>>the number that I always seem to come up with as well. There *must* be
> >>>something throttling that, and we need to get to the bottom of it.
> >>>
> >>Hi,
> >>
> >>I poked at this today and I don't quite get the math to work.
> >>I looked at unix Squeak 3.4-1 so that might distort things.
> >>
> >>The basis of the clock in interp.c seems to be in millisecs. I'm
> >>getting that idea from comments and the definition of ioMSecs in
> >>sqXWindow.c. So, given that:
> >>
> >>checkForInterrupts (interp.c) is called approximately every 1000 byte
> >>codes executed. If it hasn't been 3 millisecs since the last call, this
> >>1000 is scaled up; if it's been more, it is scaled down. We seem
> >>to be trying to check for interrupts every 3 millisecs. This is
> >>controlled by interruptChecksEveryNms.
> >>
> >>Once this happens, if it has been 500 millisecs (this seems high)
> >>since the last time we called ioProcessEvents, we call it again.
> >>This comes from these bits in checkForInterrupts():
> >>if (now >= nextPollTick) {
> >> ioProcessEvents();
> >> nextPollTick = now + 500;
> >>}
> >>
> >>ioProcessEvents (sqXWindow.c) calls aioPoll.
> >>
> >>aioPoll (aio.c) calls select and then calls the aio handler for
> >>each descriptor which has pending i/o waiting.
> >>
> >>sqSocketListenOnPortBacklogSize has set up the socket with a listen
> >>and then enabled an aio handler.
> >>
> >>TcpListener has created a socket with a backlog default of 10.
> >>
> >>This implies that every 500 millisecs (i.e., 2 times per sec) we can
> >>get 10 accepts, or 20 connections per sec, while other processing
> >>is happening.
> >>
> >>On top of that, events are polled when we call
> >>relinquishProcessorForMicroseconds, which is called by the idle
> >>process. This would be called with a wait of 1 millisec
> >>(1000 microsecs). I'm not sure how often this is called, and it
> >>depends on the load in Squeak as well.
> >>
> >>I guess then that the additional 30 connections per sec
> >>would be accepted while the idle process is running.
> >>
> >>It would seem then that a higher backlog in TcpListener might
> >>give higher connections per second. If someone has a test bed
> >>set up which shows the 50 connections per sec, what happens if
> >>you doIt
> >>
> >>TcpListener backLogSize: 1000
> >>
> >>do things work? Do you get more connections per sec? Do things
> >>hang?
> >>
> >>I'm obviously not sure the above is correct. With luck someone
> >>with a chance of knowing will speak up :-)
> >>
> >>cheers
> >>
> >>bruce
> >>_______________________________________________
> >>Seaside mailing list
> >>Seaside at lists.squeakfoundation.org
> >>http://lists.squeakfoundation.org/listinfo/seaside
> >>
> >>
> >>
> >
> >
> >
> >
>
>