[Seaside] Apache frontend for Squeak, mod_scgi ?

Jimmie Houchin jhouchin at texoma.net
Sat Sep 27 01:06:49 CEST 2003


Hello Colin, all.

I want to make it clear that I am very pro Squeak.
But I need to make the best decision for me, for this time and this 
situation. The users here may make a different but equally valid 
decision for themselves building the same website. I can only go with 
my own knowledge, understanding, and skills.

Below are the facts as best I understand them.
Nothing is meant to be or sound like an advertisement or endorsement of 
Python over Squeak. I may be doing something wrong.

Colin Putney wrote:
> On Friday, September 26, 2003, at 01:23 PM, Jimmie Houchin wrote:
>> Avi Bryant wrote:
[snip]
>> This is the part I don't understand. I am no expert on any of this.
>> All I know is that:
>>
>> Apache helloworld25k.html = 800 rps
>> Comanche helloworld25k.html = 90 rps
>> Apache, mod_python helloworld25k.py = 390 rps
>>     (script opening the 25k file and serving it to Apache)
>> Medusa (Python web server) helloworld25k.html = 25 rps
>> This is all from memory and not necessarily totally accurate.

Okay, I'm back at home. Time for real numbers, not faulty memory. :)

Machine: Gentoo Linux, Apache 2.0.47, mod_python 3.1, Squeak 3.6g2, 
squeak-5423.image, Kom 6.2 hot off the press. :)
The VM and image were fresh downloads and compiles expressly for this test.

test: httperf --hog --num-conn 10000 --server ?? --port ?? --uri=/??
?? = the appropriate host, port, and URI for each test.
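
For example, a filled-in run looked something like this (the host, port, 
and URI here are placeholders, not my actual values):

httperf --hog --num-conn 10000 --server localhost --port 8080 --uri=/hello25k.html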

Server                  Request          Rate       Total CPU
Apache 2                hello25k.html    550 rps    27%
Komanche                hello25k.html    8.5 rps    53%
Apache 2 + mod_python   args.py          155 rps    12%
Apache 2 + mod_python   mptest.py        348 rps    14%
Komanche                dynamic ***1     8.3 rps    48%
Komanche                dynamic ***2     29  rps    62%

Interesting: mod_python used less CPU than Apache serving the file 
statically, even though the static file (25k) was 5k smaller than the 
30k response produced by the mod_python test.

mptest.py merely serves a 'Hello World!!!!!!!' string.
args.py opens, reads, and closes 11 files, writes their contents, and 
also reads/writes environment variables to the request, for a total of 
about 30k.
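
For reference, the handler code behind these numbers is tiny. The actual 
scripts aren't included here, but args.py does something roughly along 
these lines (the paths, number of files, and names are only illustrative 
placeholders):

from mod_python import apache

def handler(req):
    req.content_type = 'text/html'
    # read several files and echo their contents into the response
    # (the real args.py reads 11 files; these paths are placeholders)
    for path in ['/path/to/part1.html', '/path/to/part2.html']:
        f = open(path)
        req.write(f.read())
        f.close()
    # also dump the CGI-style environment variables into the response
    req.add_common_vars()
    for key in req.subprocess_env.keys():
        req.write('%s = %s<br>' % (key, req.subprocess_env[key]))
    return apache.OK

mptest.py is the same shape, with just a single 
req.write('Hello World!!!!!!!') call in the handler.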

The Apache, mod_python setup is no more optimized than the Squeak setup.
All of them are default installs.

According to the mod_python page, the author gets 1200 rps from ab on a 
1200 MHz Pentium serving a 'hello' string dynamically with a certain 
mod_python handler. So there is definitely room for optimization.

> Ah, but these are all tests of performance at serving static pages. Of 
> course Apache will be king here; it's straight C code, highly tuned for 
> doing exactly that. As soon as you introduce dynamic content, it's a 
> whole different ball of wax.

Yes.

>> I don't intend for Apache to serve any static files. I'll use Tux for 
>> static files and images. My only desire for Apache is to improve 
>> dynamic requests. If Squeak/Comanche could equal or better 
>> Apache/mod_python in performance on dynamic pages, I would leave 
>> Apache alone.
> 
> In this case, as Avi mentioned, putting Apache in front of Squeak will 
> only slow the whole process down. Assuming, of course, that you only 
> have one Squeak server. If you've got a cluster of Squeak servers and 
> you're using Apache for load balancing, you could probably get good 
> performance under high load, although the time it takes to process a 
> given request wouldn't improve.

Okay.
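
For what it's worth, the single-server case here is just Apache's 
mod_proxy forwarding requests to Comanche. A minimal sketch of such a 
config (the port and path are purely illustrative, not from my setup):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

ProxyPass        /app http://localhost:8080/
ProxyPassReverse /app http://localhost:8080/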

>> If I am misunderstanding my experience or doing something wrong, I 
>> don't know. I am open to that. I don't think Apache or mod_python is 
>> doing any caching. The script opens, reads, and closes the file. Python 
>> is persistent and long-lived, but the script should still execute the 
>> open, read, close every time. Just a modest attempt at a simple 
>> dynamic response.
> 
> I think the reason your mod_python test is so much faster than Comanche 
> or Medusa is that it spends most of its time in C code. I'll bet the 
> Python code to open and output a file is only a couple of lines, am I 
> right?

That is correct.
file = open('/path/to/file')
req.write(file.read())
file.close()

Nevertheless, it still has to open the file and read it into the VM 
before writing it out to the request.

The Squeak test above used a string already held in a class variable 
inside Squeak (see ***1 below): no opening, no reading, just spit it out.

> The difference between this and what you'll get with Quixote is probably 
> quite significant. With Quixote, you'll incur the overhead of 
> interprocess communication between Apache and your Python process. With 
> mod_python, you don't get this overhead, because Python is running 
> inside your Apache process.

Actually, Jon's experience with LWN (Linux Weekly News) was improved 
performance when he switched from mod_python to mod_scgi.
http://www.lwn.net

> I would expect Apache + mod_scgi + Python server to be slower than 
> straight Comanche.

I would too, but I'm not experiencing that. :(

> I suspect that you're attempting to optimize prematurely. Ultimately the 
> performance of your site will probably depend more on how you generate 
> your dynamic content than on what language or HTTP server you use. If 
> your database has 10s of millions of records, your app will probably 
> spend more time waiting for data than parsing requests.

Actually I'm not really attempting to optimize at all. I'm merely 
attempting to gather some preliminary data to make a decision.

If Squeak handling the simplest request is orders of magnitude slower 
than mod_python handling a much more complex request, that makes it that 
much harder for me to choose Squeak. This is not to say that Squeak 
isn't an excellent choice for other apps/webapps.

You are probably correct that querying the db will be the more intensive 
process. Nevertheless, with the results above, mod_python takes less 
time and uses fewer resources per request, which leaves more time and 
more resources for the other tasks. Whew, I hope that came out clear. :)

> My best advice would be to worry more about writing your app than tuning 
> it for right now. Once you get it up and running you can measure its 
> performance, profile it, and find the most effective ways to get the 
> performance you need. And that's the goal, right? "As fast as 
> necessary," not "as fast as possible." It may be that you can get away 
> with straight Comanche to start, and then ramp up performance as your 
> user base grows.

Currently I actually understand Apache, mod_****, and Quixote better 
than writing Comanche modules. My Python is as good as or better than my 
Smalltalk. I just happen to like Smalltalk better.

So currently, for me, Apache, mod_****, and Quixote is faster to get up 
and faster running.

I'm hoping for a fast start. If I understand the market I'm entering 
like I think I do... And if I have sufficient value over my competition, 
like I think I will... I won't have much time to ramp up. It needs to be 
ready. Only deployment will tell the tale between my hopes, dreams, 
ambitions and reality. :)

> If it were me, I'd write the app in Seaside/Squeak, knowing that to 
> dramatically increase performance, I have many options:
>     - create a cluster of Squeak servers, and put a load balancer in 
> front of it
>     - write a VM plugin to optimize critical sections of code
>     - port to VisualWorks and get a 10x speed increase from JIT compilation

VisualWorks is way too expensive. For the $1000s it would cost, I would 
rather put that money into Squeak, if I had the $ available, which I 
don't. But I do hope to after the site is up. :)

Jimmie


Squeak code ***
These were run in a Workspace.
This was copied from the Class documentation and modified for the test.

***1
| ma |
ma _ ModuleAssembly core.
ma addPlug:
	[ :request |
	HttpResponse fromString: JLHKomServer K25].
(HttpService startOn: 8080 named: 'JLHKS') module: ma rootModule.
  "JLHKomServer K25 is a class variable holding the 25k string"

***2
| ma2 |
ma2 _ ModuleAssembly core.
ma2 addPlug:
	[ :request |
	HttpResponse fromString: 'Hello World!'].
(HttpService startOn: 8888 named: 'JLHKS') module: ma2 rootModule.
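  "Serves a short constant 'Hello World!' string (the dynamic ***2 row above)"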

If I am doing something wrong that is making Squeak look bad, please 
correct me. That is not my intention.


