My apologies for a general Squeak web question. But it seems this is where most of the web developers for Squeak are. :)
I am currently building a website. I would like to use Squeak but am concerned about performance and stability. Due to that, I am seriously (and reasonably so) considering using Apache and Python, specifically Quixote. http://www.mems-exchange.org/software/quixote/
As the sole implementor of this site I like the philosophy of Quixote.
Much of the performance of Quixote comes from using Apache as the web server instead of a Python-based server. The most popular means of achieving that is via mod_scgi. http://www.mems-exchange.org/software/scgi/ Version 1.2a is available for Apache 1.3 and 2.0.4x. http://www.mems-exchange.org/software/files/scgi/
This is the platform LWN uses; it achieves good performance, handles Slashdotting, etc.
I would like that for Squeak.
From my understanding there are two parts to SCGI: the Apache module, and the client-side code, in this case called scgi_server.py. It is only a 268-line file including comments (in the 1.1 version). It appears to use netstrings, of which Stephen Pair has a Squeak implementation.
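For anyone unfamiliar with the framing involved, here is a minimal sketch of netstring encoding and decoding in Python (the language scgi_server.py is written in). This is just an illustration of the published netstring format, not Stephen Pair's Squeak implementation or the actual scgi_server.py code:

```python
# Netstrings frame a payload as "<decimal length>:<payload>,".
# SCGI wraps its request headers in exactly this kind of envelope.

def encode(payload: bytes) -> bytes:
    """Wrap a payload in a netstring."""
    return b"%d:%s," % (len(payload), payload)

def decode(data: bytes) -> bytes:
    """Unwrap a single netstring, checking the framing."""
    length, rest = data.split(b":", 1)
    n = int(length)
    if rest[n:n + 1] != b",":
        raise ValueError("malformed netstring: missing trailing comma")
    return rest[:n]

print(encode(b"hello"))     # b'5:hello,'
print(decode(b"5:hello,"))  # b'hello'
```

The length prefix is what makes the format cheap to parse: the reader knows exactly how many bytes to take before it ever looks at the payload.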
I believe something like this would help Squeak in web development. Unfortunately I am not currently qualified to port this to Squeak. It might not take someone who is knowledgeable very long to implement. I would fund it but my funds are not much. I am currently funding my server (purchasing next week) and deployment costs. When my site is profitable I could fund it, but then it would be after the fact. However, I could offer a small amount toward such, with it released to the community.
I think Apache, mod_scgi, and Squeak could outperform Apache, mod_scgi, and Python. I don't know how this compares to mod_lisp. But mod_lisp also isn't available for Apache2 to my knowledge.
Thoughts, opinions, offers, takers?
Thanks.
Jimmie Houchin
On Friday, September 26, 2003, at 10:05 AM, Jimmie Houchin wrote:
My apologies for a general Squeak web question. But it seems this is where most of the web developers for Squeak are. :)
I am currently building a website. I would like to use Squeak but am concerned about performance and stability. Due to that I am seriously (reasonably so) considering using Apache and Python, Quixote to be specific. http://www.mems-exchange.org/software/quixote/
Hi Jimmie,
Can you tell us more about what type of site you're building, and what your concerns are?
I've had good results using Apache + mod_proxy + Comanche + Seaside. That's useful in a situation where only *part* of the website needs to be dynamic, and the rest can be served statically by Apache. Does this fit your situation?
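For concreteness, the Apache side of that setup amounts to only a couple of mod_proxy directives. This is a hypothetical sketch; the port, paths, and URL prefix are assumptions, not taken from anyone's actual configuration:

```apache
# Hypothetical httpd fragment: Apache serves static files from DocumentRoot
# and forwards /seaside requests to a Comanche/Seaside image on port 9090.
# Requires mod_proxy (and mod_proxy_http on Apache 2).
<VirtualHost *:80>
    DocumentRoot /var/www/html
    ProxyPass        /seaside http://127.0.0.1:9090/seaside
    ProxyPassReverse /seaside http://127.0.0.1:9090/seaside
</VirtualHost>
```

ProxyPassReverse rewrites Location headers in redirects coming back from Squeak so they point at Apache rather than the internal port.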
Are you concerned that Squeak will crash and need to be restarted? Does mod_scgi automatically restart CGI services?
Do you expect such a high load that you'll need multiple apache servers or some other high end load balancing scheme?
By the way, I wasn't able to read the documentation on the SCGI protocol; the domain it's hosted at seems to have expired or something. I'm sure it would be possible to implement an SCGI server for Seaside, although I should say that I had a look at implementing FastCGI and decided it wasn't worth the effort.
Colin
On Fri, 26 Sep 2003, Colin Putney wrote:
By the way, I wasn't able to read the documentation on the SCGI protocol. The domain it's hosted at seems to have expired or something.
http://www.google.ca/search?q=cache:dwYgiHwWjSkJ:python.ca/nas/scgi/protocol...
Hello Colin,
Colin Putney wrote:
On Friday, September 26, 2003, at 10:05 AM, Jimmie Houchin wrote:
My apologies for a general Squeak web question. But it seems this is where most of the web developers for Squeak are. :)
I am currently building a website. I would like to use Squeak but am concerned about performance and stability. Due to that I am seriously (reasonably so) considering using Apache and Python, Quixote to be specific. http://www.mems-exchange.org/software/quixote/
Hi Jimmie,
Can you tell us more about what type of site you're building, and what your concerns are?
I've had good results using Apache + mod_proxy + Comanche + Seaside. That's useful in a situation where only *part* of the website needs to be dynamic, and the rest can be served statically by Apache. Does this fit your situation?
I can't go into details about the site yet, but it will be an ecommerce-type site, possibly with ads sometime in the future. Amazon is a reasonable analogy: users with preferences, much of the content reasonably static but interspersed with dynamic pieces.
I don't really know what mod_proxy does or how it helps with Squeak web apps.
Are you concerned that Squeak will crash and need to be restarted? Does mod_scgi automatically restart CGI services?
I am not really worried about Squeak crashing. I've never really experienced Squeak behaving that way, but then I've never placed it under a really heavy load either.
Do you expect such a high load that you'll need multiple apache servers or some other high end load balancing scheme?
Performance is a major concern. Apache2 on my home (development) machine (Gentoo, Athlon 700MHz, 1.25GB RAM) delivers about 800 rps on a 25k file. From memory, Squeak/Comanche (Kom 6, Squeak 3.5) gave maybe 90 rps.
With Apache2 and mod_python 3 I've gotten between 150 and 400 rps for dynamic pages. mod_python serving the identical 25k page ran at about 400 rps. A mod_python script which opened 12 different small files, read their contents (plus the 25k file above and some strings in the script), closed the files, and served all of the above to the client on each and every request ran at about 190 rps.
That is acceptable to me. How can I achieve that with Squeak? Can I currently?
By the way, I wasn't able to read the documentation on the SCGI protocol; the domain it's hosted at seems to have expired or something. I'm sure it would be possible to implement an SCGI server for Seaside, although I should say that I had a look at implementing FastCGI and decided it wasn't worth the effort.
The author of mod_scgi originally had a FastCGI module. He said it was too complex and too much trouble; that's why he developed SCGI.
Back to the site.
I am currently planning on using MySQL for the database. It will have 10s of millions of objects.
From researching similar* websites, I am hoping for up to 1M+ page views a day at some point. (* I use "similar" loosely.) I've wanted this website for 5 years and it still doesn't exist. I've slowly been working on the skills in my spare time to be able to develop my ideas. We'll see if anybody else shares my vision when it's up. ;)
I would really like it if Apache2 could cache the static portions of a page and request and insert the dynamic portions from Squeak. I don't think that is currently an option, but I don't know.
Should I study up on mod_proxy? Will it help?
Jimmie
Jimmie Houchin <seaside@lists.squeakfoundation.org> said:
Performance is a major concern. Apache2 on my home (development) machine (Gentoo, Athlon 700MHz, 1.25GB RAM) delivers about 800 rps on a 25k file. From memory, Squeak/Comanche (Kom 6, Squeak 3.5) gave maybe 90 rps.
I recall that 90 rps was considered extremely good back when I was being a kewl dude in the dotcom boom. It'll allow you to serve some 3 million hits per day without getting into real trouble. And of course, you're not going to deploy on a lousy 700Mhz Athlon anyway, so double that number.
Now, if I had an ecommerce site that was generating money (this is post dotcom boom, so I'm just going to take a giant leap here and assume that these 3 million hits will actually help you make money), I'd look into a load-balancing solution anyway. Never bet your cool site that makes lots of money on a single box. Rent a rack and start with 8 machines: 2 LVS load balancers that monitor each other with Heartbeat, 2 first-tier Apache web servers, 2 second-tier Squeak appservers (maybe get creative here with the handback module?), and 2 database backend servers replicated with NBD or MySQL's replication stuff, monitoring each other with Heartbeat. You'd want the database machines to be SCSI-based with a mirrored disk setup; RAID is optional but could help performance. Oh, and shell out for some Xeons; the added cache really helps.
That, in my not so humble opinion, constitutes a Serious e-Commerce Setup[tm].
If Squeak becomes a bottleneck, go to Rackservers'r'Us and click in another box. Who cares. You don't win by saving money on hardware (make the calculation: 1x US$1000 rackserver, depreciated over 36 months - wew, that'll hit your books with a full 28 dollars per month!), you win by being better at developing the kind of services your customers want. If you don't believe that, stay out of business. If you do believe that, don't be preoccupied with performance figures but choose whatever development environment you think will give you the best site, fastest.
That, in my not so humble opinion, constitutes a Serious e-Commerce Setup[tm].
Cees is right on the money here... I've deployed several setups like this for customers... the return on investment is excellent, and no one complains about uptime ;) Granted, this was before I knew about Squeak/Seaside, but I would have zero issues with deploying an app based on Seaside in a high-volume environment. More of your issues will come from the database and from how your networking/load-balancing/fault-tolerance setup is arranged. The ease of developing complex web apps in Seaside vs. anything I've used before overcomes any other consideration in a dynamic environment. And believe me, I was a dyed-in-the-wool Python evangelist... cgi/mod_python/Zope/Albatross/PythonScript (ASP). Having an image with everything in it is *much* easier to deploy than a filesystem-based distribution with 3rd-party modules (which I always wound up using, as well as ones I wrote myself).
As a side question, Jimmie; are you starting up squeak with the -memory option? I have noticed huge performance differences by just firing up the interpreter with it and without it. Also, which VM are you using and are you running on Linux/FreeBSD/ or windows? Have you also stripped an image of extra classes and fluff?
Seaside mailing list Seaside@lists.squeakfoundation.org http://lists.squeakfoundation.org/listinfo/seaside
Hello again.
Colin Putney wrote: [snip]
Are you concerned that Squeak will crash and need to be restarted? Does mod_scgi automatically restart CGI services?
This post by Lukas Renggli also gave me pause for concern. http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-September/066334...
I hope he got that problem resolved; if so, I don't recall it being mentioned on the list.
Jimmie
On Fri, 26 Sep 2003, Jimmie Houchin wrote:
This post by Lukas Renggli also gave me pause for concern. http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-September/066334...
I hope he got that problem resolved; if so, I don't recall it being mentioned on the list.
The problem was resolved. IIUC it didn't turn out to be a problem with Squeak itself, but rather with a tricky race condition in the application code.
Lukas, is that correct?
Avi
This post by Lukas Renggli also gave me pause for concern.
http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-September/066334.html
I hope he got that problem resolved, if so it wasn't mentioned on the list to my memory.
The problem was resolved. IIUC it didn't turn out to be a problem with Squeak itself, but rather with a tricky race condition in the application code.
Lukas, is that correct?
The last week has been very calm; we didn't get error messages anymore and the application itself ran smoothly. However, we had some crashes of the VM ...
So let me start at the very beginning. First of all, I set the VM parameter -notimer; at first it seemed to help a lot, but in the long run the problem didn't disappear.
Last weekend we could take the server into our office, as they were changing something at the customer's site and the network would have been down anyway. So I had full access to that machine for the first time, and I even managed to write some stress tests that exposed a problem. Well, we are not sure if it was the same problem as on the server, but it was a severe problem. So I rewrote the complete persistence mechanism of our application and carefully protected every write access to Postgres with critical sections. The stress test then ran, and up to now we haven't seen our socket problem anymore.
Well, but there is still that strange feeling of not knowing what the problem actually was. Very probably Avi is right and it was some race condition in our application code. Or it could have been the buggy SharedQueue that we are using to pool our Postgres sockets. I was shocked when reading Nathanael's mail about the synchronisation bugs in it. Why is something like this in the base image?
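For illustration only, here is a Python sketch of the pattern Lukas describes: pooling connections through a thread-safe queue and wrapping each write in a critical section. Everything here (the class and function names, and the dict standing in for a Postgres socket) is invented for the example, not taken from his application:

```python
import threading
import queue

class ConnectionPool:
    """Checkout/checkin through a thread-safe queue; writes go
    through a per-connection lock (the critical section)."""
    def __init__(self, make_conn, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put((make_conn(), threading.Lock()))

    def execute(self, work):
        conn, lock = self._free.get()   # blocks until a connection is free
        try:
            with lock:                  # critical section around the write
                return work(conn)
        finally:
            self._free.put((conn, lock))

# Stand-in "connections": plain dicts counting the queries run on them.
pool = ConnectionPool(lambda: {"queries": 0}, size=2)

def bump(conn):
    conn["queries"] += 1
    return conn["queries"]

print(pool.execute(bump))  # prints 1
```

Because every access is funneled through `execute`, no two threads can ever write to the same connection at once, which is exactly what the hand-rolled critical sections around Postgres were meant to guarantee.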
To come back to the original post: we never had problems with the sockets of the HTTP connections. We are using Seaside with Comanche 5.1 behind Apache as an HTTPS proxy. Apache is also used to serve static content, like HTML pages, style sheets and images. We use this combination of Comanche 5.1 and Apache in other projects too, even on different platforms: Linux, Windows and Mac. Speed has never been a problem, so we never felt the need for something like mod_scgi.
Cheers, Lukas
Last month Todd Blanchard posted his Seaside implementation of the Wafer weblog. The Wafer weblog spec was created to compare different Java frameworks for web applications: the idea is to implement the weblog using the different frameworks and compare the results. I downloaded Blanchard's Seaside implementation and the Java Struts version. I have no idea how the Java Struts version compares to the other Java versions; it could be I picked a really bad Java example to look at. Here are some simple comparisons.
First, the total size of what one downloads for each version. Yes, the Java Struts download is ~160 times larger.

Size (uncompressed)
  Java Struts version: 4.1 MB (yes, MB)
  Seaside version: 26,759 B (yes, bytes)
# of Files
  Java Struts version: 155
  Seaside version: 1
The Struts version had a number of jar files that were needed to run the example. The jar files are rather large, and most were not actually created for this project; they were the products of other projects. It also had a number of XML files used to compile the example and configure Struts. So I looked at the source files in each version for an idea of how much code one needed to write for each project. In the Java Struts version these were Java classes and JSP files. Here is the total size of the source files; the Java Struts version is 8.9 times larger.
Source Size
  Java Struts version: 239,330 B
  Seaside version: 26,759 B
Most of the Java Struts version's files contain a copyright notice. When I remove the copyright notices, the Java Struts version is 5.9 times larger.
Source w/o copyright notice
  Java Struts version: 159,554 B
  Seaside version: 26,759 B
Finally there is the number of classes.
Classes
  Java Struts version: 36
  Seaside version: 12
I have no idea how hard it would be to actually run the Java Struts version. I would have to at least download and install Java Struts and whatever it needs to run. I did not try to read the Java code to understand it. Nor do I have any real idea what Struts is, though I have run across a number of references to it.
Todd has a smaller version, which could change the comparison a bit.
URLs for downloading the code are below.
Blanchard's Seaside implementation (http://lists.squeakfoundation.org/pipermail/seaside/attachments/20030808/c411a24b/Wafer.obj)
Java Struts version (http://www.waferproject.org/weblog-prototype/index.jsp)
----
Roger Whitney
Department of Computer Science, San Diego State University
San Diego, CA 92182-7720
whitney@cs.sdsu.edu
http://www.eli.sdsu.edu/
(619) 583-1978
(619) 594-3535 (office)
(619) 594-6746 (fax)
Roger Whitney wrote:
Source w/o copyright notice
  Java Struts version: 159,554 B
  Seaside version: 26,759 B

Classes
  Java Struts version: 36
  Seaside version: 12
We had lessons at university using Struts. Beyond the fact that most of the students didn't understand what it is that Struts hides from the programmer (the professor said something like "Hey, it's hidden anyway, so you shouldn't need to know what's hidden"), I saw that there is a lot (a very great deal) of code being generated by WSAD. Sometimes you have the feeling that the whole construction is some kind of domino game: tip one stone and watch the rest fall (alter a line of code and watch the notices, errors and tasks come up). However, this feeling was offset by the necessary configuration steps: create a forward here, create a form class (nothing but a bean, but if you don't know that...) there.
What we were supposed to learn from these lessons was how to create a dialog-based application. What we actually learned is that a tool that does a lot of things you don't understand and cannot reproduce by yourself feels uncomfortable...
The answer to the software crisis (according to our studies) is: indirection, and making things more complicated (that way, more engineers can work on a complicated problem) :-)
Oh, another idea I had in one of these lessons: http://reauktion.de/cgi-bin/vanilla.r?selector=display&snip=2003-09-03-w...
Regards, Markus
On Saturday, Sep 27, 2003, at 13:41 Europe/Rome, Markus Fritsche wrote:
Classes
  Java Struts version: 36
  Seaside version: 12
We had lessons in university using Struts. Additionally to the fact, that most of the students didn't understand what it is that struts hide from the programmer
I am using Struts 1.0.2 at work with WSAD and WebSphere 4.x. Struts does not hide very much from you :( Struts does not implement a strong MVC (like Seaside) but a very light one. Struts simplifies writing forms/servlets and offers internationalisation, but...
, I saw that there is a lot (a very great deal) of code being generated by WSAD. Sometimes you have the feeling that the whole construction is some kind of domino game[...] this feeling was offset by the necessary configuration steps: create a forward here, create a form class (nothing but a bean, but if you don't know that...) there.
...you must write a lot of code to get a single form up and running. And the XML config is terrible :) The real problem, in my opinion, is something else: you study a lot of Struts classes, and you don't get much back. The effort is huge; the problems solved are not so many. Seaside is smaller, but: Seaside does not offer property configuration or internationalisation support. The auto-generated links can be a problem for some apps. I can NEVER bookmark them!!
Can Squeak/Seaside/Comanche scale as well as the more complex Java servlet stack?
But I prefer (a lot) Seaside!!!!!
The actual final version is on my iDisk at http://homepage.mac.com/tblanchard
I expect by now it's been copied to the Wafer site as well.
The download is quite a bit larger because it's a whole image and source file; given that it may be downloaded by non-Smalltalkers, I wanted to ship something ready to run out of the box. This one uses Glorp to talk to a PostgreSQL database that you have to set up.
I do recall the folks at the users group saying that they typically spent about 5 to 10 times as long to do Wafer in the Java-based frameworks (1 day vs. 1-2 weeks).
-Todd Blanchard
On Friday, September 26, 2003, at 11:44 PM, Roger Whitney wrote:
On Fri, 26 Sep 2003, Jimmie Houchin wrote:
I think Apache, mod_scgi, and Squeak could outperform Apache, mod_scgi, and Python. I don't know how this compares to mod_lisp. But mod_lisp also isn't available for Apache2 to my knowledge.
Thanks for pointing SCGI out, I hadn't run into it before.
mod_scgi looks very similar to mod_lisp. To be honest, I've never found much performance difference between this kind of module (which translates HTTP requests into a custom protocol used to communicate with the app server) and mod_proxy, which just forwards the HTTP directly. It's not like parsing HTTP is inherently slower than parsing netstrings. However, it would be worth doing some benchmarking with this module in particular if you've had good experiences with it.
It would probably be quite trivial to implement mod_scgi support for Seaside if someone felt it was worthwhile (an afternoon's work at most). That someone probably isn't going to be me, any time soon, however, unless a client specifically asks for it.
Cheers, Avi
Hello Avi,
Avi Bryant wrote:
On Fri, 26 Sep 2003, Jimmie Houchin wrote:
I think Apache, mod_scgi, and Squeak could outperform Apache, mod_scgi, and Python. I don't know how this compares to mod_lisp. But mod_lisp also isn't available for Apache2 to my knowledge.
Thanks for pointing SCGI out, I hadn't run into it before.
You're welcome. Maybe it'll come in handy some day. :)
mod_scgi looks very similar to mod_lisp. To be honest, I've never found much performance difference between this kind of module (which translates HTTP requests into a custom protocol used to communicate with the app server) and mod_proxy, which just forwards the HTTP directly. It's not like parsing HTTP is inherently slower than parsing netstrings. However, it would be worth doing some benchmarking with this module in particular if you've had good experiences with it.
I don't know, and you're probably right. I don't know how any of these protocols work. I don't know anything about mod_proxy, and I definitely don't know how it would boost the performance of Squeak.
Does it just hand off the request to Squeak, still requiring Squeak to do all the heavy lifting?
It would probably be quite trivial to implement mod_scgi support for Seaside if someone felt it was worthwhile (an afternoon's work at most). That someone probably isn't going to be me, any time soon, however, unless a client specifically asks for it.
Understood.
Thanks again.
Jimmie Houchin
On Fri, 26 Sep 2003, Jimmie Houchin wrote:
I don't know, and you're probably right. I don't know how any of these protocols work. I don't know anything about mod_proxy, and I definitely don't know how it would boost the performance of Squeak.
Does it just hand off the request to Squeak, still requiring Squeak to do all the heavy lifting?
All of these protocols do the same thing:
- apache accepts the request
- apache connects through a local socket to the application server (squeak)
- apache sends the request, in some format, over this socket
- the application produces an HTTP response and sends it back
- apache forwards this response back to the user
The only difference is what format the request is in when it is sent over the local socket.
With mod_lisp, the request looks something like this:
URL foo
Content-Length 123
Authorization asdasasd
...
With mod_scgi, it looks something like this:
300:URL foo Content-Length 123 Authorization asdasdasd....
With mod_proxy, it looks something like this:
GET foo HTTP/1.1
Content-Length: 123
Authorization: asdasdasdasd
...
I'm not kidding. That's pretty much the difference.
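To make the comparison above concrete, here is a hedged Python sketch that builds one toy request in each of the three formats. The header names and values come from the examples above; the exact framing details of each protocol are deliberately simplified, so treat this as an illustration, not a spec-accurate implementation:

```python
# Build one toy request in each of the three wire formats being compared.
headers = [("URL", "foo"), ("Content-Length", "123")]

def as_mod_lisp(pairs):
    # mod_lisp style (simplified): bare "name value" lines
    return "".join("%s %s\n" % kv for kv in pairs)

def as_scgi(pairs):
    # SCGI style: NUL-separated pairs wrapped in a netstring "len:payload,"
    body = "".join("%s\x00%s\x00" % kv for kv in pairs)
    return "%d:%s," % (len(body), body)

def as_http(pairs):
    # mod_proxy style: plain HTTP/1.1, which Comanche already parses
    request_line = "GET %s HTTP/1.1\r\n" % pairs[0][1]
    header_lines = "".join("%s: %s\r\n" % kv for kv in pairs[1:])
    return request_line + header_lines + "\r\n"

print(repr(as_mod_lisp(headers)))
print(repr(as_scgi(headers)))
print(repr(as_http(headers)))
```

Seen side by side, it is the same header data each time; only the delimiters change, which is why none of these formats gives an app server an inherent speed advantage over the others.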
Now, the last format (that mod_proxy uses) is straight HTTP, which means that Comanche will understand it. The others need a different kind of server which understands the mod_lisp format or the mod_scgi format instead of the HTTP format. There's no particular reason to believe that these servers will be any faster than Comanche, although the particular implementations may be. I don't think request parsing is much of a bottleneck anyway, it seems a funny thing to optimize.
*All* of these will be slower than connecting to Comanche directly. The point of putting Squeak behind apache is not to somehow leverage apache's performance, but to integrate better with other apache features - like, for example, serving static content.
If the dynamic parts of your site don't perform adequately using Comanche, you have a problem that introducing apache cannot possibly fix (except as a front end for some kind of load balancing system, but that could just as easily be done with, say, the Pen load balancer).
Avi
Avi Bryant wrote:
On Fri, 26 Sep 2003, Jimmie Houchin wrote:
[snip]
All of these protocols do the same thing:
- apache accepts the request
- apache connects through a local socket to the application server (squeak)
- apache sends the request, in some format, over this socket
- the application produces an HTTP response and sends it back
- apache forwards this response back to the user
The only difference is what format the request is in when it is sent over the local socket.
[snip]
I'm not kidding. That's pretty much the difference.
Okay.
Now, the last format (that mod_proxy uses) is straight HTTP, which means that Comanche will understand it. The others need a different kind of server which understands the mod_lisp format or the mod_scgi format instead of the HTTP format. There's no particular reason to believe that these servers will be any faster than Comanche, although the particular implementations may be. I don't think request parsing is much of a bottleneck anyway, it seems a funny thing to optimize.
*All* of these will be slower than connecting to Comanche directly. The point of putting Squeak behind apache is not to somehow leverage apache's performance, but to integrate better with other apache features - like, for example, serving static content.
This is the part I don't understand. I am no expert on any of this. All I know is that:
Apache, helloworld25k.html: 800 rps
Comanche, helloworld25k.html: 90 rps
Apache + mod_python, helloworld25k.py: 390 rps (a script opening the 25k file and serving it to Apache)
Medusa (Python web server), helloworld25k.html: 25 rps

This is all from memory and not necessarily totally accurate.
I don't intend for Apache to serve any static files; I'll use Tux for static files and images. My only desire for Apache is to improve dynamic requests. If Squeak/Comanche could equal or better Apache/mod_python in performance on dynamic pages, I would leave Apache alone.
If I am misunderstanding my experience or doing something wrong, I don't know; I am open to that. I don't think Apache and mod_python are doing any caching: the script opens, reads and closes the file. Python is persistent and long-lived, but the script should still execute the open, read, close every time. Just a modest attempt at a simple dynamic response.
I will attempt to try mod_proxy this weekend and put Apache2, mod_proxy, Squeak to a test.
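For what it's worth, a rough rps micro-benchmark of the kind quoted in this thread can be sketched in a few lines of Python. The 25 KB payload, the request count, and the single-threaded client are all assumptions made for the example, so the absolute numbers mean little; it only shows the shape of the measurement:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"x" * (25 * 1024)  # a 25 KB body, mirroring the 25k test file

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)
    def log_message(self, *args):  # silence per-request logging
        pass

# Start a throwaway server on a random free port in a background thread.
server = HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Time n sequential requests and report requests per second.
n = 200
start = time.time()
for _ in range(n):
    urllib.request.urlopen(url).read()
elapsed = time.time() - start
print("served %d requests, %.0f rps" % (n, n / elapsed))
server.shutdown()
```

A sequential single-client loop like this understates what a server can do under concurrent load, which is one reason numbers from different ad hoc tests are hard to compare.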
I just don't see any reason for Apache, mod_python, etc. to outperform Squeak. To my thinking, Squeak should outperform most Python solutions.
If the dynamic parts of your site don't perform adequately using Comanche, you have a problem that introducing apache cannot possibly fix (except as a front end for some kind of load balancing system, but that could just as easily be done with, say, the Pen load balancer).
Thanks for the education. I'm really rooting for Squeak. It is truly what I would prefer, but it has to earn its keep. :)
Jimmie
On Friday, September 26, 2003, at 01:23 PM, Jimmie Houchin wrote:
Avi Bryant wrote:
Now, the last format (that mod_proxy uses) is straight HTTP, which means that Comanche will understand it. The others need a different kind of server which understands the mod_lisp format or the mod_scgi format instead of the HTTP format. There's no particular reason to believe that these servers will be any faster than Comanche, although the particular implementations may be. I don't think request parsing is much of a bottleneck anyway, it seems a funny thing to optimize. *All* of these will be slower than connecting to Comanche directly. The point of putting Squeak behind apache is not to somehow leverage apache's performance, but to integrate better with other apache features - like, for example, serving static content.
This is the part I don't understand. I am no expert on any of this. All I know is that:
Apache, helloworld25k.html: 800 rps
Comanche, helloworld25k.html: 90 rps
Apache + mod_python, helloworld25k.py: 390 rps (a script opening the 25k file and serving it to Apache)
Medusa (Python web server), helloworld25k.html: 25 rps

This is all from memory and not necessarily totally accurate.
Ah, but these are all tests of performance at serving static pages. Of course Apache will be king here; it's straight C code, highly tuned for doing exactly that. As soon as you introduce dynamic content, it's a whole different ball of wax.
I don't intend for Apache to serve any static files. I'll use Tux for static files and images. My only desire for Apache is to improve dynamic requests. If Squeak/Comanche could equal or better Apache/mod_python in performance on dynamic pages. I would leave Apache alone.
In this case, as Avi mentioned, putting Apache in front of Squeak will only slow the whole process down. Assuming, of course, that you only have one Squeak server. If you've got a cluster of Squeak servers and you're using Apache for load balancing, you could probably get good performance under high load, although the time it takes to process a given request wouldn't improve.
If I am misunderstanding my experience or doing something wrong, I don't know; I am open to that. I don't think Apache and mod_python are doing any caching: the script opens, reads and closes the file. Python is persistent and long-lived, but the script should still execute the open, read, close every time. Just a modest attempt at a simple dynamic response.
I think the reason your mod_python test is so much faster than Comanche or Medusa is because it spends most of its time in C code. I'll bet the Python code to open and output a file is only a couple of lines, am I right?
The difference between this and what you'll get with Quixote is probably quite significant. With Quixote, you'll incur the overhead of interprocess communication between Apache and your Python process. With mod_python, you don't get this overhead, because Python is running inside your Apache process.
I would expect Apache + mod_scgi + Python server to be slower than straight Comanche.
I suspect that you're attempting to optimize prematurely. Ultimately the performance of your site will probably depend more on how you generate your dynamic content than on what language or HTTP server you use. If your database has 10s of millions of records, your app will probably spend more time waiting for data than parsing requests.
My best advice would be to worry more about writing your app than tuning it for right now. Once you get it up and running you can measure its performance, profile it, and find the most effective ways to get the performance you need. And that's the goal, right? "As fast as necessary," not "as fast as possible." It may be that you can get away with straight Comanche to start, and then ramp up performance as your user base grows.
If it were me, I'd write the app in Seaside/Squeak, knowing that to dramatically increase performance, I have many options:
- create a cluster of Squeak servers, and put a load balancer in front of it
- write a VM plugin to optimize critical sections of code
- port to VisualWorks and get a 10x speed increase from JIT compilation
Colin
Hello Colin, all.
I want to make it clear that I am very pro-Squeak. But I need to make the best decision for me, for this time, for this situation. The users here may make a different but equally valid decision for themselves building the same website. I can only go with my knowledge, understanding and skills.
Below are the facts as best I understand them. Nothing is meant to be or sound like an advertisement or endorsement of Python over Squeak. I may be doing something wrong.
Colin Putney wrote:
On Friday, September 26, 2003, at 01:23 PM, Jimmie Houchin wrote:
Avi Bryant wrote:
[snip]
This is the part I don't understand. I am not an expert on any of this. All I know is that:
Apache helloworld25k.html = 800 rps
Comanche helloworld25k.html = 90 rps
Apache/mod_python helloworld25k.py = 390 rps (script opening the 25k file and serving it to Apache)
Medusa (Python web server) helloworld25k.html = 25 rps

This is all from memory and not necessarily totally accurate.
Okay, I'm back at home. Time for real numbers, not faulty memory. :)
Machine: Gentoo Linux, Apache 2.0.47, mod_python 3.1, Squeak 3.6g2, squeak-5423.image, Kom 6.2 hot off the press. :) The VM and image were fresh downloads and compiles expressly for this test.
Test: httperf --hog --num-conn 10000 --server ?? --port ?? --uri=/??
(?? = the correct values.)
Apache        hello25k.html  550 rps  totalCPU 27%
Komanche      hello25k.html  8.5 rps  totalCPU 53%
A2-mod_python args.py        155 rps  totalCPU 12%
A2-mod_python mptest.py      348 rps  totalCPU 14%
Komanche      dynamic***1    8.3 rps  totalCPU 48%
Komanche      dynamic***2    29 rps   totalCPU 62%
Interesting. mod_python used less CPU than Apache serving the file statically, and the static file was 5k smaller than the mod_python response, which was 30k.
mptest.py merely serves a 'Hello World!!!!!!!' string. args.py opens, reads, closes 11 files and writes the data, plus reads/writes environment variables to the request, for a total of 30k.
The Apache, mod_python setup is no more optimized than the Squeak setup. All of them are default installs.
According to the mod_python page, the author gets 1200 rps from ab on a 1200 MHz Pentium serving a 'hello' string dynamically with a certain mod_python handler. So optimization is definitely possible.
Ah, but these are all tests of performance at serving static pages. Of course Apache will be king here, it's straight C code, highly tuned for doing exactly that. As soon as you introduce dynamic content, it's a whole different ball of wax.
Yes.
I don't intend for Apache to serve any static files. I'll use Tux for static files and images. My only desire for Apache is to improve dynamic requests. If Squeak/Comanche could equal or better Apache/mod_python in performance on dynamic pages, I would leave Apache alone.
In this case, as Avi mentioned, putting Apache in front of Squeak will only slow the whole process down. Assuming, of course, that you only have one Squeak server. If you've got a cluster of Squeak servers and you're using Apache for load balancing, you could probably get good performance under high load, although the time it takes to process a given request wouldn't improve.
Okay.
If I am misunderstanding my experience or doing something wrong, I don't know. I am open to that. I don't think Apache/mod_python is doing any caching. The script opens, reads and closes the file. Python is persistent, long-lived, but the script should still execute the open, read, close every time. Just a modest attempt at a simple dynamic response.
I think the reason your mod_python test is so much faster than Comanche or Medusa is because it spends most of its time in C code. I'll bet the Python code to open and output a file is only a couple of lines, am I right?
That is correct:

file = open('/path/to/file')
req.write(file.read())
Nevertheless it still has to open and read it into the vm before writing it out to the request.
The Squeak test above was with an internal variable already available inside of Squeak. No opening, no reading, just spit it out.
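Out of curiosity, the cost of that per-request open/read/close can be measured directly. A standalone sketch (plain Python, no mod_python; the 25k file is generated on the fly, so the name and contents are stand-ins):

```python
import os
import tempfile
import timeit

# Stand-in for hello25k.html: a 25 KB file of arbitrary bytes.
data = b"x" * (25 * 1024)
fd, path = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)

def serve_once():
    # The same open/read/close the mod_python script does per request.
    with open(path, "rb") as f:
        return f.read()

content = serve_once()
n = 1000
seconds = timeit.timeit(serve_once, number=n)
print("open/read/close of 25k: %.3f ms per request" % (seconds / n * 1000))
os.remove(path)
```

On most machines this comes out well under a millisecond, which suggests the file I/O itself is not where the rps gap comes from either way.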
The difference between this and what you'll get with Quixote is probably quite significant. With Quixote, you'll incur the overhead of interprocess communication between Apache and your Python process. With mod_python, you don't get this overhead, because Python is running inside your Apache process.
Actually Jon's experience with LWN (Linux Weekly News) was improved performance when switching from mod_python to mod_scgi. http://www.lwn.net
I would expect Apache + mod_scgi + Python server to be slower than straight Comanche.
I would too, but I'm not experiencing it. :(
I suspect that you're attempting to optimize prematurely. Ultimately the performance of your site will probably depend more on how you generate your dynamic content than on what language or HTTP server you use. If your database has 10s of millions of records, your app will probably spend more time waiting for data than parsing requests.
Actually I'm not really attempting to optimize at all. I'm merely attempting to gather some preliminary data to make a decision.
If Squeak doing the simplest request is magnitudes slower than mod_python doing a much more complex request, then it makes it that much harder for me to choose Squeak. This is not to say that Squeak isn't an excellent choice for other apps/webapps.
You are probably correct that querying the db will be the more intensive process. Nevertheless, with the results above, mod_python takes less time and uses fewer resources, which leaves more time and more resources for the other tasks. Whew, I hope that came out clear. :)
My best advice would be to worry more about writing your app than tuning it for right now. Once you get it up and running you can measure its performance, profile it, and find the most effective ways to get the performance you need. And that's the goal, right? "As fast as necessary," not "as fast as possible." It may be that you can get away with straight Comanche to start, and then ramp up performance as your user base grows.
Currently I actually understand Apache, mod_****, and Quixote better than writing Comanche modules. My Python is as good as or better than my Smalltalk. I just happen to like Smalltalk better.
So currently, for me and by me, Apache, mod_****, Quixote is faster up and faster running.
I'm hoping for a fast start. If I understand the market I'm entering like I think I do... And if I have sufficient value over my competition, like I think I will... I won't have much time to ramp up. It needs to be ready. Only deployment will tell the tale between my hopes, dreams, ambitions and reality. :)
If it were me, I'd write the app in Seaside/Squeak, knowing that to dramatically increase performance, I have many options:
- create a cluster of Squeak servers, and put a load balancer in front of it
- write a VM plugin to optimize critical sections of code
- port to VisualWorks and get a 10x speed increase from JIT compilation
VisualWorks is way too expensive. For the $1000s it would cost, I would rather put that into Squeak. If I had the $ available, which I don't. But I do hope to after the site is up. :)
Jimmie
Squeak code
*** These were run in a Workspace. The code was copied from the class documentation and modified for the test.
***1
| ma |
ma _ ModuleAssembly core.
ma addPlug: [:request | HttpResponse fromString: JLHKomServer K25].
(HttpService startOn: 8080 named: 'JLHKS') module: ma rootModule.
"JLHKomServer K25 is a class variable holding the 25k string"
***2
| ma2 |
ma2 _ ModuleAssembly core.
ma2 addPlug: [:request | HttpResponse fromString: 'Hello World!'].
(HttpService startOn: 8888 named: 'JLHKS') module: ma2 rootModule.
If I am doing something wrong which is making Squeak look bad, please correct me. It is not my intention.
Apache        hello25k.html  550 rps  totalCPU 27%
Komanche      hello25k.html  8.5 rps  totalCPU 53%
A2-mod_python args.py        155 rps  totalCPU 12%
A2-mod_python mptest.py      348 rps  totalCPU 14%
Komanche      dynamic***1    8.3 rps  totalCPU 48%
Komanche      dynamic***2    29 rps   totalCPU 62%
FWIW: you don't mention what kind of machine you're on. I'm on an Athlon 1.4Ghz, and for a "Hello World" response, I see these kinds of numbers:
Apache (warming up): ~300 rps
Apache (with file cached): ~1500 rps
Comanche 6: ~300 rps
KomServices: ~1500 rps
The last is using the KomServices package to build a dummy server that always spits out a "Hello World" HTTP response, no matter what you send it - it reads the request from the socket but doesn't do anything with it. This is meant as the theoretical optimal Squeak server - SCGI etc will have that as an (unrealistic) upper bound.
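The same dummy-server idea is easy to sketch in Python for comparison: it reads (part of) the request, discards it, and always answers a canned response. This is not KomServices code, just an analogous upper-bound rig with made-up names:

```python
import socket
import threading

CANNED = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/plain\r\n"
          b"Content-Length: 12\r\n"
          b"\r\n"
          b"Hello World!")

def _serve_forever(srv):
    while True:
        conn, _ = srv.accept()
        conn.recv(4096)       # drain (some of) the request, ignore it
        conn.sendall(CANNED)  # same bytes every time: no parsing, no dispatch
        conn.close()

def start_dummy_server(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; the actual port is returned.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    threading.Thread(target=_serve_forever, args=(srv,), daemon=True).start()
    return srv.getsockname()[1]
```

Point httperf or ab at it and you get the raw accept/recv/send throughput of the interpreter, with no HTTP parsing in the way.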
I'm not going to draw many conclusions here because I think these kinds of micro-benchmarks are pretty pointless - they don't tell you *anything* about how a real application will perform. I'll just point out that the difference between 300 rps and 1500 rps (which coincidentally happen to be the only two numbers that appeared in my timings) is 2.7 ms per request. Those 3 ms are, essentially, the overhead that Comanche itself imposes (whether over Apache or over some hypothetical super-optimized SCGI server).
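The 2.7 ms follows directly from the reciprocals of the two rates:

```python
def ms_per_request(rps):
    # Per-request latency, in milliseconds, at a given requests-per-second rate.
    return 1000.0 / rps

# 300 rps -> 3.33 ms each; 1500 rps -> 0.67 ms each.
overhead = ms_per_request(300) - ms_per_request(1500)
print("%.1f ms" % overhead)  # -> 2.7 ms
```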
So before you end up doing too much more benchmarking, think about how much difference those 3ms will or won't make in the big picture of your application. It's not an insignificant number - it *could* make a difference - but it wouldn't take, say, a very complex database query to overshadow that.
If you're really performance-obsessed, you might also want to do some benchmarks of Squeak and Python for the other 99% of the work your application is doing, when it's not pushing bytes across sockets. It at least used to be that Python had a very, very slow interpreter, and it wouldn't surprise me if Squeak pulled way ahead in a complex dynamic page, even given mod_python's apparent 3ms headstart.
But personally I would recommend ditching the benchmarks and starting to write code. All you have to do to make things get faster is wait around for Moore's law, but the app won't write itself.
Cheers, Avi
Jimmie Houchin said:
I am currently building a website. I would like to use Squeak but am concerned about performance and stability.
Why, you'll be getting more than ~50 hits per second? Do you have any numbers to share? Because it is hard to talk about performance requirements if you don't have any data to base discussions on.
Apache and Python, Quixote to be specific. http://www.mems-exchange.org/software/quixote/
I looked at Quixote a couple of years ago, and I liked it. Not as much as Smalltalk, though ;-)
Much of the performance of Quixote is from using Apache as the webserver instead of a Python based server. The most popular means of achieving that is via mod_scgi.
That is because the alternative, starting the Python interpreter over and over again through a CGI script, sucks performance-wise. (That's why I'm doing simple Python CGI on all my wikis and other interactive pages on my homepage, www.cdegroot.com, which gets around 200,000 hits per month - and every page hit renders a dynamic bit by starting a Python process.) Remember it is *relative* performance we're discussing here; a multi-gigahertz box takes away a lot of the pain.
Anyway, Squeak just runs in all configurations - you don't start it up per hit; you start up Squeak as an application server, and whether the requests between Apache and Squeak are forwarded through mod_proxy, mod_lisp, mod_scgi, mod_fcgi, or mod_my_mod_is_better_than_yours does not really matter. Maybe at the very high end, where the overhead of socket creation with simple mod_proxy starts to count, this choice becomes important; however, I wouldn't worry too much about it at this stage.
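For the mod_proxy option, the Apache side is only a couple of directives (a sketch with a made-up path and port, assuming mod_proxy is loaded):

```apache
# Forward /app to a Squeak/Comanche image listening locally on port 8080.
ProxyPass        /app http://127.0.0.1:8080/app
ProxyPassReverse /app http://127.0.0.1:8080/app
```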
FYI, a benchmark: I'm running www.tric.nl on a dog-slow, overloaded PII/400. The site is based on Gardner and Janus, beta-level software that is totally stupid about optimization, because every page is rendered twice for every hit. While my wife and I were both working on the box (viva VNC), and pisg was sucking CPU because it was updating my IRC stats pages, a locally-running 'ab' could still suck 4 pages per second from it.
That's lousy, for sure, and /me should do some profiling, but it does establish a nice bottom line: will you be happy with 4p/s? If yes, please forget about performance :-)
Hth
Cees