Scaling Seaside apps (was: [Seaside] About SToR)

Wed Aug 2 13:33:41 UTC 2006

Another datapoint, for those interested...

I hacked up a quick implementation of some parsing code that I already
have in several other languages, this time using GLORP.  The job is
single-threaded, but it is forked off as a background process that is
monitored from a Seaside "control panel".

There are several somewhat structured scrapings from web pages that I
want stored on disk; the data should come to approximately 1GB when the
process finishes.  I wrote a lightweight proxy for GLORP that makes
session access atomic, and everything works like a charm.
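
A minimal sketch of the "atomic session access" idea, written in Python
rather than Smalltalk.  The names AtomicProxy, FakeSession and register
are hypothetical and are not GLORP's API.  In Squeak the same idea would
typically be built on doesNotUnderstand: forwarding and a
mutual-exclusion Semaphore, but that is an assumption about the approach,
not a description of the actual proxy.

```python
import threading


class AtomicProxy:
    """Forward method calls to `target`, holding one lock per call."""

    def __init__(self, target):
        self._target = target
        # Re-entrant, so a guarded method may call another guarded method.
        self._lock = threading.RLock()

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked


class FakeSession:
    """Hypothetical stand-in for a database session (not thread-safe)."""

    def __init__(self):
        self.count = 0

    def register(self, obj):
        # Deliberately non-atomic read-modify-write.
        n = self.count
        self.count = n + 1
        return self.count


def run_demo(workers=4, per_worker=250):
    session = FakeSession()
    proxy = AtomicProxy(session)

    def work():
        for _ in range(per_worker):
            proxy.register(object())

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return session.count
```

Every call that goes through the proxy is serialized, so callers never
see the session in a half-updated state.  The cost is that the session
becomes a single point of contention.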

What amazed me was that the Squeak image, with one process running mind
you, is CPU limited!  (I have tried a variety of different priorities for
the forked process, including the IO priorities.)  It has been difficult
for me to figure out exactly how to count the number of message sends
(looking at the Seaside profiler, I know it's quite possible).  However,
the Process panel seems to point the finger at GLORP, which is
constructing a ton of queries on the fly.  Watching bandwidth consumption
in the task manager agrees: brief periods of activity followed by pauses
as my program tries to figure out what to do with the data it pulled.
The running Postgres process, too, is sitting there at 5% CPU usage, not
breaking a sweat.
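
One rough way to check a "CPU limited" diagnosis is to compare CPU time
with wall-clock time over the same stretch of work.  The sketch below is
Python, not Squeak, and cpu_fraction is a hypothetical helper name.  A
ratio near 1 suggests the process itself is the bottleneck, and a ratio
near 0 suggests it is mostly waiting on something else, such as the
network or the database.

```python
import time


def cpu_fraction(work):
    """Return CPU seconds divided by wall-clock seconds spent in work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def busy():
    # Pure computation: should keep the CPU occupied.
    sum(i * i for i in range(1_000_000))


def idle():
    # Stand-in for waiting on I/O: sleeping uses almost no CPU time.
    time.sleep(0.2)
```

Calling cpu_fraction(busy) and cpu_fraction(idle) side by side shows the
contrast.  The exact values depend on the machine and its load.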

GLORP is a dream to work with.  It almost makes those spurious
object-access patterns look free.  :-)  But, if you don't want to store a
whole table in memory and you don't want to go twiddling down the whole
B-tree every time you do an object access, you want a cursor, and I
haven't quite figured out how to get that to work...
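
For illustration, this is the general shape of cursor-style access: run
one query and pull the rows in fixed-size batches, so that neither the
whole table sits in memory nor a fresh lookup happens per object.  The
sketch uses Python's standard sqlite3 module, not GLORP or Postgres.  The
iter_rows helper, the page table and the batch size are made up for the
example.

```python
import sqlite3


def iter_rows(conn, sql, params=(), batch_size=500):
    """Yield rows from a single query, fetching batch_size rows at a time."""
    cur = conn.cursor()
    cur.execute(sql, params)
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


def make_demo_db(n=2000):
    """Build a throwaway in-memory table of n fake scrapings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO page (body) VALUES (?)",
        [("scrape %d" % i,) for i in range(n)],
    )
    conn.commit()
    return conn


def total_body_length(conn):
    # Only one batch of rows is held by the generator at any time.
    query = "SELECT id, body FROM page ORDER BY id"
    return sum(len(body) for _id, body in iter_rows(conn, query))
```

Whether the driver streams rows from the server or buffers them on the
client is driver-specific.  With some Postgres drivers a server-side
cursor has to be requested explicitly.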

On a side note, I achieved 10-12x the throughput with my prototype program
(written in a different language and dumping the serialized representation
to disk), and I have moved on to yet another language to finish the job. 
*Sigh*  One day I'll be able to use Squeak.

Jeremy

> Very interesting.  It looks like something that Squeak itself could
> benefit from, wrapped in a Stream or Flow interface.
>
> Whether an I/O intensive application like a DB server could benefit
> from that is hard to say, those servers typically want to have close
> (read: quick) access to the db files, I'm pretty sure there would be
> performance challenges with remote primitive access.
>
> It might be good for backups though..
>
> --- Darius Clarke <socinian at gmail.com> wrote:
>
>> Could, should Magma also use Amazon S3
>> http://www.amazon.com/s3
>> as a  storage device?
>>
>> I've not thought through what it would take to optimize for it, but it
>> might reduce a lot of data/code/image persistency headaches.
>>
>> Cheers,
>> Darius

GPG PUBLIC KEY: 0xA2B36CE5