[Seaside] Re: Scaling Seaside apps (was: About SToR)

Jeremy Shute shutej at crazilocks.com
Thu Aug 10 14:41:29 UTC 2006


On 8/5/06, Alan Knight <knight at acm.org> wrote:
>
> OK, I've joined this list now. I'll respond to a couple of comments in
> various messages.


Cool!  :-)




> ---------------------
> In a different part of the thread, Jeremy Shute shutej at crazilocks.com wrote:
>
>
> But, looking at the process browser, the messages in the GLORP-operating
> thread seem to relate to the assembly of queries.  The cursor would
> eliminate the READ queries, but the WRITE queries are still necessary,
> requiring assembly one-at-a-time, unless GLORP is smart enough to use a
> global prepared statement underneath for insertions.  I see no reason why
> it
> shouldn't be -- insertions are very homogeneous operations, unlike WHERE
> clauses.  Perhaps a cursor WOULD solve the problem?
> -----------------
>
> Glorp does group insertions together. This can give some very large
> performance improvements, e.g. if it can use Oracle array binding over a
> high-latency connection. However, for Postgresql there isn't any directly
> analogous mechanism. The mechanisms I saw available were a) just
> concatenating the inserts together in a single large statement with
> semicolons separating them. This helps with network traffic, but doesn't
> really tell the database it's doing many similar inserts. b) The bulk load
> mechanism. This seemed very much aimed at loading data from files, and
> while it looked like it might be usable more directly, it would be quite
> awkward. I haven't tried this.
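
For concreteness, here's roughly what those two options look like from a
client -- sketched in Python against psycopg2 rather than the Squeak driver,
and with a made-up "widgets" table, purely to show the shapes:

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    rows = [(1, 'foo'), (2, 'bar'), (3, 'baz')]

    # (a) Concatenate many INSERTs into one statement.  One round trip,
    # but the server still parses and plans each INSERT separately.
    stmts = "; ".join(
        cur.mogrify("INSERT INTO widgets VALUES (%s, %s)", r).decode()
        for r in rows)
    cur.execute(stmts)

    # (b) The bulk-load path (COPY).  Aimed at loading files, but a
    # driver can feed it any stream -- here a tab-separated buffer.
    buf = io.StringIO("".join("%d\t%s\n" % r for r in rows))
    cur.copy_from(buf, 'widgets', columns=('id', 'name'))

    conn.commit()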


Yes, neither option seems to be very fun.  I wonder if there's a way to hack
up the internal Postgres mechanisms to talk more directly to Squeak, like an
INSERT INTO with a subselect clause, and provide a veneer where Squeak
offers the contents of the subselect over the wire?  It would be a
monstrous, nasty, warty thing, but at least it could take advantage of the
homogeneous nature of the data.  I know some people have written code using
remote tables; if you could trick Postgres into thinking the running Squeak
image was a remote table WITHOUT hacking a plugin, you could probably get
more expedient bulk inserts...
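
Actually, contrib/dblink might already be most of that veneer -- minus the
part where the "remote table" is a live Squeak image, since whatever dblink
connects to has to speak the Postgres wire protocol.  Something like this,
with the same hypothetical table as above and 'dbname=staging' standing in
for wherever the data lives:

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Pull the rows through dblink (a contrib module) so the INSERT
    # sees one homogeneous rowset rather than row-at-a-time statements.
    cur.execute("""
        INSERT INTO widgets
        SELECT * FROM dblink('dbname=staging',
                             'SELECT id, name FROM widgets')
                 AS t(id integer, name text)
    """)
    conn.commit()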



> So right now on Postgresql (and others) Glorp will just concatenate the
> inserts together (up to a limit). One thing to watch out for from a
> performance perspective is to turn binding off for Postgresql. At least in
> the VW driver, binding was implemented by taking the queries, splitting
> them apart at the placeholders, and then concatenating the string
> representation of the
> bound values into the query. If you have a query with a very long string (
> e.g. because it's a large number of inserts being done all at once) then
> this becomes a huge performance bottleneck, and doesn't even seem to offer
> any of the benefits of binding in terms of not needing to convert back and
> forth between string representations.


Are you saying turning binding off will help my GLORP performance?

I'm a little dense - I read this as saying that "binding" means tokenizing
the query, hunting for '?', and splicing the values in with string
concatenation whenever you issue a prepared statement.  Also, does the
driver do piecemeal ',' concatenation, or StringBuffer-like appends?
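
In other words, I picture the "binding" you describe as something like this
toy sketch (not the actual VW driver code, obviously):

    def fake_bind(query, values):
        # Split the statement at each '?' and splice in the printed
        # values: no server-side handle, no type info, just string
        # surgery on the client.
        parts = query.split('?')
        assert len(parts) == len(values) + 1
        out = [parts[0]]
        for part, value in zip(parts[1:], values):
            out.append(repr(value))  # string representation of the value
            out.append(part)
        return ''.join(out)          # StringBuffer-style join, not +=

    fake_bind("INSERT INTO widgets VALUES (?, ?)", (1, 'foo'))
    # => "INSERT INTO widgets VALUES (1, 'foo')"

If it's the piecemeal version instead -- building the result with repeated
appends so the whole string gets copied each time -- then a statement that
is really hundreds of concatenated INSERTs would go quadratic, which would
explain the bottleneck you saw.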



> Glorp will also keep a cache of prepared statements and re-use them if the
> hooks are in place, but I wasn't aware of how to do that for Postgresql.


This sentence is what's confusing me about the above...  It makes it sound
like GLORP does not use prepared statements on Postgres, in which case, my
understanding of turning off "binding" must be askew...  If you're not
issuing prepared statements, why would turning off "binding" help?
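
For what it's worth, Postgres does expose prepared statements at the SQL
level, so if the hooks existed I'd presume the driver could do something
like (Python again, same hypothetical table):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Parse and plan the INSERT once, server side...
    cur.execute("PREPARE ins (integer, text) AS "
                "INSERT INTO widgets VALUES ($1, $2)")

    # ...then ship only the statement name plus values per row.
    for row in [(1, 'foo'), (2, 'bar')]:
        cur.execute("EXECUTE ins (%s, %s)", row)

    conn.commit()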



> I suspect it's less important in such a situation, in that you'd have to
> have binding done better in order to effectively reuse prepared statements.
> I also have a feeling, although I haven't measured, that Postgresql is more
> tolerant of dynamic SQL for performance than, say, Oracle, which really
> really wants you to use prepared statements.


Still, I think of the prepared statement as a way to construct a partial
abstract syntax tree on the server side, once, and plug in missing
sections.  In turn, this means you ship a handle to this server-side object
instead of a much larger stream of characters (or boxed symbols and strings,
I don't know how the driver works), which has to get parsed into an abstract
syntax tree at the server for every inserted row.  Even on localhost over a
pipe, a prepared statement must be preferable for any RDBMS, right?  (As a
side note, do you know WHY the difference on Oracle is so dramatic -- are they
doing less optimization in the driver than Postgres and therefore shipping
more data?)
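
The localhost question seems measurable, at least crudely -- same
hypothetical setup as above, with the caveat that each EXECUTE still ships
a small string, so this only isolates the parse/plan cost:

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    rows = [(i, 'name%d' % i) for i in range(10000)]

    # Dynamic SQL: the server parses a fresh INSERT for every row.
    t0 = time.time()
    for r in rows:
        cur.execute("INSERT INTO widgets VALUES (%s, %s)", r)
    print('dynamic:  %.2fs' % (time.time() - t0))

    # Prepared: parse and plan once, then execute by name.
    cur.execute("PREPARE ins2 (integer, text) AS "
                "INSERT INTO widgets VALUES ($1, $2)")
    t0 = time.time()
    for r in rows:
        cur.execute("EXECUTE ins2 (%s, %s)", r)
    print('prepared: %.2fs' % (time.time() - t0))

    conn.rollback()  # throw the benchmark rows away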



Jeremy

