Next steps

goran at krampe.se goran at krampe.se
Fri Jan 13 06:20:54 UTC 2006


Hi Chris!

First - thanks for taking time to answer. :)

Chris Muller <chris at funkyobjects.org> wrote:
> Hey Göran, I don't have the context you have into your
> domain, nor experience with Seaside.  Nevertheless, my
> strong intuition suggests we should step back and
> consider again having one Magma session per Seaside
> session.

Ok, well, I can probably do that - I just need to be sure that I feel I
have "ways out" if it turns bad. Call it "precautionary investigations".
Since I am putting myself (and Magma/Seaside/Squeak) on the line here I
don't want to fail.

> I am not sure whether you are trying to optimize for
> speed or memory consumption, but I think that this 1:1
> approach is good for both.

Not optimizing at the moment - mainly "dabbling" in my head. But both
concerns are valid, even though memory consumption was my main worry.
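To make sure we mean the same thing by 1:1, here is roughly how I picture the wiring - just a sketch, and the selector names (#hostAddress:port:, #connectAs:, the Seaside #unregistered hook) are written from memory, so treat them as assumptions rather than working code:

```smalltalk
"Sketch only: one MagmaSession per Seaside session. Selector names
are from memory - check against the real Magma/Seaside code first."
WASession subclass: #MyAppSession
	instanceVariableNames: 'magmaSession'
	classVariableNames: ''
	category: 'MyApp'

MyAppSession >> magmaSession
	^ magmaSession ifNil:
		[magmaSession := MagmaSession
			hostAddress: (NetNameResolver addressForName: 'localhost')
			port: 51969.	"port is just an example"
		magmaSession connectAs: 'goran'.
		magmaSession]

MyAppSession >> unregistered
	"Seaside expires the session - release the Magma connection too."
	magmaSession ifNotNil: [magmaSession disconnect].
	super unregistered
```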

> > > Still, it is probably good to try to keep the readSet
> > > as small as possible.
> > 
> > Well, I find this recommendation slightly odd *in general*. I
> > understand how it makes each transaction faster - but on the other
> > hand you lose the caching benefit. For example, in this app I want a
> > significant part of the model to be cached at all times - the meta
> > model. It will not be large (so I can afford to cache it, even in
> > several sessions), but it will be heavily used so I don't want to
> > end up reading it over and over.
> 
> It's ok.  Go ahead and cache your meta-model in each
> session if its not so big, but seriously let
> everything else be read dynamically as-needed.  Let
> every session have only a very small portion of the
> domain cached and keep it small via #stubOut.  
> 
> Reads (proxy materializations) are one of the fastest
> things Magma does.

Ok, I assume I might still be avoiding actual file access - given OS
file level caching.
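And just to check my understanding of the intended usage - something like this, where only #stubOut: is taken from your mail and the rest (like #root) is my guess at the API:

```smalltalk
"Hold the meta model strongly so it stays materialized; let a big Case
be read on demand and stubbed out afterwards. Only #stubOut: is from
Chris' mail - #root etc. are my assumptions about the API."
metaModel := magmaSession root at: #metaModel.	"kept referenced -> stays cached"
case := (magmaSession root at: #cases) at: aCaseId.	"materialized lazily"
"... render the case in a component ..."
magmaSession stubOut: case	"back to a proxy, freeing the memory"
```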

> You are supposed to *enjoy* the
> transparency, not have to worry about such complex
> ways to circumvent it.

I am enjoying it! You may recall I am an old GemStone dog - I know how
to enjoy that. :)
 
> ReadStrategies and #stubOut: are intended to optimize
> read-performance and memory consumption, respectively.

I understand them both - the first is similar to GemStone; the second is
not, since stubbing is automatic in GemStone. But whatever.

> If these are not sufficient, and assuming the
> uni-session approach (all Seaside sessions share one
> MagmaSession and one copy of the domain) is not
> either, *then* these other complex alternatives should
> be considered.  It's not easy for me to say but I have
> to face the truth; if the intended transparency of
> Magma cannot be enjoyed then that opens up lots of
> other options that are equally less-transparent.

Ok. One huge benefit with using 1-1 instead of Cees' ConnectionPool is
that my Seaside components can hold onto the persistent objects.
Otherwise they can't, because the next request will end up using a
different session.

And I really wonder why I haven't realized that until now. ;) Sigh.
 
> > As a reminder - the reason for my discussion on this topic is that I
> > feel that the "simplistic approach" of simply using a single
> > MagmaSession for each Seaside session doesn't scale that well. I am
> > looking at a possible 100 concurrent users (in the end, not from the
> > start) using an object model with at least say 50000 cases - which
> > of course each consists of a number of other objects. Sure, I can
> > use some kind of multi-image clustering with round-robin Apache in
> > front etc, but still.
> 
> Well, it may scale better than you think.  Peak
> (single-object) read rate is 3149 per second on my
> slow laptop,

Are we talking cold cache including actual file access? And how does the
size of the files on disk affect that?

> 7.15 per second (see
> http://minnow.cc.gatech.edu/squeak/5606 or run your
> own MagmaBenchmarker) to read one thousand objects. 

Not sure I grokked that sentence. :)

> So if you have 1000 objects in a Case, 100 users all
> requesting a case at exactly the same time then the
> longest delay would be ~10 seconds (assuming you're
> not serving with my slow, circa 2004 laptop). 

Mmm.

> Optimizing the ReadStrategy for a Case would allow
> better performance.

That I probably will do when the app settles.
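What I have in mind is something along these lines - and here every class and selector name is a pure guess on my part, since I haven't looked at the ReadStrategy code yet:

```smalltalk
"Pure guesswork on the names - the idea is just to tell Magma up front
to materialize a Case together with its parts in few round trips,
instead of tripping over one proxy at a time."
strategy := MagmaReadStrategy new.	"hypothetical class name"
strategy readDeeply: #parts.		"hypothetical selector"
case := magmaSession read: aCaseOid using: strategy	"hypothetical too"
```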
 
> Any single-image Seaside server where you want to
> cache a whole bunch of stuff is going to have this
> sort of scalability issue, no matter what DB is used,
> right?  Remember, you could use the many:1 approach
> (all Seaside sessions sharing one Magma session and
> single-copy of the domain), how does this differ from
> any other solution?

Eh... not sure I follow the logic, but never mind. :)

> The 1:1 design, OTOH, is what makes multi-image
> clustering possible, so from that aspect risk is
> reduced.  That's the one I would try very hard to make
> work before abandoning TSTTCPW.

Good point.
 
> > As a sidenote, GemStone has a "shared page cache" so that multiple
> > sessions actually share a cache of objects in ram.
> 
> That's in the server-side GemStone-Smalltalk image
> memory though, isn't it?  Magma doesn't do that.

The "server side" GemStone image can run anywhere - so the closest
counterpart in Magma is actually the client image IMHO.

> > Could we possibly contemplate some way of having sessions share a
> > cache? Yes, complex stuff I know. Btw, could you perhaps explain
> > how the caching works today? Do you have some kind of low level
> > cache on the file level for example?
> 
> I'm open to ideas.  The caching is very simple right
> now, it just uses WeakIdentityDictionarys to hold read
> objects.

And one per session I assume? No cache on any lower level, like on top
of the file code?
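Because my mental model of that per-session cache is a plain weak dictionary, roughly like this toy sketch (not the actual Magma code, of course):

```smalltalk
"Toy sketch of a weak per-session cache: materialized objects are
tracked weakly, so the GC can reclaim anything the application no
longer references. Not the actual Magma code."
registry := WeakIdentityKeyDictionary new.
registry at: materializedObject put: itsOid.
"An object stays cached only while something else references it;
once collected, the next access has to materialize it again."
```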

> > > A commit is pretty cheap with a small readSet. With a
> > > large readSet, WriteBarrier will definitely improve it
> > > dramatically.
> > 
> > I kinda guessed. Otherwise you keep an original duplicate of all
> > cached objects, right? So WriteBarrier also improves on memory
> > consumption I guess.
> 
> No to the first question, yes to the second (IIRC). 
> It doesn't keep an original "duplicate", just the
> original buffer that was read.

Ah, ok. But you don't need that when using WriteBarrier right?
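In caricature, here are the two commit strategies as I understand them - my own summary, not Magma code, and the variable names are made up:

```smalltalk
"My summary in caricature only - not Magma code."
"Without WriteBarrier: commit detects changes by diffing each object
against the buffer kept from the original read:"
changed := (session serialize: obj) ~= (readBuffers at: obj).
"With WriteBarrier: instrumented mutators mark objects dirty as they
change, so the original buffer can be dropped:"
changed := dirtyObjects includes: obj
```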

> > Very good. :) And also - do you have any clue on how the
> > performance is affected by using the various security parts?
> 
> Authorizing every request seems to have imposed about
> a 10% penalty.  #cryptInFiles is hard to measure since
> writes occur in the background anyway. 
> #cryptOnNetwork definitely slows down network
> transmissions considerably, only use it if you have
> to.
> 
> Regards,
>   Chris

regards, Göran



More information about the Magma mailing list