Read strategies (was MagmaCollection API)

Fri Sep 15 01:56:01 UTC 2006

Thanks for the great questions Göran!

> Thing is I probably need "in memory" speed when viewing and filtering
> the case collection. Some of my filtering can be done using indexing,
> but some probably will need "iteration". So I will most probably end
> up materializing all cases on a pretty regular basis. Doing that for
> 100 cases is probably no big deal, but 15000? Nope. Also - what I have
> seen so far it takes a bit too long to materialize 20 cases in order
> to show them in a table. Sure, a better readstrategy that only fetches
> exactly what I need would help - but it is hard for me to predict what
> columns (attributes) will be shown and how much state I need to fetch
> in order to satisfy that - the columns are not hard coded.

Don't worry about trying to get "exactly" what you need.  Start by
being liberal with the ReadStrategy (i.e., read to depth 9999 on as
many instVars of the Case as you are reasonably sure will be needed)
and then back it off if necessary.  You might be surprised (I hope!) at
how fast it retrieves the case with the ReadStrategy in place.  To put
it into perspective, I have witnessed a 10X improvement when faulting
down a hundred Transactions simply by putting a ReadStrategy on their
'date' variable to read 9999 levels deep (I have since put that into
Magma's base code: 9999 levels deep on any Date).

> So I simply need for Gjallar to "cache" the cases in RAM. 

Magma was designed in the hope that any type of end-user GUI
application, where the user searches for some chunky object (e.g., a
Case), opens it up (faulted all at once, via a ReadStrategy) and then
commits some changes to it, can be built with reasonable performance
on the 1:1 session design.  Each of these steps (search, open, commit
change) should take no more than a second or two.

And then, when the user "closes" that case, the program should keep the
readSet small by stubbing the case out.

Now, if the model is large enough, and busy enough with lots of users,
that this becomes too much, then the 1:1 design can scale by spreading
multiple sessions across multiple OS processes (images), CPUs, or
computers.  But the code remains simple and unchanged.

I probably sound like a broken record, sorry..

> I can't do
> that for each and every user - so a shared session is the answer. The
> "problem" with this approach is to know when I can abort it to get it
> refreshed since the objects read using that session will be accessed
> (readonly) by multiple Squeak Processes. I think I will just "ignore"
> the issue for now and consider the model "thread safe" when used
> readonly. Of course, Magma probably puts the objects in theoretically
> unsound states for short periods of time when doing the refresh - but
> whatever. :) If this turns out to be a problem I will just have to
> protect the model with a Monitor while doing the abort.

Yeah, this makes me nervous; I think it could experience intermittent
problems due to the two-step materialization process (remember the
Dictionary full of Integers?).

A large cache in RAM has the drawback of a large readSet, which works
against performance; maybe not much for read-only access, but somewhat,
because the underlying dictionaries are large.

> Also, I have been toying a bit with readstrategies - only a tiny bit

Good deal, I hope you find they *really* help out!

> but I actually have a few questions regarding those:
> 
> 0. You need to update the page
> http://minnow.cc.gatech.edu/squeak/2638
> so that it says MaReadStrategy. :)

Ok thanks, I fixed that.  Please feel free to update if you see typos.

> 1. Am I meant to create a MaReadStrategy (say "MaReadStrategy
> minimumDepth: 1"), configure it using
> #forVariableNamed:onAny:readToDepth: (several of those messages of
> course) and then put it in the session? If so - for how long is this
> strategy valid? For the life of the session? Or just for the next
> query?

For the life of the session, or until you replace it by putting another
one in the session.  Setting your session's #readStrategy: completely
replaces the previous one.

> 2. Let's say I use one strategy to read "just enough" of the Q2Case
> instances when making queries over the collection. Then when I need
> to view a *selected* case - I typically need to get "most of the rest"
> of the case - a new strategy. But... this creates two more questions:
> 
> 	2.1 If the case is already materialized - how can I control the
> reading of the rest - can I somehow make sure it reads "the rest" in a
> single materialization? Like, change the strategy in the session and
> then perhaps run "session rematerialize: myCase" and then it would
> fetch the stuff that is still missing using the new strategy?

You are talking about maintaining two ReadStrategies.  One conservative
for the "searching for a case", another liberal one for "opening a
case".  Your program will have to constantly swap them back and forth.

But that's a great question I never thought of!  Since the conservative
strategy has already materialized the actual case, when you go to open
and view the rest of it you'll only hit proxies on its *sub*-objects,
not on the case itself, so the liberal ReadStrategy that says to read
99999 levels deep on the Case will not be used!

The answer is, for the liberal "opening a case" read-strategy, whatever
kind of object contains "most of the rest of" the case, be sure to
include that in the read-strategy as well with a depth of 99999.  

Don't forget: once the case is rendered, put the conservative
read-strategy back (so future searches don't fault down too much).
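Putting the two strategies together, the swap could be sketched like this. Q2CaseDetails, the 'details' and 'entries' variables, #renderCase:, aCase and mySession are hypothetical names standing in for your own model and UI code, and the snippet is written as if it lived inside a method. The #ensure: block puts the conservative strategy back even if rendering fails.

```smalltalk
| conservative liberal |
"Conservative strategy for searching: read only a shallow level of each case."
conservative := MaReadStrategy minimumDepth: 1.

"Liberal strategy for opening a case. The search already materialized the
 Q2Case itself, so the deep read must also be declared on the class of the
 sub-object that will actually be faulted. All names here are hypothetical."
liberal := MaReadStrategy minimumDepth: 1.
liberal forVariableNamed: 'details' onAny: Q2Case readToDepth: 99999.
liberal forVariableNamed: 'entries' onAny: Q2CaseDetails readToDepth: 99999.

"Open the case with the liberal strategy, then always restore the conservative one."
mySession readStrategy: liberal.
[ self renderCase: aCase ]
	ensure: [ mySession readStrategy: conservative ]
```

Because #readStrategy: replaces the previous strategy wholesale, each strategy has to carry every entry it needs; nothing is merged between them.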

> 	2.2 If I am running locally (no network roundtrips for each
> materialization) do I still benefit from turning this into a single
> materialization or does it not matter? This I can of course test -
> *if* I knew how to do rematerialization using a different strategy.

Yes, absolutely.  It severely cuts down on the number of trips to the
server; even though it's local, it still matters a lot.

> In fact - it would be great if I could easily log materializations -
> I will try to hook into that method you mentioned earlier.

You may also want to try:

  mySession preferences signalProxyMaterializations: true

and wrap your code in a handler:

  [ "..do something, open a case.." ]
    on: MagmaProxyMaterialization
    do: 
      [ : noti |
      "MyMaterializedObjectBag is your own collection, e.g. a Bag"
      MyMaterializedObjectBag add: noti materializedObject.
      noti resume "don't forget this!!  :)" ]

Whew, cheers!  :)
 Chris