Indexing

Fri Feb 24 19:06:20 UTC 2006

Hi Chris!

Chris Muller <chris at funkyobjects.org> wrote:
[SNIP]
> So this is another reason to stay with 1.0 for now.  I
> have merged the fixes into my own local 1.1 but am not
> planning to commit it to SqueakSource until I'm done
> with this iteration.

Ok, yes, I will be sticking to 1.0 until there is a compelling reason for
me to switch - and KryptOn is not, AFAICT, such a reason in this
particular project.

> > Anyway, I just loaded Magma1.0-cmm.4 and my app
> > still works. :)
> > 
> > (Cees and others are now using Monticello
> > Configurations, perhaps that is an option for you too - a
> > config is just a list of specific snapshots)
> 
> I have no preference either way other than I really
> don't want to have a SqueakSource server running right
> now just to use MC-Configs..  When they support
> File-based repositories I'll check them out again.

Oh, ok. Didn't know that.

> > Ok, so in my app I have MagmaCollections in three different
> > places and given the number of instances of my domain objects
> > at the moment I should have this number of MagmaCollections:
> > 
> > 	(Q2Model allInstances size * 2) + (Q2Process allInstances size) ==> 9
> > 
> > Note: Q2Model has two MagmaCollection instvars and Q2Process
> > one.
> > 
> > This gives me 9 right now. MagmaCollection allInstances size
> > gives me 378! And MagmaCollectionChanges allInstances size
> > gives 379.
> 
> The next time this happens, see how many instances of
> MagmaSession you have.  Remember, they all have their
> own copy of all the MagmaCollections and changes.

Right, I am aware of that.

> There have been intermittent issues with cleanup of
> old sessions over the years; it may be back..  It was
> always related to Block/Method contexts holding old
> Sessions in one of their (temp-var?) references..
> There is a utility method, MagmaSession
> class>>#cleanUp, which enumerates all instances of
> these contexts and does a fine job of getting rid of
> the old ones; print-it to see the before/after
> instance count.

Good advice! I have been battling to get rid of MagmaSessions quite a
bit, you see. It has seemed quite odd to me, but I will try that.

> > Hmmmmm, ok - now I cleaned out my Magma db directory (it had
> > tons of older MagmaCollection files - indexes that is - around
> > 370-ish). Now it looks much better.
> 
> Now this confuses me.  "Cleaning up" the directory
> files alone should have no effect on the number of
> instances in the image..  ??

No, I actually toasted the whole dir and recreated the db, indexes and
all.
The problem is probably related to the fact that my "fill the db with
stuff" code also creates the indexes (at the same time as I instantiate
the MagmaCollections), so running that code (reinitializing my domain
model) over and over creates more and more index files. And then, when
I close and reopen the db, Magma evidently gets a bit confused - that is
my guess.

> > I still have "twice too many" MagmaCollections in my image
> > though:
> > ...
> > That is the expected number given a single MagmaSession. The
> > second MagmaSession seems to be an extra internal session used
> > by Magma (right?) and perhaps that session is for some reason
> > also materializing the collections - which would explain the
> > double amount (18 instead of 9).
> 
> Exactly right.  Magma has a meta-model that is
> maintained via its own transaction mechanism.  The
> meta-model includes such things as the
> class-definitions, the magma-collections and their
> indexes, the code-base for the repository, etc.  See
> MagmaRepositoryDefinition.  It is the root of the
> "meta side".

Aha. Nice. And good to know. :)

> When a new class-definition or large-collection is
> added, the server refreshes its own "internal" session
> because it must know about them to do its work
> properly.
> 
> > > No, this does not sound correct.  All LargeCollections
> > > should be monitored as soon as they're persistent.  If
> > > they're not persistent, changing keys doesn't matter.
> > > But again, I'm talking codeless here..
> > 
> > So a MagmaSession always "knows" all MagmaCollections in a db,
> > regardless of whether they have been navigated and materialized
> > in the session yet?
> 
> Since all the MagmaCollections are part of the
> MagmaRepositoryDefinition (the meta root), and this
> definition is faulted down and materialized upon
> connect, the answer is yes, each connected
> MagmaSession always knows all MagmaCollections in a
> db.

Ok. Good. Now I have a much better "picture" of how this works. :)

> > Btw, in my app you can actually have the server "build" a
> > separate Magma db, then download it, unzip and reconnect to it
> > on the clients locally -
> 
> Wow, can you tell me more about this?  This is
> obviously part of the "working offline" function,
> right?

Indeed. The master server has code to create a separate Magma db, then
does an intricate veryDeepCopy of the model, excluding various parts
depending on the permissions of the user etc., and stores it in the new
db. The db is then zipped up and served out by KomHttpServer as a single
zip file. On the client I then use external calls to wget and unzip
(because I expect this db to possibly become quite large) to download
and unpack it.

The neat part is that all this is done behind a Seaside UI, so the user
simply logs on, chooses a "mirror" and presses "download" and voila - back
to the login screen, but now the client Seaside app has a partial mirror
of the master server db.

> This might be painful if you are planning to try to
> "merge" the offline work back into the "master" later.

Nope, not at all. :) All changes to the domain model are modelled using
the Command pattern - or as I like to call them "transactions" (not to
be confused with Magma transactions of course).

So all modifications to the model are funneled through the top object,
which in turn creates instances of Q2Txn (with concrete subclasses for
each type of change) and calls them to do their work, and then I store
them in a MagmaCollection.

So basically I should be able to nuke the model and rebuild it in full
by simply applying all those Q2Txn instances in sequence. Quite
Prevaylerish in style.
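A minimal sketch of the Command-pattern log and replay described above, written in Python rather than Smalltalk for brevity. The names (`Model`, `Txn`, `AddItemTxn`, `RenameTxn`, `rebuild`) are hypothetical stand-ins for the top object and the Q2Txn subclasses, and a plain list stands in for the MagmaCollection that stores the transactions; this is not the actual Q2 code.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical domain root; every change must go through apply()."""
    items: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stands in for the MagmaCollection of txns

    def apply(self, txn):
        txn.execute(self)     # let the command object do its work
        self.log.append(txn)  # then record it, so the history is complete


class Txn:
    """Base class for all model changes (plays the role of Q2Txn)."""
    def execute(self, model):
        raise NotImplementedError


@dataclass
class AddItemTxn(Txn):
    item_id: str
    name: str

    def execute(self, model):
        model.items[self.item_id] = self.name


@dataclass
class RenameTxn(Txn):
    item_id: str
    new_name: str

    def execute(self, model):
        model.items[self.item_id] = self.new_name


def rebuild(log):
    """Throw the model away and replay the whole log in order, Prevayler-style."""
    fresh = Model()
    for txn in log:
        fresh.apply(txn)
    return fresh
```

Because every change is funneled through `Model.apply`, the log is complete by construction, and replaying it into an empty model reproduces the current state.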

Now, this model really comes into play in the offline scenario: a client
first downloads all "unknown" Q2Txns and applies them (bringing the
local Magma db up to date), then uploads all its local Q2Txns to be
applied at the master server.
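The sync round can be sketched the same way, again in Python with hypothetical names (`Txn`, `Node`, `sync`). Each transaction carries a UUID so a node can tell which ones it already knows. The sketch assumes the client's offline transactions were already applied locally and are also kept in a separate `pending` list until uploaded. Conflict detection is deliberately left out, since in the real system each transaction kind carries its own conflict code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical stand-in for a Q2Txn: a UUID plus a change to apply."""
    key: str
    value: str
    txn_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def execute(self, state):
        state[self.key] = self.value


@dataclass
class Node:
    """A master or client database, reduced to its state plus an ordered txn log."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, txn):
        txn.execute(self.state)
        self.log.append(txn)

    def known_ids(self):
        return {t.txn_id for t in self.log}


def sync(client, master, pending):
    """One sync round: pull the txns the client lacks, then push local ones.

    `pending` holds txns already applied on the client but not yet seen by the master.
    """
    # 1. Download and apply everything the client does not know yet.
    known = client.known_ids()
    for txn in master.log:
        if txn.txn_id not in known:
            client.apply(txn)
    # 2. Upload the client's offline txns so the master applies them too.
    for txn in pending:
        master.apply(txn)
    pending.clear()
```

The first call with an empty `pending` list corresponds to the initial download; later calls exchange only the small deltas.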

I have all this working today - the Q2Txn instances are first
"disconnected" (using UUIDs instead of object refs) from the domain
objects, serialized using ReferenceStream and gzipped, then sent over as
a ByteArray using SOAP (which does base64 encoding I think) and then
rematerialized on the other side, reconnected in the new model and
"applied". Works like a charm.
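The disconnect/serialize/transport steps might look like this in Python, with swapped-in tools: `pickle` plays the role of ReferenceStream, and an explicit base64 step stands in for the encoding that SOAP applies to a ByteArray. `Customer`, `RenameTxn`, `encode` and `decode` are hypothetical names, and `pickle` should only ever be fed data from a trusted peer.

```python
import base64
import gzip
import pickle
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    """Hypothetical domain object; each instance carries a stable UUID."""
    name: str
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class RenameTxn:
    """Hypothetical transaction holding a direct reference to its target."""
    target: Customer
    new_name: str

    def apply(self):
        self.target.name = self.new_name

    def disconnect(self):
        # Swap the object reference for its UUID so no domain graph gets serialized.
        return {"kind": "rename", "target": self.target.uid, "new_name": self.new_name}

    @staticmethod
    def reconnect(data, registry):
        # Resolve the UUID against the receiving side's own model.
        return RenameTxn(registry[data["target"]], data["new_name"])


def encode(txns):
    """Disconnect, serialize, gzip and base64-encode a batch of transactions."""
    payload = pickle.dumps([t.disconnect() for t in txns])
    return base64.b64encode(gzip.compress(payload)).decode("ascii")


def decode(text, registry):
    """Reverse of encode(): decode, decompress, deserialize, then reconnect.

    Only one kind of transaction exists in this sketch; a real system would
    dispatch on record["kind"] to pick the right transaction class.
    """
    records = pickle.loads(gzip.decompress(base64.b64decode(text)))
    return [RenameTxn.reconnect(r, registry) for r in records]
```

Disconnecting before serializing keeps the payload down to the delta itself; the receiver only needs a UUID-to-object registry for its own copy of the model.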

And since I then have real objects for all operations, I can attach
specific conflict code to each kind of transaction object. So it is a
little bit of manual work - but it pays off. And in other ways too - like
having complete logging and traceability of all changes - by
definition.

> I have planned, for 1.2, an efficient server-to-server
> protocol that will allow large chunks of domains to be
> transported between repositories without having to go
> through the client; and, further, to be able to "sync"
> up with the original repository.  I hope to have this
> done by summer.

Ok, sounds like very useful tech for us - but we can't wait for it. :)
But it might come in handy later on.

Our scenario is first a full download of a partial db done on the LAN
and then regular synchs (sending those Q2Txns back and forth) with quite
small data. And since the Q2Txns are only deltas, they are very small.

[SNIP] 
> So I gather you discovered you just need to connect
> with a new MagmaSession instance instead of trying to
> reuse the old one.

Indeed. No problem.

> > Hmmm, let me see now... above you are saying (I guess) that
> > only monitored MagmaCollections will be reindexed. And AFAICT
> > from the code the monitored collections are the ones we have
> > materialized in the session. But above you wrote "All
> > LargeCollections should be monitored as soon as they're
> > persistent." which seems contradictory.
> 
> This question is hopefully answered now (above).  All
> MagmaCollections in the db are monitored as soon as
> you connect because they're part of the meta
> RepositoryDef.  All newly created ones since the
> connect are monitored as soon as they become
> persistent via your commit.  Non-persistent
> collections with indices do not suffer from key-change
> side-effects.
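To see why a persistent index has to be monitored, here is a generic Python illustration (not Magma's implementation) of an index over an attribute that can change after the object was added. `Indexed` and `KeyIndex` are hypothetical; the monitored index is notified on every key change and re-files the object, while the unmonitored one silently goes stale.

```python
from collections import defaultdict


class Indexed:
    """Hypothetical object whose `key` attribute is indexed.

    Assigning to `key` notifies every attached index, mimicking a monitored collection.
    """
    def __init__(self, key):
        self._key = key
        self.watchers = []

    @property
    def key(self):
        return self._key

    @key.setter
    def key(self, new):
        old, self._key = self._key, new
        for index in self.watchers:
            index.rekey(self, old, new)


class KeyIndex:
    """Maps key -> set of objects; stays correct only if told about key changes."""
    def __init__(self, monitored=True):
        self.by_key = defaultdict(set)
        self.monitored = monitored

    def add(self, obj):
        self.by_key[obj.key].add(obj)
        if self.monitored:
            obj.watchers.append(self)

    def rekey(self, obj, old, new):
        # Move the object from its old bucket to its new one.
        self.by_key[old].discard(obj)
        self.by_key[new].add(obj)

    def lookup(self, key):
        return set(self.by_key.get(key, ()))
```

After a key change, a lookup through the unmonitored index still finds the object under its old key and misses it under the new one.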

Ok. Got it.

> > PS. Very happy with Magma so far. :) And yes, the second demo
> > the other day went fine and we have a GO for the project! And
> > most likely we will open source it too.
> 
> Fantastic!  Someday I hope my Java-Oracle cohorts will
> at least *listen* to an alternative for five minutes
> without smirks and ridicule (about which they know
> NOTHING).  In the meantime, we spend hours and
> hundreds of e-mails every day toiling over
> column-lengths, types, slow-BLOBs and CLOBs,
> constraint order, naming-abbreviation "standards", DBA
> fights, etc. etc.  Blecch!

Hehe, yes indeed. A sidenote:

I ran a 2-hour workshop yesterday with 8 other employees at Toolkit
(where I work).
It was a "Shock and Awe"-workshop throwing them right into a stripped
version of my customer app - focusing mainly on Seaside but with Magma
inside too of course.

One of the fun parts is that with the Seaside/Magma integration and my
bits and pieces already in place, they never ever saw a single line of
code related to the db.

One pair of developers added instvars in the domain model, created
objects per user object in the model, yaddayadda - and it "just worked".
Even though they actually know a bit about OODBs, I still think they were
a bit mesmerized. I mean - hey, they didn't write a single line of code
for it - not even a "commit".

>  - Chris

regards, Göran