Magma on 3.7 report

Chris Muller chris at funkyobjects.org
Sat Mar 25 20:12:06 UTC 2006


Now that the performance regression is resolved, let me try responding properly.
 
 > In the attached tally, Magma spends 97.3% (!) of the time in
 > removeGarbageCollectedObjectEntries.
 
 Magma maintains a two-way map, object->oid (a.k.a. "oids") and oid->object (a.k.a. "objects").  It currently handles these with a WeakIdentityKeyDictionary and a MaWeakValueDictionary, respectively.  WeakKey dictionaries automatically remove their (key->value) entries when objects are collected/finalized, but WeakValue dictionaries' entries remain and simply reference nil (i.e., oid->nil).
 
 So, as the client moves on to explore other parts of a huge domain model, the WeakKey dictionary is fine, but the WeakValue dictionary never shrinks unless Magma cleans it up manually.  This is what removeGarbageCollectedObjectEntries does.  It simply rebuilds the entire MaWeakValueDictionary by enumerating all entries and keeping only the non-nil values.
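 To make that concrete, here is a small Python sketch (an analogy only, not Magma code) of a weak-value map whose dead entries linger until a manual sweep rebuilds it, analogous to what removeGarbageCollectedObjectEntries does:

```python
import gc
import weakref

class Obj:
    """Stand-in for a persistent domain object."""
    pass

# oid -> weakref; dead references linger until cleaned up manually,
# mimicking the described MaWeakValueDictionary behavior (oid->nil)
objects = {}

a, b = Obj(), Obj()
objects[1] = weakref.ref(a)
objects[2] = weakref.ref(b)

del b          # b becomes collectible; objects[2]() now returns None
gc.collect()

# The manual rebuild: re-create the map, keeping only entries whose
# referent is still alive, as the rebuild described above does.
objects = {oid: ref for oid, ref in objects.items() if ref() is not None}
```

 After the rebuild, only the entry for the still-referenced object remains.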
 
 The best strategy for when to rebuild is difficult to generalize.  A time-based trigger could cause unnecessary rebuilds or not rebuild soon enough, so I made it condition-based.  Whenever the 'objects' size is twice that of the 'oids' size, the next oid assignment will take a detour to rebuild the "objects" collection (oid->object), removing all the nil entries.
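 Sketching that size-ratio trigger in Python (a hypothetical model of the heuristic, not Magma's actual code):

```python
import gc
import weakref

class Obj:
    pass

def maybe_rebuild(objects, oids):
    # Rebuild the oid->object map only when it has grown to at least
    # twice the size of the live object->oid map (the trigger above).
    if len(objects) >= 2 * len(oids):
        objects = {oid: r for oid, r in objects.items() if r() is not None}
    return objects

live = [Obj() for _ in range(3)]
dead = [Obj() for _ in range(3)]
oids = {id(o): n for n, o in enumerate(live)}             # object->oid side
objects = {n: weakref.ref(o) for n, o in enumerate(live + dead)}
del dead                                                  # 3 entries go stale
gc.collect()

objects = maybe_rebuild(objects, oids)                    # 6 >= 2*3: rebuild
```

 The detour cost is paid once per doubling rather than on every access.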
 
 >> However, there is one central report touching almost the whole model,
 >> which I had to interrupt after a quarter of an hour. It was taking
 >> way too long., I guess Magma was bringing in objects and throwing
 >> them out over and over again. I might easily be wrong though.
 
 So, if this is true, bringing in a huge number of objects and then throwing them away would explain the poor performance: not only are these two dictionaries working overtime, but the rebuild would be required frequently, which seems to be what the MessageTally indicated.
 
 > It's basically a dictionary of (SSProject) objects each possessing a
 > dictionary of (SSVersion) objects, whose attributes are also needed
 > for that listing. This is around 6K objects times the attributes,
 > which are complex themselves ... it really is touching like 50K or
 > even 100K objects, and not only once but iterating several times.
 > Which is absolutely no problem if everything is in the image.
 
 Most all-in-memory programs are able to leverage the fast, direct loading and saving of Squeak objects in their native format via ImageSegments.  But Magma programs are concerned with multiple users touching 15 objects here, 20 objects there..  The server must work with the object model in its serialized state, in a fine-grained way.  While I tried to make faulting huge chunks of objects as fast as possible, due to the aforementioned dictionaries which must be populated, it is many times slower than ImageSegments for this type of operation.
 
 So, generally Magma programs are at their best when presenting "one screen at a time" worth of objects to the user and throwing away (i.e., not referencing) objects that are not needed for that screen.  The tools available to do this are:
 
   - ReadStrategies: allow you to specify SSVersion objects, along with all of their complex-object attributes needed for display, to be brought back in one single db call.  This can make a significant performance difference vs. hitting a proxy on every row you're trying to display.  ReadStrategies are easy to use.
 
   - MagmaCollections: allow you to bring back just one page of SSVersion objects instead of all 6000.  This is done completely transparently via #at:.  MagmaCollections support the indexing needed to sort or range-search by any column.
 
 These two together allow only exactly what is needed to display a page of SSVersions to be brought in as needed, resulting in huge performance gains.
 
 Equally important is deciding how much to keep in memory via hard references.  Hard-referencing the root, for example, will cause every part of the database explored to remain in memory, growing endlessly.  This may be tempting so that objects don't have to be "reread", but it also causes the dictionaries to get very big and slow.  Were it not for MaWeakValueDictionary replacing WeakValueDictionary, the "objects" dictionary would become unusable after just a couple hundred-thousand entries or so..
 
 Keeping a big in-memory footprint can also slow down the commit-rate, unless using WriteBarrier, because objects are compared against their buffers.  It is a "logical" compare, not an MD5 or serialized-buffer compare, because otherwise there would be too many false differences, particularly with hashed collections, which serialize differently even when nothing has changed.
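 The "false differences" problem can be illustrated in Python (an analogy only; Magma serializes Squeak objects, not pickles): two logically equal mappings built in different orders serialize to different byte buffers, so a byte-level or hash-of-bytes compare would report a phantom change:

```python
import pickle

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}     # same contents, different insertion order

logically_equal = (d1 == d2)                            # True: no real change
buffers_equal = (pickle.dumps(d1) == pickle.dumps(d2))  # False: order leaks in
```

 A logical compare sees no change; the serialized buffers differ anyway, which is why Magma, per the above, compares logically.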
 
 Management of the size of the in-memory domain model is also helped by #stubOut:.  Magma does not call this itself; it is a tool for the developer's discretion.  It involves a becomeForward:, so the best thing to do is, for example, stub the single Collection of 6000 objects, not each of the 6000.  I don't think multiple Seaside sessions referencing an object you stubOut: should be a problem, although it may be a good idea to guard it with a Mutex.
 
 ===
 
 Personally, I encourage using these tools and this lean approach..  
 
 Having said all this, though, your "straight port" to Magma from the all-in-memory approach may still be workable if you can endure the initial "load time" of the model into memory (90% of which is probably building those darn dictionaries).  Just be sure to KEEP it all in memory after it's gone through all the trouble..  :)  Commits will probably slow down, so if they get too unbearable you could always turn on WriteBarrier.
 
  - Chris
 
 


