Hi Stephen, let me see if I can answer your questions:
I notice that you are assigning oids to every object that is stored in
Magma and using a WeakKeyIdentityDictionary for storing the oids. One issue I had with that approach is that I think it runs into scalability issues when you approach and surpass 4096 objects (due to the number of bits available for the identity hash). Is there a way to make this scheme more scalable? Or, is it possible that it will be rare to have more than 4000 persistent objects cached on the client?
A: I don't think it would be "rare" to have more than 4K objects in a medium-sized program. However, it was written with the expectation that identityHash will be fast. If there is a scalability problem w/ weakIdentityDictionaries greater than 4K in size, then there may be a scalability issue w/ Magma.
======
How are you tracking the changed objects? A: Take a look at MaTransaction>>markRead:. MaTransaction maintains an IdentityDictionary whose keys are the read objects and whose values are shallowCopy "backups". When you commit, it zips through the keys and values and does an identity compare of each object's variables (see implementors of maIsChangedFrom:). This may seem like a lot of work, but it's actually surprisingly fast. Additionally, I prefer this "copy and compare" method of change detection because it offers the most transparency.
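A minimal sketch of that copy-and-compare idea (apart from #maIsChangedFrom:, the names here are hypothetical, not Magma's actual code):

----
markRead: anObject
	"Remember a shallow copy so changes can be detected at commit time."
	backups at: anObject put: anObject shallowCopy

changedObjects
	"Answer the read objects whose variables now differ from their backups."
	^backups keys select: [:each | each maIsChangedFrom: (backups at: each)]
----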
========
It takes a long time to establish a MagmaSession (especially after some objects have been populated in the server)...can you describe what's happening when connecting? A: Hmm... I've never noticed that taking a long time. If you look at MagmaSession>>connectAs:maximumNumberOfChallengers:, it summarizes what happens upon connection. Basically, we have to get the classIds and definitions of all the classes that are known to the repository. The current class definition defined in Magma has to match what is in your image or it won't let you connect.
I haven't posted the Swiki page yet about how to tell Magma to "upgrade" a class to a new version, but you can see examples in MagmaTestCase.
==========
I see that you are using a stubbing mechanism (à la GemStone) that uses a ProtoObject and #doesNotUnderstand: to transparently forward messages to the real object. Are you also using #become: to change these objects into their real counterpart? If so, won't this present a performance issue under certain circumstances (where one or both of the objects are in old space)? Also, did you implement a "stubbing level" mechanism à la GemStone? A: I use becomeForward:. I've not noticed any performance issues in my "small" volume tests. If this causes a performance problem due to them being in old space, do you have an alternative?
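For illustration, a stub's #doesNotUnderstand: might look roughly like this (the names session, oid, and #objectWithOid: are assumptions, not Magma's actual code):

----
doesNotUnderstand: aMessage
	"Fetch the real object, forward every reference to this stub onto it,
	then resend the original message."
	| realObject |
	realObject := session objectWithOid: oid.
	self becomeForward: realObject.
	^realObject perform: aMessage selector withArguments: aMessage arguments
----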
========
Is there any kind of cache control in Magma? For example, if I have a client that is running for many weeks and accessing lots of objects, once they are pulled from the server to the client, are they going to stay in the client indefinitely? Is there some way of controlling how many objects are retained in the client's object memory?
A: The only "caching" Magma does is in weak collections, so there shouldn't be anything that you don't cache yourself.
Regards, Chris
On Fri, 2 Aug 2002, Chris Muller wrote:
Is there any kind of cache control in Magma? For example, if I have a client that is running for many weeks and accessing lots of objects, once they are pulled from the server to the client, are they going to stay in the client indefinitely? Is there some way of controlling how many objects are retained in the client's object memory?
A: The only "caching" Magma does is in weak collections, so there shouldn't be anything that you don't cache yourself.
Right, but once a stub gets replaced with a real object, the real object will never get replaced back with a stub, correct? Which means that, since the session has a reference to the root, and the root has a reference to every other object in the database, once an object is pulled into the client it will never be garbage collected unless it is removed from the database. This can be a problem for long running clients. I think what Stephen is suggesting is that, for example, you periodically becomeForward: objects back into stubs that have not been used for a certain length of time.
Chris,
Chris Muller wrote:
Hi Stephen, let me see if I can answer your questions:
I notice that you are assigning oids to every object that
is stored in Magma and using a WeakKeyIdentityDictionary for storing the oids. One issue I had with that approach is that I think it runs into scalability issues when you approach and surpass 4096 objects (due to the number of bits available for the identity hash). Is there a way to make this scheme more scalable? Or, is it possible that it will be rare to have more than 4000 persistent objects cached on the client?
A: I don't think it would be "rare" to have more than 4K objects in a medium-sized program. However, it was written with the expectation that identityHash will be fast. If there is a scalability problem w/ weakIdentityDictionaries greater than 4K in size, then there may be a scalability issue w/ Magma.
The identity hash in Squeak is only 12 bits. There has been a lot of discussion in the past regarding how to improve this...you might be able to get better scalability by spreading out the hash (and perhaps subclass WeakKeyIdentityDictionary)...you'd still get the same amount of collisions, but presumably, you wouldn't have to scan nearly as far to get a match or an empty slot.
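To make that concrete, such a subclass of WeakKeyIdentityDictionary could compute its initial probe index something like this (a sketch only; #spreadIndexFor: and the array instvar are assumptions about the dictionary's internals):

----
spreadIndexFor: anObject
	"Map the 12-bit identityHash onto the full table size so entries with
	adjacent hashes land far apart, shortening linear-probe chains."
	^anObject identityHash * (array size // 4096 max: 1) \\ array size + 1
----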
======
How are you tracking the changed objects? A: Take a look at MaTransaction>>markRead:. MaTransaction maintains an IdentityDictionary whose keys are the read objects and whose values are shallowCopy "backups". When you commit, it zips through the keys and values and does an identity compare of each object's variables (see implementors of maIsChangedFrom:). This may seem like a lot of work, but it's actually surprisingly fast. Additionally, I prefer this "copy and compare" method of change detection because it offers the most transparency.
After I wrote the email, I discovered this...very clever! Since you only need to do the scan when you commit, it's not a horrible price to pay. I was baffled by how Magma was able to detect changes in brand new instances I was creating...I even tried to fool it, but it worked like a charm. It looks like new objects get stored (during commit) when they are referenced from other objects that are already in the db...when they are stored, you then record the shallowCopy backup...that way the object is part of the scan to detect changes. Is that correct?
========
It takes a long time to establish a MagmaSession (especially after some objects have been populated in the server)...can you describe what's happening when connecting? A: Hmm... I've never noticed that taking a long time. If you look at MagmaSession>>connectAs:maximumNumberOfChallengers:, it summarizes what happens upon connection. Basically, we have to get the classIds and definitions of all the classes that are known to the repository. The current class definition defined in Magma has to match what is in your image or it won't let you connect.
I haven't posted the Swiki page yet about how to tell Magma to "upgrade" a class to a new version, but you can see examples in MagmaTestCase.
==========
I see that you are using a stubbing mechanism (à la GemStone) that uses a ProtoObject and #doesNotUnderstand: to transparently forward messages to the real object. Are you also using #become: to change these objects into their real counterpart? If so, won't this present a performance issue under certain circumstances (where one or both of the objects are in old space)? Also, did you implement a "stubbing level" mechanism à la GemStone? A: I use becomeForward:. I've not noticed any performance issues in my "small" volume tests. If this causes a performance problem due to them being in old space, do you have an alternative?
Unfortunately no...I've always been leery of solutions that require a #become: (or #becomeForward:) due to the potential performance issue. Squeak has a direct mapped memory model, which means that if you swap the identities of two objects, you must scan all of memory (unless you're dealing with two young objects). You can avoid some of the performance issues if you can bunch up a lot of objects (this is why Squeak's #become: is actually based on Array>>elementsExchangeIdentityWith:). I don't see a way of bunching up a lot of stubs though.
Perhaps I've been a little too quick to dismiss #become: though...if you examine normal usage patterns, it's probably the case that the vast majority of your #become: calls are happening with two young objects...which is fast.
========
Is there any kind of cache control in Magma? For example, if I have a client that is running for many weeks and accessing lots of objects, once they are pulled from the server to the client, are they going to stay in the client indefinitely? Is there some way of controlling how many objects are retained in the client's object memory?
A: The only "caching" Magma does is in weak collections, so there shouldn't be anything that you don't cache yourself.
Avi Bryant wrote: "Right, but once a stub gets replaced with a real object, the real object will never get replaced back with a stub, correct? Which means that, since the session has a reference to the root, and the root has a reference to every other object in the database, once an object is pulled into the client it will never be garbage collected unless it is removed from the database. This can be a problem for long running clients. I think what Stephen is suggesting is that, for example, you periodically becomeForward: objects back into stubs that have not been used for a certain length of time."
Avi summarizes it correctly...the issue I see is that you can eventually get the whole database in memory (if your application runs long enough and works with enough of your dataset). One solution might be to simply restart the image when it grows beyond a certain threshold (I do that right now with swiki.net). Here's a caching design I'm working on:
The algorithm is an approximate LRU and works by counting the number of dereferences (when an object is actually retrieved from the db) and object stores (when an object is saved). Each cached object (actually an object cluster in my case) has an age. After a threshold is reached (say 1000 derefs), I scan through the cached objects (actually clusters), incrementing the age of relatively young references and flushing the objects whose references have reached a maximum age (say 5). When an object is dereferenced, the age gets reset to 0. So, the age actually tells us how many aging scans have occurred since the last time the cluster was dereferenced. Thus, older objects have not been accessed in a while. (Note: I chose not to implement transparent proxies...you access your cached objects by holding onto a PdbObjectReference, which holds the root of a cached cluster of objects...thus, you must always access your cached objects by sending the #deref message to a PdbObjectReference.) Also, I have to check all of the process stacks to make sure that an object is not currently being accessed before flushing it (probably a rare occurrence anyway).
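That aging scan might be sketched like this (names, thresholds, and the target/age instvars are all hypothetical):

----
agingScan
	"Run after every 1000 dereferences: flush clusters that have gone
	maxAge scans without a deref, and age the rest."
	cachedClusters copy do: [:each |
		each age >= maxAge
			ifTrue: [self flush: each]
			ifFalse: [each incrementAge]]

PdbObjectReference>>deref
	"Answer the target cluster, resetting its age to mark it recently used."
	age := 0.
	^target ifNil: [target := self readFromDisk]
----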
I'm thinking that this type of caching might work well in practice. It doesn't set any hard limits on the number of cached objects...so, if you have a very large and active working set, they should all stay young and stay in memory. A possible improvement might be to add a hard limit on the number of cached objects and have a deref trigger a scan (in addition to the deref counting) if that deref would cause the limit to be exceeded.
- Stephen
On Fri, 2 Aug 2002, Stephen Pair wrote:
Since you only need to do the scan when you commit, it's not a horrible price to pay. I was baffled by how Magma was able to detect changes in brand new instances I was creating...I even tried to fool it, but it worked like a charm.
This seems like it would also be a problem for long running sessions - as you bring in more and more objects, the scan will take longer and longer.
Unfortunately no...I've always been leery of solutions that require a #become: (or #becomeForward:) due to the potential performance issue. Squeak has a direct mapped memory model, which means that if you swap the identities of two objects, you must scan all of memory (unless you're dealing with two young objects). You can avoid some of the performance issues if you can bunch up a lot of objects (this is why Squeak's #become: is actually based on Array>>elementsExchangeIdentityWith:). I don't see a way of bunching up a lot of stubs though.
Depending on the protocol, you often bring in clusters of objects at once from the server (I think Magma calls this the "read strategy", and you can modify it). So these can all be becomeForwarded at once.
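Squeak already has a bulk primitive for exactly this, so a whole cluster's worth of stubs can be forwarded in a single memory scan (the two variable names are illustrative):

----
	"Replace a batch of stubs with their materialized objects in one pass."
	stubArray elementsForwardIdentityTo: realObjectArray
----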
The easiest way around #becomeForward: is to simply keep the proxy around and have it forward all its messages to the real object. When I was playing with a GOODS client I actually had #becomeForward: in the other direction, in that whenever you did a commit it would replace all of the new objects with proxies that pointed to them. The proxy was a sort of "object manager" that tracked which objects were touched by the transaction, had methods for lock management, and so on.
Actually, with the recent interest in object databases I'm thinking about reviving that GOODS client... if anyone would be really interested in it let me know.
Avi
Avi wrote:
Depending on the protocol, you often bring in clusters of objects at once from the server (I think Magma calls this the "read strategy", and you can modify it). So these can all be becomeForwarded at once.
Yes.
The easiest way around #becomeForward: is to simply keep the proxy around and have it forward all its messages to the real object. When I was playing with a GOODS client I actually had #becomeForward: in the other direction, in that whenever you did a commit it would replace all of the new objects with proxies that pointed to them. The proxy was a sort of "object manager" that tracked which objects were touched by the transaction, had methods for lock management, and so on.
In the OODB that I'm working on, I'm trying to avoid any use of #doesNotUnderstand: and identity trickery. In the past I've tried to make things as transparent as possible, but I'm starting to believe that strategy is not as useful as it might seem. While I haven't come to any firm conclusions on the issue, my reasoning is this: in my experience, even when things are very transparent, you really still need to understand how things are working below the covers in order to be effective, and trying to make things automatic actually works against that understanding. Also, doing a lot of #dnu and identity tricks makes things difficult to debug.
So...given my current thinking on this issue, I'm taking an approach that not only keeps the proxies around, but the proxies don't even forward messages with #dnu. Actually, I guess that means they're not really proxies.
The way it works is the OODB only creates PIDs (persistent ids) for clusters of objects. These clusters are accessed through instances of PdbObjectReference (which contains the PID and some cache-related info). You send the message #deref to access a cluster behind a PdbObjectReference, and you only hold the PdbObjectReference in your domain...so, for example, if you have a Person that holds an Address, you have a decision to make: do you store the Address with the Person in a single cluster, or is the Address going to be the root of its own cluster? If you choose to make Address a part of the Person cluster, you design your instvars and accessing methods as normal...if, however, you want it to be an independent cluster, you would write your address accessing methods something like:
----
Person>>address
	^addressRef deref

Person>>addressRef: anAddressRef
	addressRef := anAddressRef

Person>>addressRef
	^addressRef
----
You might also choose to have an address setter that will assign a reference:
----
Person>>address: anAddress
	addressRef := anAddress ref
----
...but this only works if you have a back pointer (or dictionary) where you can lookup or assign a PdbObjectReference. It also depends on having some contextual information about which PdbPersistentMemory is currently in use.
Then, when you save a Person, the cluster will be clamped at the addressRef (the serialization stops when instances of PdbObjectReference are encountered)...if a related PdbObjectReference has never been stored, then the persistence will cascade, making sure that all referenced objects get stored.
I only do a few transparent-like things in this scheme...for example, PdbObjectReference overrides #inspect and #explore such that it actually inspects or explores its target. Also, #printOn: writes '-->>' followed by the printOn: of the target (or '(object on disk)' if it's not cached)...this gives you a visual clue in the inspectors that you're holding a PdbObjectReference, but when you double-click to inspect an instvar holding a reference, you actually inspect the target of the reference (you can use #basicInspect if you need to actually get the PdbObjectReference). This seems to be a nice balance because it makes it easy to dive around a domain, while not making things too transparent.
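Based on that description, the #printOn: override might look something like this (a sketch assuming a target instvar that is nil when the cluster is flushed, not the actual code):

----
PdbObjectReference>>printOn: aStream
	"Flag this slot as a reference, then show the cached target if any."
	aStream nextPutAll: '-->>'.
	target isNil
		ifTrue: [aStream nextPutAll: '(object on disk)']
		ifFalse: [target printOn: aStream]
----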
Actually, with the recent interest in object databases I'm thinking about reviving that GOODS client... if anyone would be really interested in it let me know.
The more OODBs the merrier.
- Stephen