Hi all,
I'm trying to design a solution for the following problem:
"Minimize the impact that a future change of the ODB schema has on the software you write."
So far I've identified the following specific features as necessary (perhaps you can review them or add more):
Requirement: to manage in one place a large number of persisted objects that can be accessed efficiently by different "attributes" (with wildcard support), in a simultaneous, sustainable, transparent and easily maintainable way. All of this together.
1. Large number of persistent objects. At first sight any collection could hold them, but since they are to be persistent, they need to live on an appropriate support. A BTree or TSTree satisfies this requirement.
2. Accessed efficiently. Those trees access their contents very efficiently, especially TSTree for wildcard text searches.
3. Different attributes. For each attribute by which the persistent objects should be searchable, one of those trees is needed. Let me call this indexing. These trees will have as keys the values (probably restricted to integers and strings) of the "attribute" of the stored objects, and as values the objects themselves. One tree for each indexed "attribute". Something has to hold this collection of trees and know how to maintain them. I called it IndexedCollection.
4. Wildcard search support. The TSTree supports wildcard string searches. Perhaps BTree will too, in the near future. In OmniBase, any ODBTree also supports this.
5. In a simultaneous way. This IndexedCollection should have a friendly protocol for adding an index, removing an index, and querying on any index at any time.
6. In a sustainable way. A change to an element of the collection should be reflected in the respective index if it matters (the update problem). In other words: the index should always have its keys synchronized with the objects' attribute values.
7. In a transparent way. The elements should not perceive that they are persistent. I mean, they should not need any method or code to update the indexes, or anything else related to the persistent schema.
8. Easily maintainable. One can add or remove indexes as needed.
9. All together. To have all of this together in one place: a special collection that supports all these features at once, which here I called IndexedCollection.
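The points above can be pictured with a small sketch. It is written in Python rather than Squeak only so that it is short and runnable, and it is not code from this thread: `Index`, `IndexedCollection`, `add_index`, `remove_index` and `matches_for_prefix` are hypothetical names, and a sorted list stands in for the BTree/TSTree. It covers points 3, 4, 5 and 8 only and does nothing about the update problem of point 6.

```python
from bisect import bisect_left, insort


class Index:
    """Sorted index over one attribute; a plain-list stand-in for a BTree or TSTree."""

    def __init__(self, attribute):
        self.attribute = attribute
        self._entries = []  # sorted (key, serial, element) triples
        self._serial = 0

    def add(self, element):
        key = getattr(element, self.attribute)
        # The serial number breaks ties, so elements are never compared.
        insort(self._entries, (key, self._serial, element))
        self._serial += 1

    def matches_for_prefix(self, prefix):
        """Elements whose string key starts with prefix, in key order."""
        found = []
        # (prefix,) sorts before every triple whose key is >= prefix.
        start = bisect_left(self._entries, (prefix,))
        for key, _, element in self._entries[start:]:
            if not key.startswith(prefix):
                break
            found.append(element)
        return found


class IndexedCollection:
    """Holds the elements plus one Index per indexed attribute."""

    def __init__(self):
        self._elements = []
        self._indexes = {}

    def add_index(self, attribute):
        index = Index(attribute)
        for element in self._elements:  # index what is already stored
            index.add(element)
        self._indexes[attribute] = index

    def remove_index(self, attribute):
        del self._indexes[attribute]

    def add(self, element):
        self._elements.append(element)
        for index in self._indexes.values():
            index.add(element)

    def matches_for_prefix(self, attribute, prefix):
        return self._indexes[attribute].matches_for_prefix(prefix)
```

Indexes can be added before or after the elements, which is the "at any time" of point 5. An element whose attribute changes after `add` stays filed under its old key here.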
If you know Magma: I'm talking about several of MagmaCollection's features, plus some others, like independence from the persistent schema itself. It's a kind of generalization that should be useful for developing software that needs persistence, minimizing the impact of a change (for ANY reason) of the ODB schema/vendor.
The main problem right now is the update problem (point 6). When anyone changes the value of an indexed attribute of a persistent object, there should be some kind of triggering mechanism, so that the IndexedCollection can tell the corresponding index to update its key and value for the mutating element.
Hypothesis: what about wrapping every element you add to this collection in some kind of proxy or wrapper that monitors (via DNU) every message sent to the monitored element and, when appropriate, signals the change of the target attributes, so the collection can update its indexes?
In that case, we have to solve the additional problem of the change of domain: from the ODB domain to the VM domain, and vice versa. That change loses the identity of the objects, including the elements, the wrappers, the hooked events, and the collection itself. So here is the problem. I've been thinking about it for several days without enlightenment yet. Any idea how to solve this? Any better idea/strategy?
best regards,
Sebastián Sastre ssastre@seaswork.com.ar www.seaswork.com.ar
--- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.767 / Virus Database: 514 - Release Date: 21/09/2004
On Sep 25, 2004, at 3:45 PM, Sebastian Sastre wrote:
> The main problem right now is the update problem (point 6). When anyone changes the value of an indexed attribute of a persistent object, there should be some kind of triggering mechanism, so that the IndexedCollection can tell the corresponding index to update its key and value for the mutating element. Hypothesis: what about wrapping every element you add to this collection in some kind of proxy or wrapper that monitors (via DNU) every message sent to the monitored element and, when appropriate, signals the change of the target attributes, so the collection can update its indexes?
There are a couple of problems with this - first, it's impossible to transparently ensure that a bare reference to the monitored object won't "escape", so that messages can be sent directly to it. Second, how can you tell when a message mutates the object and when it doesn't? You could keep a snapshot of the object and compare every time it got sent a message, but that's expensive. If you're going to do that at all, you only want to do it for methods that might mutate the object.
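Both problems can be seen in a few lines. The sketch below is Python, not Squeak, and is only an illustration under assumptions: `MonitoringProxy`, `Person` and `rename` are hypothetical names, and Python's `__getattr__` hook plays the role of a DNU handler.

```python
class MonitoringProxy:
    """Forwards messages to a target and reports each forwarded call."""

    def __init__(self, target, listener):
        self._target = target
        self._listener = listener

    def __getattr__(self, name):
        # Runs only for names the proxy itself lacks, much like a DNU handler.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute

        def forward(*args, **kwargs):
            result = attribute(*args, **kwargs)
            # Reported whether or not the call changed anything.
            self._listener(name, args)
            return result

        return forward


class Person:
    def __init__(self, name):
        self.name = name

    def rename(self, name):
        self.name = name
        return self  # hands the bare object back to the caller


calls = []
person = Person("Ana")
proxy = MonitoringProxy(person, lambda selector, args: calls.append(selector))

escaped = proxy.rename("Bea")  # monitored, but the result is the bare Person
escaped.rename("Cleo")         # bypasses the proxy, so nothing is reported
```

After the last line `calls` holds a single entry although the name changed twice, and the proxy had no way to know whether the first send mutated anything.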
Better, I think, would be to use a MethodWrappers (or ObjectsAsMethods, in Squeak) approach where you're passing around pointers to the actual object, but you modify its class (or a special subclass of its class) to have wrapped CompiledMethods in its methodDict. You only need to wrap methods that set inst vars, which are easy to find by scanning the bytecode. You only pay the cost on sends to those specific methods. The wrappers could do some kind of before/after comparison to see if the data actually changed, or you could just decide to always broadcast a change when a method is called that might have made one.
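A rough picture of that idea, in Python rather than Squeak and not taken from the MethodWrappers package: the method stored in the class is replaced by a wrapper, so every reference to the real object goes through it. `install_wrapper`, `uninstall_wrapper` and the `on_change` callback are hypothetical names, and the before/after comparison is only one of the two options mentioned above.

```python
import functools


def install_wrapper(cls, selector, attribute, on_change):
    """Wrap cls.<selector> in place; return the original so it can be restored."""
    original = vars(cls)[selector]

    @functools.wraps(original)
    def wrapper(receiver, *args, **kwargs):
        before = getattr(receiver, attribute, None)
        result = original(receiver, *args, **kwargs)
        after = getattr(receiver, attribute, None)
        if before != after:  # before/after comparison: skip no-op sends
            on_change(receiver, before, after)
        return result

    setattr(cls, selector, wrapper)
    return original


def uninstall_wrapper(cls, selector, original):
    """Put the original method back into the class."""
    setattr(cls, selector, original)
```

Only sends of the wrapped selector pay the extra cost; other methods of the class are untouched. Note that the wrapper affects every instance of the class, not just the elements of one collection.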
Don't we have a class-change prim now? It used to require a #become:, which would be very expensive. But if we can just swizzle the class pointer this could be a pretty decent approach.
I should also say that Stef has a good paper about this kind of stuff that I hope he'll post a link to.
Avi
> Don't we have a class-change prim now? It used to require a #become:, which would be very expensive. But if we can just swizzle the class pointer this could be a pretty decent approach. I should also say that Stef has a good paper about this kind of stuff that I hope he'll post a link to.
:)
http://www.iam.unibe.ch/~scg/Archive/Papers/Duca99aMsgPassingControl.pdf
But take care: what is described in the paper works for VW, and some of the tricks do not work in Squeak. In particular, changeClass: is really, really dangerous in Squeak (even with classes having the same format we can crash an image in no time; I did not try a lot, but I could not get any code working with changeClass: involved).
But the paper is worth reading.
Stef
Avi
> There are a couple of problems with this - first, it's impossible to transparently ensure that a bare reference to the monitored object won't "escape", so that messages can be sent directly to it. Second, how can you tell when a message mutates the object and when it doesn't?
Yes, I know, so I'm considering solving those problems with a less general approach: imposing a restriction. Accepting this restriction may be the price of keeping the update cheap enough.
The restriction would apply to the proxies/wrappers:
"They should trigger a message only when the element receives a message with one or more arguments."
This way, when you build the IndexedCollection, you can configure it so that only the *configured* messages cause an update. I'm aware that this is more *manual* and less *intelligent*, but it is useful and flexible enough.
For example:
    idxCol addIndex: (Index new
        attribute: #name;
        updateWhenElementReceives: #name:;
        updateWhenElementReceives: #firstName:;
        yourself).  "yourself makes the cascade answer the Index itself"
Additionally, in real use, you can restrict the IndexedCollection to indexes whose keys are Strings or Integers. I can't see what else would be interesting to hold in the keys. These numbers or texts can be held in an element's instVar or deeper in its composition. In the latter case you put the accessors in the element, configure the collection with #updateWhenElementReceives: #blah: and you're done.
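To make the configured-selector idea concrete, here is a minimal Python sketch (not Squeak, and not the actual IndexedCollection code). `ConfiguredIndex`, `update_when_element_receives`, `element_received` and `at` are hypothetical names; the sketch assumes that whatever intercepts the message (proxy or wrapper) calls `element_received` after the element has handled it.

```python
class ConfiguredIndex:
    """Index on one attribute that re-files an element only for configured selectors."""

    def __init__(self, attribute):
        self.attribute = attribute
        self.update_selectors = set()
        self._by_key = {}  # key -> list of elements
        self._key_of = {}  # id(element) -> key the element is filed under

    def update_when_element_receives(self, selector):
        self.update_selectors.add(selector)
        return self  # allows chained configuration

    def add(self, element):
        key = getattr(element, self.attribute)
        self._by_key.setdefault(key, []).append(element)
        self._key_of[id(element)] = key

    def element_received(self, element, selector):
        """Called by the interception layer after the element handled selector."""
        if selector not in self.update_selectors:
            return  # not configured: the index is left alone
        old_key = self._key_of[id(element)]
        new_key = getattr(element, self.attribute)
        if new_key == old_key:
            return
        bucket = self._by_key[old_key]
        bucket.remove(element)
        if not bucket:
            del self._by_key[old_key]
        self._by_key.setdefault(new_key, []).append(element)
        self._key_of[id(element)] = new_key

    def at(self, key):
        return list(self._by_key.get(key, []))
```

The cost of the restriction shows up directly: a change made through a selector that was not configured leaves the element filed under a stale key.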
What do you think?
Anyway, with this approach there is still a problem: the domain change kills the hooked events that catch the proxies' triggers (due to the loss of identity). Any idea on how to solve this?
Regards,
Sebastián Sastre ssastre@seaswork.com.ar www.seaswork.com.ar
Avi,
after a careful reading of Stef's paper on message passing control, I have to agree with you that the MethodWrappers technique is the best approach for this case.
By the way, Stef... what nice work you have done!
So... I'm a little lost right now, but I'll study this MethodWrappers framework more deeply and experiment to see how far it is already applicable.
In your previous email you said that the system can find out which methods modify a given instVar by scanning the bytecode. As I have never done anything like that, could you explain a little more about it?
thank you,
regards,
Sebastián Sastre ssastre@seaswork.com.ar www.seaswork.com.ar
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Avi Bryant Sent: Saturday, September 25, 2004 14:30 To: The general-purpose Squeak developers list Subject: Re: Relative independence of odbs: IndexedCollection,the update problem
On Sep 30, 2004, at 10:15 PM, Sebastian Sastre wrote:
> In your previous email you said that the system can find out which methods modify a given instVar by scanning the bytecode. As I have never done anything like that, could you explain a little more about it?
Probably the easiest is to use Behavior>>whichSelectorsStoreInto:. It only does the search for a specific inst var, but you can either send it once for each inst var in the class, or modify it to search for any stores (which would be faster).
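For readers without a Squeak image at hand, the same kind of bytecode scan can be tried in Python with the standard `dis` module. This is an analogy only, not the Squeak implementation: `which_selectors_store_into` below is a hypothetical Python function that borrows the Squeak selector's name.

```python
import dis
import inspect


def which_selectors_store_into(cls, attribute):
    """Names of methods defined in cls whose bytecode stores into .<attribute>."""
    selectors = []
    for name, function in vars(cls).items():
        if not inspect.isfunction(function):
            continue
        for instruction in dis.get_instructions(function):
            # STORE_ATTR is emitted for "something.attribute = value".
            # The scan does not check that "something" is self.
            if instruction.opname == "STORE_ATTR" and instruction.argval == attribute:
                selectors.append(name)
                break
    return sorted(selectors)
```

Like the per-inst-var search described above, this looks for one attribute at a time; dropping the `argval` test would make it report any store. Stores done indirectly (for example through `setattr`) are not found by this sketch.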
Avi
Avi,
that helped!... Comment: speed is not critical here, because it only has to be sent at index creation time (once).
By the way, these indexes are in fact HomogeneousIndexes, because by their nature the method wrappers do not tolerate heterogeneity. Because of that I've made an IndexedCollection subclass called HomogeneousIndexedCollection. It should be more efficient (I hope), but it can only store objects of the same class. The intention is to also have the IndexedCollection (which uses a proxy, minimal object, DNU, etc. approach), somewhat less efficient but tolerating heterogeneity.
I put the method wrapper in the index, so the lifetime of the wrapper is in sync with the lifetime of the index that wants to be aware of that method call. I made a Registry class var in the indexes so they can receive a #finalize call when garbage collected and uninstall the wrapper before disappearing. I'm not completely convinced by this approach. Why are they not GCed at once, and the wrapper uninstalled, when a workspace script variable that holds the only reference to the collection is set to nil? (They actually are, but I only see it after sending Smalltalk garbageCollect.) What do you think? When do you think the uninstallation of the method wrappers should be done? Are there other/better approaches you can see?
best regards,
Sebastián Sastre
ssastre@seaswork.com.ar Seaswork Special Software Solutions www.seaswork.com.ar
Avi,
I forgot to ask you two things:
on the one hand, I don't see a #removeKey: in the TSTree. Could it have one? On the other hand, what about a BTree that can search its keys with wildcards (like #matchesForPrefix: for TSTree)?
best regards,
Sebastián Sastre
ssastre@seaswork.com.ar Seaswork Special Software Solutions www.seaswork.com.ar
On Oct 6, 2004, at 10:33 PM, Sebastián Sastre wrote:
> on the one hand, I don't see a #removeKey: in the TSTree. Could it have one?
Yup. Just haven't needed it yet, so haven't implemented it.
> On the other hand, what about a BTree that can search its keys with wildcards (like #matchesForPrefix: for TSTree)?
Yeah, I haven't gotten around to that either. Though for a BTree I'd expect a range search rather than wildcards...
Either of those things should be reasonably easy for someone else to implement if they wanted them (I don't think my code's *that* illegible). I doubt I'll get to either one any time very soon, since the current functionality seems to be working for what I need - so if you want it, best would be to add it yourself. If the details of the existing code are confusing, give me a shout.
Cheers, Avi
OK, so I'll keep working on the IndexedCollection until the point where the index updates are needed. If I need it then, I'll ask you for clues!
Thank you,
Sebastián Sastre
ssastre@seaswork.com.ar Seaswork Special Software Solutions www.seaswork.com.ar
On Oct 6, 2004, at 10:33 PM, Sebastián Sastre wrote:
> On the other hand, what about a BTree that can search its keys with wildcards (like #matchesForPrefix: for TSTree)?
Ok, I had a couple of minutes to add #from:to:do: and #from:to:keysAndValuesDo: to the BTree implementation. The new release is on SqueakMap.
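To show how a range search relates to the prefix search asked about, here is a small Python sketch over a sorted key list. It is not the BTree package's code: `SortedIndex`, `at_put`, `from_to_do` and `matches_for_prefix` are hypothetical names, and inclusive bounds are an assumption, not something stated in this thread.

```python
from bisect import bisect_left, bisect_right, insort


class SortedIndex:
    """Keys kept in sorted order; one value per key, for brevity."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def at_put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def from_to_do(self, low, high, action):
        """Call action(value) for every key with low <= key <= high, in key order."""
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        for key in self._keys[start:stop]:
            action(self._values[key])

    def matches_for_prefix(self, prefix):
        """Prefix search on string keys, expressed as a range search."""
        found = []
        # Keys starting with prefix sort between prefix and prefix followed by
        # the highest code point (keys continuing past that code point are
        # missed, an edge case this sketch ignores).
        self.from_to_do(prefix, prefix + "\U0010ffff", found.append)
        return found
```

With string keys, a trailing-wildcard query such as "sa*" becomes the range from "sa" up to the last key that still starts with "sa"; wildcards in other positions do not map onto a single range.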
Avi
On Oct 6, 2004, at 10:21 PM, Sebastián Sastre wrote:
> I put the method wrapper in the index, so the lifetime of the wrapper is in sync with the lifetime of the index that wants to be aware of that method call. I made a Registry class var in the indexes so they can receive a #finalize call when garbage collected and uninstall the wrapper before disappearing. I'm not completely convinced by this approach. Why are they not GCed at once, and the wrapper uninstalled, when a workspace script variable that holds the only reference to the collection is set to nil? (They actually are, but I only see it after sending Smalltalk garbageCollect.) What do you think? When do you think the uninstallation of the method wrappers should be done?
I'm not sure I understand - do you expect there ever to be a time when you don't have at least one index in your image? As long as there's at least one (for any given domain class), won't you have to still have the MethodWrappers enabled? It seems unlikely to me that, either in development or in production, the uninstallation case would ever naturally occur...
> Are there other/better approaches you can see?
Well, I've actually been working on a different approach today, that I expect I'll probably release in the next couple of days. It's intended to be used as a write barrier for GOODS (or other OODB clients), so it has different constraints from your homogeneous indices. I need to be able to detect writes on pretty much any class of object, which means that I can't afford to use MethodWrappers, since they're image wide; trapping writes to every Array in the system, for example, would not be a terribly good idea.... Instead, I'm using programmatically built subclasses that override any potentially mutating methods, and trigger a notification if a mutation actually occurs, and I selectively change the class of the instances I'm interested in (using #primitiveChangeClassTo:) to use these special subclasses.
Anyway, it might be interesting for you to compare this implementation with what you've done with MethodWrappers. I'll give more details when I release the code.
Avi
Avi,
I don't mean uninstalling the package. If you look at the ObjectAsMethodWrapper class, it has two methods: #install and #uninstall. These methods actually replace the CompiledMethod with the wrapper and vice versa. So, while you have the indexed collection around, the classes have their methods wrapped, but when the IndexedCollection is gone, you want those classes to be normal again (uninstalling the wrapper(s) so the original compiled method(s) are back in place).
About your automatic subclass approach: it sounds good. Will you make it dependent on or independent of the GOODS client?
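One way to reconcile this with the earlier remark that the wrappers must stay while at least one index exists is to count users explicitly instead of relying on finalization. The Python sketch below is only an illustration of that design choice: `WrapperInstallation`, `acquire` and `release` are hypothetical names and do not come from ObjectAsMethodWrapper.

```python
class WrapperInstallation:
    """Keeps one method wrapped while at least one index is using it."""

    def __init__(self, cls, selector, on_call):
        self.cls = cls
        self.selector = selector
        self.on_call = on_call
        self.users = 0
        self.original = None

    def acquire(self):
        """An index starts depending on the wrapper."""
        if self.users == 0:
            self._install()
        self.users += 1

    def release(self):
        """An index stops depending on the wrapper."""
        if self.users == 0:
            raise RuntimeError("release without matching acquire")
        self.users -= 1
        if self.users == 0:
            self._uninstall()

    def _install(self):
        original = self.original = vars(self.cls)[self.selector]
        on_call = self.on_call

        def wrapper(receiver, *args, **kwargs):
            result = original(receiver, *args, **kwargs)
            on_call(receiver)  # tell the interested party about the send
            return result

        setattr(self.cls, self.selector, wrapper)

    def _uninstall(self):
        setattr(self.cls, self.selector, self.original)
        self.original = None
```

With explicit `release` calls the class returns to normal as soon as the last index lets go, without waiting for a garbage collection; the price is that each index must be told when it is no longer needed.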
regards,
Sebastián Sastre
ssastre@seaswork.com.ar Seaswork Special Software Solutions www.seaswork.com.ar
On Oct 7, 2004, at 3:33 PM, Sebastián Sastre wrote:
> About your automatic subclass approach: it sounds good. Will you make it dependent on or independent of the GOODS client?
Independent - it's the WriteBarrier package I just released.
squeak-dev@lists.squeakfoundation.org