Hi Chris and Magma people.
Finally I am uploading the data from my old database to Magma. I found some errors in my objects, but a critical question came up about how to commit a large collection. If I perform a single commit for each element, it takes too long. With one commit for the whole collection, an error is raised: 'couldnt serialize more than 10 megabytes'. So I had to use a mixed solution: one commit per x elements of the collection. I'm uploading about 150000 objects, and grouping them by 500 worked fine. The question is: is there a way to know how much space each object will occupy in the serializer, so I can tune these groups of commits to take as little time as possible?
Another question. I was wondering what would happen if I change the class definition of my objects while instances are alive in the database. I'm running some tests to get the answer, but I want to know how Magma is expected to behave if a) I add an instance variable, b) remove an instance variable, c) change the hierarchy of the class, d) change the class of a collaborator, etc.
And finally, we need the method #asOrderedCollection to transform a MagmaCollection. It's not implemented, so the Object definition is used. I'm not sure whether Magma relies on this implementation (Object >> #asOrderedCollection). Can we implement this method as a collect in the normal way?
Thanks in advance. Norberto Manzanos
Hi Norberto!
> ...commit a large collection. If I perform a single commit for each element, it takes too long. With one commit for the whole collection, an error is raised: 'couldnt serialize more than 10 megabytes'. So I had to use a mixed solution: one commit per x elements of the collection. I'm uploading about 150000 objects, and grouping them by 500 worked fine.
Yes, there is some overhead for processing a single request, so you want each request (be it a read or a commit) to have some "meat" to it. But if it has too much, it will cause other clients to wait or even fail (i.e., if it's over 10 MB).
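For reference, the batching Norberto describes might be sketched like this (assumptions: `session` is an open MagmaSession, `allObjects` holds the ~150000 objects to upload, and `myMagmaCollection` is a hypothetical persistent root collection; Magma's real #commit: block protocol is used):

```smalltalk
"Sketch: add a large set of objects in batches of 500, so each
 commit has some 'meat' but stays well under the 10 MB limit."
| batch |
batch := OrderedCollection new.
allObjects do: [:each |
    batch add: each.
    batch size = 500 ifTrue: [
        session commit: [batch do: [:obj | myMagmaCollection add: obj]].
        batch := OrderedCollection new]].
"Commit any remainder smaller than one full batch."
batch isEmpty ifFalse: [
    session commit: [batch do: [:obj | myMagmaCollection add: obj]]]
```

The batch size of 500 is just the value that worked for Norberto; the measurement technique below is the way to tune it for a particular object graph.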
> The question is: is there a way to know how much space each object will occupy in the serializer, so I can tune these groups of commits to take as little time as possible?
The best way to figure this out is to use a time measurement rather than a space measurement. But yes, you can figure out the space:
(mySession serializer serializeGraph: myObjects)
print this and you will see something like:
"a MaSerializedGraphBuffer (3597 objects in 40915 bytes)"
myObjects would be the objects you want to serialize, of course; if you want to compare 500 vs. 1000 vs. 1500, just put them all into an Array of that size.
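Putting that together, a comparison across a few candidate batch sizes could look like this (a sketch; `mySession` and `myObjects` are the names from above, and #first: simply takes the first n elements):

```smalltalk
"Sketch: print the serialized size of a few candidate batch sizes,
 using the same #serializeGraph: expression shown above."
#(500 1000 1500) do: [:n |
    | buffer |
    buffer := mySession serializer serializeGraph: (myObjects first: n) asArray.
    Transcript show: n printString , ' objects -> ' , buffer printString; cr]
```

Each line printed should resemble the "a MaSerializedGraphBuffer (3597 objects in 40915 bytes)" example above, letting you pick the largest batch that stays comfortably under 10 MB.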
> Another question. I was wondering what would happen if I change the class definition of my objects while instances are alive in the database. I'm running some tests to get the answer, but I want to know how Magma is expected to behave if a) I add an instance variable,
The instance variables of old instances are mapped to the instance variables in memory by name, so the new instance variable will start out as nil, just like in Smalltalk.
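For example (a hypothetical Person class, whose instances were committed before the new variable existed):

```smalltalk
"Old definition, as committed to the repository:"
Object subclass: #Person
    instanceVariableNames: 'name'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyApp'.

"New definition in your image, adding #age:"
Object subclass: #Person
    instanceVariableNames: 'name age'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyApp'.

"Old Person instances read from Magma will now answer nil for age,
 since the repository-version has no slot matching that name."
```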
Now let's talk about what happens if you add an inst-var in your image but have not yet committed the code, so another client (let's call him "Bob") does not have that variable defined in his image yet.
Suppose you populate the new variable on some instances with some object besides nil. THEN Bob reads those instances; since he does not have that variable defined, he will not see it. No problem, everything is still fine unless Bob tries to commit another change to an object in which you have populated the new variable. In that case he will get a warning:
MagmaTruncationWarning: 'Your definition of SomeClass are missing instance variables present in the repository-version. If you proceed with this commit information in one or more of these instances will be lost.'
It would seem nice if it would quietly "remember" the new object you had put there and just keep it, except:
> b) remove an instance variable,
data would hang around forever, even after you remove variables.
So if a variable is removed, the pointers to the extra variable will remain in the database for each instance until the next time you commit a change to them.
> c) change the hierarchy of the class,
It should not care about the hierarchy, only about the total set of named instance-variable slots. I have not tried it, though, so be careful here.
> d) change the class of a collaborator,
I'm not sure what this means..
I will also note that Magma pays attention if you rename a class: Bob will still be able to read/commit the instances under the old name, and you can stay on the new name.
Finally, this flexible nature of the class model is intended mostly to improve *development* of an application for multiple developers. But there are a lot of combinations and possibilities, so if you intend to have the class structure change in a "production" app, be sure to test it well.
Fyi, the test case for all this is #testClassSchemaUpgrades.
> And finally, we need the method #asOrderedCollection to transform a MagmaCollection. It's not implemented, so the Object definition is used. I'm not sure whether Magma relies on this implementation (Object >> #asOrderedCollection). Can we implement this method as a collect in the normal way?
#asArray: was provided to create an Array of a subset of the objects. Of course you may implement #asOrderedCollection to do the whole thing, but that begs one question: if you are able to fit the entire MagmaCollection into an OrderedCollection, then perhaps an OrderedCollection would be more suitable in the first place?
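If you do need it, an implementation along the lines Norberto suggests could simply enumerate the collection (a sketch, assuming MagmaCollection supports element-by-element #do: enumeration):

```smalltalk
"Sketch: MagmaCollection>>asOrderedCollection.  Note this reads the
 entire collection into memory, so use with care on large ones."
asOrderedCollection
    | result |
    result := OrderedCollection new: self size.
    self do: [:each | result add: each].
    ^ result
```

For large collections, the #asArray: subset protocol mentioned above remains the safer choice.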
You might be interested to know that MagmaCollection>>#where: will be available very soon to do good querying of MagmaCollections..
- Chris
magma@lists.squeakfoundation.org