Hi Chris and all!
I am using Magma now - though I haven't stressed it yet at all, still busy building my domain model. But it seems to "just work" for the moment - which is really nice.
I noticed: http://minnow.cc.gatech.edu/squeak/5832
...and coincidentally that is exactly what I need. :) Now, it doesn't necessarily fit me exactly - so I am using a Prevayler-ish approach with "command" objects that I also save to Magma, so that I can reconstruct the model by applying the commands in sequence. This should give me a suitable mechanism to keep Magma database copies (offline clients) in sync with the master, and also reasonable hooks to figure out "logical" conflicts.
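Roughly the shape of it, for the archives (just a sketch - class and selector names are made up; MagmaSession>>commit: is the only real Magma API used here):

  Object subclass: #DomainCommand
      instanceVariableNames: 'timestamp'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'MyApp-Commands'

  "Each concrete command knows how to apply itself to the model."
  DomainCommand>>applyTo: aModel
      self subclassResponsibility

  "Apply a command and persist it in the same commit, so the stored
  log always matches the state of the stored model:"
  MyApp>>execute: aCommand
      session commit:
          [aCommand applyTo: self model.
          self commandLog add: aCommand]

  "An offline copy can then be rebuilt (or re-synced) by replaying:"
  MyApp>>replay: aCommandSequence
      aCommandSequence do: [:each | each applyTo: self model]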
But I was just curious if you could share your current "thoughts" on the subject.
Also, the security stuff in 1.1 is interesting - we might want to use repository file encryption (or just rely on OS-level mechanisms for that - encrypted filesystems).
Were there any other relevant changes with 1.1?
regards, Göran
Hi Göran, what a dream project you have started! The very best projects are the few that forge new territory and lead by example, driven by progressive thinkers like yourself and progressive technologies. Way to go!
(I'd like to respond to your earlier note as well; sorry for the delay.)
- The new system is intended to support offline operation, meaning that users will be able to make a standalone installation on their laptops and then replicate a portion of the model to it, work offline, and then sync up later. I will be using a "Prevayler-ish" pattern to accomplish that, or call it a Command pattern if you like. I also note the existence of Magma's forwarding proxies (might come in handy). So yes, the laptops will in that case run a local Magma with a subset of the full db.
This function will be built into Magma. "1.1" has security, "1.2" will have security and "import/export" of large chunks of the persistent model. 1.2 is the very next thing I intend to work on as soon as 1.1 and KryptOn are stabilized.
I would like to share a few more thoughts about this function. The idea is that Magma is too centralized. There needs to be a way to accomplish exactly what you said: for someone to be able to "download" a chunk of the model for their own offline work (e.g., on a plane) and then, later, 'sync up'.
I also intend for this to serve as the basis for "long transactions."
Now, I want to try to avoid the notion of a "master" and a "replica". Instead, any repository can simply be a conglomeration of objects from many other repositories, and the repository knows where each object "originated", to support the sync-up.
If there is a commit-conflict during the sync-up, the committer can only get through it by bringing the objects in conflict down into their own repository, reapplying their updates, and then trying to commit again.
Bottom line: you can download your own copy of the model, and that copy is "yours" (you could host it, for example). But the one you copied from is not yours; therefore the burden of commit-conflict reconciliation is always on the committer.
This function, combined with the ForwardingProxies, will I hope be sufficient for collaboration on large-scale domain models in a distributed fashion.
- Regarding wrapping each request in a commit - how costly is that? The abort/begin is of course needed (and how much does an abort cost if no modifications have been made?), but how much does the commit cost if I, say, do no modifications but have a large "read set"? I am guessing this is much cheaper if I use WriteBarrier? Is the WriteBarrier code fine to use?
A commit is pretty cheap with a small readSet. With a large readSet, WriteBarrier will definitely improve it dramatically.
WriteBarrier is still supported, but I haven't tested it in a while. WriteBarrier itself also has at least one bug related to changing the class-model while objects are behind a WriteBarrier. Therefore, you should never use WriteBarrier in a development environment where classes will be recompiled.
Still, it is probably good to try to keep the readSet as small as possible.
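In code it amounts to roughly this (untested sketch: #stubOut: is real, but the selector for switching on the write-barrier I am writing from memory, so verify it against the image):

  "Let the session detect changes via the write-barrier instead of
  comparing every cached object against its original read buffer
  (selector name from memory - check MagmaSession in the image):"
  session allowWriteBarrier: true.

  "And shrink the readSet by turning finished-with objects back
  into proxies:"
  session stubOut: thatBigCollection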
I noticed: http://minnow.cc.gatech.edu/squeak/5832
...and coincidentally that is exactly what I need. :) Now, it doesn't necessarily fit me exactly - so I am using a Prevayler-ish approach with "command" objects that I also save to Magma, so that I can reconstruct the model by applying the commands in sequence. This should give me a suitable mechanism to keep Magma database copies (offline clients) in sync with the master, and also reasonable hooks to figure out "logical" conflicts.
As mentioned, I would like this to eventually be supported at the database level.
Also, the security stuff in 1.1 is interesting - we might want to use repository file encryption (or just rely on OS-level mechanisms for that - encrypted filesystems).
Sure, all of the security can be essentially disabled. The choice is yours.
Were there any other relevant changes with 1.1?
Not in terms of major functionality, but there was some simplification of the ForwardingProxy implementation, plus some other minor refinements. 1.1 is definitely the intended "future" of Magma, unless there is some huge backlash to the security...
Regards, Chris
Hi Chris and all others!
Chris Muller chris@funkyobjects.org wrote:
Hi Göran, what a dream project you have started! The
Yes, indeed. And it is going great so far. Seaside is really nice - but we all knew that, of course. A tad short on class comments (as is Magma <cough>), but it has examples - unfortunately not too many for the new Canvas API - though it is pretty easy to "dig out" how to do things.
And Magma is just churning along so far, really neat. I am using the ConnectionPool thingy that Cees has in Kilauea; I have made some tweaks. Still a bit undecided on how to deal with Magma sessions vs Seaside sessions. Right now I use the pool and allocate/release on each request - but that is actually rather unpleasant - my Seaside components can't hold onto persistent domain objects that way, ouff!
As I wrote, I was toying with the idea of having the Seaside sessions share a single "read-only" Magma session per default (greatly increasing scalability), and then, whenever they need to commit something, they would allocate a session from the pool (the pool holding a number of connected sessions), commit, and release it.
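In code, the write path would be something like this (just a sketch; the pool protocol here is my own invention, only MagmaSession>>commit: is real):

  "Reads go through the shared read-only session; to modify, borrow
  a connected session from the pool just for the duration of the
  commit:"
  MyAppSession>>modify: aBlock
      | s |
      s := self pool allocateSession.
      [s commit: aBlock]
          ensure: [self pool releaseSession: s]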
Unfortunately I would then typically be holding persistent objects from the read-only session while wanting to modify them and commit in another session... Is there any facility for dealing with that scenario? In other words, given object x from session A, can I easily obtain object x attached to session B? I assume not, given that I would need the full backward chain of objects up to the root.
The alternative is to bite the bullet and simply re-traverse down to the desired object x in session B when I decide I want to send modifying messages to it and commit, but that will make the code uglier.
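Something like this, I mean (sketch; assuming my cases sit in a dictionary at the root and carry a key - MagmaSession>>root and commit: are real, the rest is made up):

  "Object x as located via the shared read-only session A:"
  x := (sessionA root at: #cases) at: someKey.

  "...re-found from session B's own root before committing:"
  xInB := (sessionB root at: #cases) at: someKey.
  sessionB commit: [xInB markClosed]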
As a reminder - the reason for my discussion on this topic is that I feel that the "simplistic approach" of simply using a single MagmaSession for each Seaside session doesn't scale that well. I am looking at possibly 100 concurrent users (in the end, not from the start) using an object model with at least, say, 50000 cases - each of which of course consists of a number of other objects. Sure, I can use some kind of multi-image clustering with round-robin Apache in front etc, but still.
Having 100 MagmaSessions all using their own copy of that model sounds tough.
As a side note, GemStone has a "shared page cache" so that multiple sessions actually share a cache of objects in RAM. Could we possibly contemplate some way of having sessions share a cache? Yes, complex stuff, I know. Btw, could you perhaps explain how the caching works today? Do you have some kind of low-level cache on the file level, for example?
very best projects are the few that forge new territory and lead by example, driven by progressive thinkers like yourself and progressive technologies. Way to go!
In this project it feels like a prerequisite for success. No chance I will be able to do this in the planned short time period otherwise. Magma is essential, and so is Seaside. And Squeak of course.
(I'd like to respond to your earlier note as well; sorry for the delay.)
- The new system is intended to support offline operation, meaning that users will be able to make a standalone installation on their laptops and then replicate a portion of the model to it, work offline, and then sync up later. I will be using a "Prevayler-ish" pattern to accomplish that, or call it a Command pattern if you like. I also note the existence of Magma's forwarding proxies (might come in handy). So yes, the laptops will in that case run a local Magma with a subset of the full db.
This function will be built into Magma. "1.1" has security, "1.2" will have security and "import/export" of large chunks of the persistent model. 1.2 is the very next thing I intend to work on as soon as 1.1 and KryptOn are stabilized.
Yes, I saw that.
I would like to share a few more thoughts about this function. The idea is that Magma is too centralized. There needs to be a way to accomplish exactly what you said: for someone to be able to "download" a chunk of the model for their own offline work (e.g., on a plane) and then, later, 'sync up'.
I also intend for this to serve as the basis for "long transactions."
Now, I want to try to avoid the notion of a "master" and a "replica". Instead, any repository can simply be a conglomeration of objects from many other repositories, and the repository knows where each object "originated", to support the sync-up.
This is actually the same idea I want to use in the future architecture of SM. Which perhaps could turn out to be Magma based, who knows.
If there is a commit-conflict during the sync-up, the committer can only get through it by bringing the objects in conflict down into their own repository, reapplying their updates, and then trying to commit again.
Bottom line: you can download your own copy of the model, and that copy is "yours" (you could host it, for example). But the one you copied from is not yours; therefore the burden of commit-conflict reconciliation is always on the committer.
This function, combined with the ForwardingProxies, will I hope be sufficient for collaboration on large-scale domain models in a distributed fashion.
Right. In my current scenario I will still (for several reasons linked to the requirements) want to "deal" with it using the command pattern. But I think you are on the right track with that focus. The other thing I would really like is a damn good free-text engine built in, but hey, I will simply have to use something on the side. :)
- Regarding wrapping each request in a commit - how costly is that? The abort/begin is of course needed (and how much does an abort cost if no modifications have been made?), but how much does the commit cost if I, say, do no modifications but have a large "read set"? I am guessing this is much cheaper if I use WriteBarrier? Is the WriteBarrier code fine to use?
A commit is pretty cheap with a small readSet. With a large readSet, WriteBarrier will definitely improve it dramatically.
I kinda guessed. Otherwise you keep an original duplicate of all cached objects, right? So WriteBarrier also improves memory consumption, I guess.
WriteBarrier is still supported, but I haven't tested it in a while. WriteBarrier itself also has at least one bug related to changing the class-model while objects are behind a WriteBarrier. Therefore, you should never use WriteBarrier in a development environment where classes will be recompiled.
No problem, as long as I can switch it on for deployment. :)
Still, it is probably good to try to keep the readSet as small as possible.
Well, I find this recommendation slightly odd *in general*. I understand how it makes each transaction faster - but on the other hand you lose the caching benefit. For example, in this app I want a significant part of the model - the meta-model - to be cached at all times. It will not be large (so I can afford to cache it, even in several sessions), but it will be heavily used, so I don't want to end up reading it over and over.
[SNIP]
Sure, all of the security can be essentially disabled. The choice is yours.
Very good. :) And also - do you have any clue on how the performance is affected by using the various security parts?
regards, Göran
Hey Göran, I don't have the context you have into your domain, nor experience with Seaside. Nevertheless, my strong intuition suggests we should step back and consider again having one Magma session per Seaside session.
I am not sure whether you are trying to optimize for speed or memory consumption, but I think that this 1:1 approach is good for both.
Still, it is probably good to try to keep the readSet as small as possible.
Well, I find this recommendation slightly odd *in general*. I understand how it makes each transaction faster - but on the other hand you lose the caching benefit. For example, in this app I want a significant part of the model - the meta-model - to be cached at all times. It will not be large (so I can afford to cache it, even in several sessions), but it will be heavily used, so I don't want to end up reading it over and over.
It's ok. Go ahead and cache your meta-model in each session if it's not so big, but seriously, let everything else be read dynamically as needed. Let every session have only a very small portion of the domain cached, and keep it small via #stubOut:.
Reads (proxy materializations) are one of the fastest things Magma does. You are supposed to *enjoy* the transparency, not have to worry about such complex ways to circumvent it.
ReadStrategies and #stubOut: are intended to optimize read-performance and memory consumption, respectively. If these are not sufficient, and assuming the uni-session approach (all Seaside sessions share one MagmaSession and one copy of the domain) is not either, *then* these other complex alternatives should be considered. It's not easy for me to say, but I have to face the truth: if the intended transparency of Magma cannot be enjoyed, then lots of other, equally less-transparent options open up.
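For example (sketch with a made-up domain; #stubOut: is the real selector):

  "Materialize a case, use it, then cut it loose so the session's
  readSet and cache stay small:"
  aCase := (session root at: #cases) at: caseNumber.
  self renderPageFor: aCase.
  session stubOut: aCase
  "aCase is a proxy again; a later touch simply re-reads it."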
As a reminder - the reason for my discussion on this topic is that I feel that the "simplistic approach" of simply using a single MagmaSession for each Seaside session doesn't scale that well. I am looking at a possible 100 concurrent users (in the end, not from the start) using an object model with at least say 50000 cases - which of course each consists of a number of other objects. Sure, I can use some kind of multi-image clustering with round-robin Apache in front etc, but still.
Well, it may scale better than you think. Peak (single-object) read rate is 3149 per second on my slow laptop, and about 7.15 reads per second when each read materializes one thousand objects (see http://minnow.cc.gatech.edu/squeak/5606, or run your own MagmaBenchmarker). So if you have 1000 objects in a Case and 100 users all request a case at exactly the same time, the longest delay would be ~10 seconds (assuming you're not serving with my slow, circa-2004 laptop). Optimizing the ReadStrategy for a Case would allow better performance.
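To spell the arithmetic out: 100 back-to-back thousand-object reads at 7.15 of them per second is 100 / 7.15 ≈ 14 seconds for the last request in line on my laptop, hence the ~10 second estimate for a reasonably current server.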
Any single-image Seaside server where you want to cache a whole bunch of stuff is going to have this sort of scalability issue, no matter what DB is used, right? Remember, you could use the many:1 approach (all Seaside sessions sharing one Magma session and a single copy of the domain); how does this differ from any other solution?
The 1:1 design, OTOH, is what makes multi-image clustering possible, so from that aspect risk is reduced. That's the one I would try very hard to make work before abandoning TSTTCPW.
As a side note, GemStone has a "shared page cache" so that multiple sessions actually share a cache of objects in RAM.
That's in the server-side GemStone-Smalltalk image memory though, isn't it? Magma doesn't do that.
Could we possibly contemplate some way of having sessions share a cache? Yes, complex stuff, I know. Btw, could you perhaps explain how the caching works today? Do you have some kind of low-level cache on the file level, for example?
I'm open to ideas. The caching is very simple right now; it just uses WeakIdentityDictionaries to hold read objects.
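Conceptually it is just this (simplified sketch; the real code keeps more bookkeeping, and the exact dictionary classes are my recollection):

  "Per-session weak maps between oids and materialized objects, so
  anything the application lets go of can be GC'd and later re-read:"
  oidToObject := WeakValueDictionary new.         "oid -> object"
  objectToOid := WeakIdentityKeyDictionary new.   "object -> oid"

  oidToObject at: anOid put: materializedObject.
  objectToOid at: materializedObject put: anOid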
A commit is pretty cheap with a small readSet. With a large readSet, WriteBarrier will definitely improve it dramatically.
I kinda guessed. Otherwise you keep an original duplicate of all cached objects, right? So WriteBarrier also improves memory consumption, I guess.
No to the first question, yes to the second (IIRC). It doesn't keep an original "duplicate", just the original buffer that was read.
Very good. :) And also - do you have any clue on how the performance is affected by using the various security parts?
Authorizing every request seems to have imposed about a 10% penalty. #cryptInFiles is hard to measure, since writes occur in the background anyway. #cryptOnNetwork definitely slows down network transmissions considerably; only use it if you have to.
Regards, Chris
Hi Chris!
First - thanks for taking time to answer. :)
Chris Muller chris@funkyobjects.org wrote:
Hey Göran, I don't have the context you have into your domain, nor experience with Seaside. Nevertheless, my strong intuition suggests we should step back and consider again having one Magma session per Seaside session.
Ok, well, I can probably do that - I just need to be sure that I feel I have "ways out" if it turns bad. Call it "precautionary investigations". Since I am putting myself (and Magma/Seaside/Squeak) on the line here I don't want to fail.
I am not sure whether you are trying to optimize for speed or memory consumption, but I think that this 1:1 approach is good for both.
Not optimizing at the moment - mainly "dabbling" in my head. But both concerns are valid, even though memory consumption was my main worry.
Still, it is probably good to try to keep the readSet as small as possible.
Well, I find this recommendation slightly odd *in general*. I understand how it makes each transaction faster - but on the other hand you lose the caching benefit. For example, in this app I want a significant part of the model - the meta-model - to be cached at all times. It will not be large (so I can afford to cache it, even in several sessions), but it will be heavily used, so I don't want to end up reading it over and over.
It's ok. Go ahead and cache your meta-model in each session if it's not so big, but seriously, let everything else be read dynamically as needed. Let every session have only a very small portion of the domain cached, and keep it small via #stubOut:.
Reads (proxy materializations) are one of the fastest things Magma does.
Ok, I assume I might still be avoiding actual file access - given OS file-level caching.
You are supposed to *enjoy* the transparency, not have to worry about such complex ways to circumvent it.
I am enjoying it! You may recall I am an old GemStone dog - I know how to enjoy that. :)
ReadStrategies and #stubOut: are intended to optimize read-performance and memory consumption, respectively.
I understand them - the first is similar to GemStone; the second is not, since it is automatic in GemStone - but whatever.
If these are not sufficient, and assuming the uni-session approach (all Seaside sessions share one MagmaSession and one copy of the domain) is not either, *then* these other complex alternatives should be considered. It's not easy for me to say, but I have to face the truth: if the intended transparency of Magma cannot be enjoyed, then lots of other, equally less-transparent options open up.
Ok. One huge benefit of using 1:1 instead of Cees' ConnectionPool is that my Seaside components can hold onto the persistent objects. Otherwise they can't, because the next request will end up using a different session.
And I really wonder why I haven't realized that until now. ;) Sigh.
As a reminder - the reason for my discussion on this topic is that I feel that the "simplistic approach" of simply using a single MagmaSession for each Seaside session doesn't scale that well. I am looking at a possible 100 concurrent users (in the end, not from the start) using an object model with at least say 50000 cases - which of course each consists of a number of other objects. Sure, I can use some kind of multi-image clustering with round-robin Apache in front etc, but still.
Well, it may scale better than you think. Peak (single-object) read rate is 3149 per second on my slow laptop,
Are we talking cold cache, including actual file access? And how does the size of the files on disk affect that?
and about 7.15 reads per second when each read materializes one thousand objects (see http://minnow.cc.gatech.edu/squeak/5606, or run your own MagmaBenchmarker).
Not sure I grokked that sentence. :)
So if you have 1000 objects in a Case and 100 users all request a case at exactly the same time, the longest delay would be ~10 seconds (assuming you're not serving with my slow, circa-2004 laptop).
Mmm.
Optimizing the ReadStrategy for a Case would allow better performance.
That I probably will do when the app settles.
Any single-image Seaside server where you want to cache a whole bunch of stuff is going to have this sort of scalability issue, no matter what DB is used, right? Remember, you could use the many:1 approach (all Seaside sessions sharing one Magma session and a single copy of the domain); how does this differ from any other solution?
Eh... not sure I follow the logic, but never mind. :)
The 1:1 design, OTOH, is what makes multi-image clustering possible, so from that aspect risk is reduced. That's the one I would try very hard to make work before abandoning TSTTCPW.
Good point.
As a side note, GemStone has a "shared page cache" so that multiple sessions actually share a cache of objects in RAM.
That's in the server-side GemStone-Smalltalk image memory though, isn't it? Magma doesn't do that.
The "server side" GemStone image can run anywhere - so the closest counterpart in Magma is actually the client image IMHO.
Could we possibly contemplate some way of having sessions share a cache? Yes, complex stuff, I know. Btw, could you perhaps explain how the caching works today? Do you have some kind of low-level cache on the file level, for example?
I'm open to ideas. The caching is very simple right now; it just uses WeakIdentityDictionaries to hold read objects.
And one per session, I assume? No cache on any lower level, like on top of the file code?
A commit is pretty cheap with a small readSet. With a large readSet, WriteBarrier will definitely improve it dramatically.
I kinda guessed. Otherwise you keep an original duplicate of all cached objects, right? So WriteBarrier also improves memory consumption, I guess.
No to the first question, yes to the second (IIRC). It doesn't keep an original "duplicate", just the original buffer that was read.
Ah, ok. But you don't need that when using WriteBarrier, right?
Very good. :) And also - do you have any clue on how the performance is affected by using the various security parts?
Authorizing every request seems to have imposed about a 10% penalty. #cryptInFiles is hard to measure, since writes occur in the background anyway. #cryptOnNetwork definitely slows down network transmissions considerably; only use it if you have to.
Regards, Chris
regards, Göran
On 1/13/06, goran@krampe.se wrote:
Ok. One huge benefit of using 1:1 instead of Cees' ConnectionPool is that my Seaside components can hold onto the persistent objects. Otherwise they can't, because the next request will end up using a different session.
Be glad there's Magma - in OmniBase, you can't even transport your objects from transaction to transaction...
Are we talking cold cache, including actual file access? And how does the size of the files on disk affect that?
Hey, run your own benchmarks. They're just as invalid as anyone else's... :-)
One premature optimization you could do is to build a sort of internal service to access base data (hmm... I only know the German/Dutch word here - 'Stammdaten', i.e. master data) instead of talking to Magma directly. As long as performance is not an issue, do nothing. As soon as performance becomes an issue, change that interface to use a single separate Magma session (and maybe even a separate Magma image, whatever).
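Something like this, I mean (sketch, all names made up):

  Object subclass: #BaseDataService
      instanceVariableNames: 'session'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'MyApp-Services'

  "Today this just delegates to the shared session; when performance
  becomes an issue it can get its own MagmaSession - or talk to a
  separate image - without any caller changing:"
  BaseDataService>>countries
      ^(session root at: #baseData) at: #countries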
That's probably all the premature optimization I would do at this moment.
How many users will use this app, anyway?
Hi!
Cees De Groot cdegroot@gmail.com wrote:
On 1/13/06, goran@krampe.se wrote:
Ok. One huge benefit of using 1:1 instead of Cees' ConnectionPool is that my Seaside components can hold onto the persistent objects. Otherwise they can't, because the next request will end up using a different session.
Be glad there's Magma - in OmniBase, you can't even transport your objects from transaction to transaction...
:)
Are we talking cold cache, including actual file access? And how does the size of the files on disk affect that?
Hey, run your own benchmarks. They're just as invalid as anyone else's... :-)
Yeah, well - no time for that right now. I was just trying to pick Chris' brain a bit.
One premature optimization you could do is to build a sort of internal service to access base data (hmm... I only know the German/Dutch word here - 'Stammdaten', i.e. master data) instead of talking to Magma directly. As long as performance is not an issue, do nothing. As soon as performance becomes an issue, change that interface to use a single separate Magma session (and maybe even a separate Magma image, whatever).
That's probably all the premature optimization I would do at this moment.
Might be an idea. But I am actually not going to do even that until needed. :)
Anyway, I just re-adapted my Q2Session to use 1:1, but it still uses the pool. At least that preserves the reuse of the cached model when people log in and out. ;)
How many users will use this app, anyway?
350+, and perhaps up to 100 concurrently. I am busy building domain objects and UIs right now; I just want to be "prepared", especially if the issue turns up in discussions here.
regards, Göran
Ok, well, I can probably do that - I just need to be sure that I feel I have "ways out" if it turns bad. Call it "precautionary investigations". Since I am putting myself (and Magma/Seaside/Squeak) on the line here I don't want to fail.
Hi, I am finally back from my holiday.
We used Magma + Seaside to demonstrate a configuration management application for 'the biggest mobile operator in the world'.
We simulated a lot of users (>50) doing pretty complex things to graphs of tens of thousands of objects. The MagmaCollections meant that only a small subset was materialised at any one time.
We used 1:1 Seaside -> MagmaSession.
The slowest thing was allocating a new MagmaSession, as it has to check the image for compatibility. I have some code somewhere which pre-allocates a session - too trivial to submit.
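From memory it was roughly this (untested sketch; the connect protocol is as I recall it from the Magma docs of the time, so double-check it):

  "Keep one connected spare session around; hand it out and refill
  in the background, so users never wait for the compatibility
  check:"
  SessionSource>>preallocate
      spare := MagmaSession
          hostAddress: (NetNameResolver addressForName: 'dbhost')
          port: 51969.
      spare connectAs: 'seaside'

  SessionSource>>next
      | s |
      s := spare.
      [self preallocate] fork.
      ^s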
As for Apache round-robins: we had 5 computers reading from a single Magma server on a separate machine (not Seaside). We never managed to saturate the server.
Let me know if you need help. I have promised Chris I will finish Lava this year!
Brent