Trying to understand forwarding proxies

Wed Feb 2 22:58:57 UTC 2011

Hi Norbert, thanks for the _great_ note and questions.  I'm really
glad to see someone finally looking into ForwardingProxies.

On Wed, Feb 2, 2011 at 1:34 PM, Norbert Hartl <norbert at hartl.name> wrote:
> After 4 years of abstinence from Magma I try to get back in touch to evaluate some new ideas. I was thinking how to utilize magma in a cloud environment like the one from amazon. So I'm interested in possible scaling scenarios.

Cool!  I've really wanted to look at Amazon EC2 to see if Magma could
run there, just haven't had time to do it.

Please use the latest 1.2alpha.  I am on the cusp of publishing Magma
1.2 right after Squeak 4.2 is all done.

> If I understand it correctly then
>
> - magma uses a directory to write its files. That could be called the repository

Yes.

> - one repository is served by one server at any time

One repository is served by one or more servers at a time.

> - a special mode is possible where the client and the server reside in the same image (thus having only the need for a single image)

Yes, this is called "local" mode and it saves the need to serialize
requests and materialize responses, so this configuration offers the
best single-session performance.

The server's duty is relatively light compared to the client's duty,
so running in local mode is not necessarily the best for a
web application because it can't scale.  Multiple web sessions
contending for one MagmaSession could easily be worse than multiple
MagmaSessions contending for one server.

> - HA splits one node over certain locations. A node is an arbitrary amount of servers serving a single shared repository

Yes.  With HA, multiple copies of the one single repository are each
hosted by independently running server images.  In this mode, clients
make two connections, one to the "primary" for commits, and one to one
of the secondaries, for reads.

When the primary receives a commit, it is immediately broadcast to all
secondaries, so the persistent model is redundantly safe.
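
The HA arrangement described above can be sketched roughly as follows.  This is a toy Python model, not Magma code: the class names (Primary, Secondary, HAClient) and their methods are made up for illustration, and a real broadcast goes over the network rather than through a method call.

```python
class Secondary:
    """Toy read-only copy of the repository."""

    def __init__(self):
        self.commit_log = []

    def apply(self, record):
        # Receive a commit record broadcast by the primary.
        self.commit_log.append(record)

    def read(self):
        return list(self.commit_log)


class Primary:
    """Toy primary: accepts commits and forwards them right away."""

    def __init__(self, secondaries):
        self.commit_log = []
        self.secondaries = secondaries

    def commit(self, record):
        self.commit_log.append(record)
        for secondary in self.secondaries:
            secondary.apply(record)


class HAClient:
    """Holds two connections: the primary for commits, one secondary for reads."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def commit(self, record):
        self.primary.commit(record)

    def read(self):
        return self.secondary.read()


secondaries = [Secondary(), Secondary()]
primary = Primary(secondaries)
client = HAClient(primary, secondaries[1])
client.commit("commit-1")
```

After the commit, every copy holds the same record, which is what makes the persistent model redundantly safe.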

> - forwarding proxies can be made from server to server. So these are cross-domain/cross-repository

Yes, but I would prefer to say, "from repository to repository".  A
MagmaForwardingProxy is just a "bookmark", or a "soft-link" to another
object in another repository.  It only persists the 'location' and
'oid' of the remote object.  It implements #doesNotUnderstand: to look
it up and cache it the first time so that subsequent access is fast.
However, unlike MagmaMutatingProxies, the ForwardingProxy will never
become: the remote object.  It will always forward through
#doesNotUnderstand:, so if you are sending to a FP in an inner loop,
send #realObject to the FP to get the cached object for a fast send.
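
To make the forwarding behaviour concrete, here is a minimal Python sketch of the same idea.  It is not Magma's implementation: Python's __getattr__ stands in for #doesNotUnderstand:, and the resolver callable, the Account class and the "repo-b" location are hypothetical stand-ins for a real session lookup and a real remote object.

```python
class ForwardingProxy:
    """Toy soft link: persists only a location and an oid."""

    def __init__(self, location, oid, resolver):
        # resolver is a hypothetical callable (location, oid) -> object;
        # it stands in for opening a session and fetching the object.
        self.location = location
        self.oid = oid
        self._resolver = resolver
        self._cached = None

    def real_object(self):
        # Resolve on first use, then keep returning the cached target.
        if self._cached is None:
            self._cached = self._resolver(self.location, self.oid)
        return self._cached

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, which is
        # roughly the role #doesNotUnderstand: plays in Smalltalk.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.real_object(), name)


class Account:
    """Stand-in for an object living in the other repository."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


lookups = []


def resolver(location, oid):
    lookups.append((location, oid))
    return Account(100)


proxy = ForwardingProxy("repo-b", 42, resolver)
proxy.deposit(5)               # forwarded; triggers the one and only lookup
account = proxy.real_object()  # grab the target once before a hot loop
for _ in range(1000):
    account.deposit(1)         # direct sends, no forwarding on each call
```

The proxy never turns into the target; it stays a separate object that forwards, which is why fetching the real object once pays off inside a loop.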

MagmaForwardingProxies are not intended to be transparent to the Magma
developer, they must be used deliberately.  MagmaMutatingProxies are
supposed to be transparent, but there are cases where a #yourself is
needed (e.g., to avoid the proxy being sent as an argument to a
primitive).

The thing to be very aware of about using FP's is that it does tie the
two repositories together.  The app needs them both running to work.
But the separation has performance and organizational advantages.

> Talking to magma is done using a session. So I can do the following.
>
> - talk to a HA node that will read from an arbitrary server but will commit to a single one

Yes.

> - mimicking a domain model by using forwarding proxies. So objects partitioned over multiple repositories appear to be in the same domain

Just to be clear, ForwardingProxies have no relation to HA.  HA is for
replicating one logical repository.  FP's are for linking one logical
repository (which could be hosted HA) to another logical repository
(which could also be hosted HA).  This is the configuration I run for
my own internal app; two repositories, each one HA, so 4 servers total
(but just two physical machines).

> - using multiple sessions to read from multiple servers. That would be the case of domain model partitioning
>
> Up to here I would like to know what is the tradeoff in using forwarding proxies. Is the whole communication done via the proxy on the first machine or is another session created to which the client has direct access? Or to be more precise: If I would have a forwarding proxy to a collection in another repository that would hold objects from a third repository and I would detect: an object from that collection to which server am I talking when invoking a method on that detected object?

A FP only points to one object in one repository.  The FP, itself, is
a persistent object residing in one repository.  In general, the only
requirement for the client app is that, wherever it SETS the object
that you want remotely linked, it sends #asMagmaForwardingProxy to
that object.

When that is committed, the object it refers to MUST be already
persistent in its own repository so an appropriate location of that
remote object can be determined and persisted with the FP.

Later, another session comes along and, pretending that the FP _is_
the remote object, sends it a message the proxy itself does not
understand.  Trace the code starting at
MagmaForwardingProxy>>#realObject to see that it looks to see whether
a session to that repository is already present and, if so, uses that
one.  Otherwise, a new session is established.
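
The reuse-or-connect step can be pictured with a small Python sketch.  It only models the lookup described above; SessionRegistry, session_for and the connect callable are invented names for illustration, not the code in MagmaForwardingProxy>>#realObject.

```python
class SessionRegistry:
    """Toy cache of open sessions, keyed by repository location."""

    def __init__(self, connect):
        # connect is a hypothetical callable: location -> session.
        self._connect = connect
        self._sessions = {}

    def session_for(self, location):
        # Reuse a session to this location if one is already present;
        # otherwise establish a new one and remember it.
        session = self._sessions.get(location)
        if session is None:
            session = self._connect(location)
            self._sessions[location] = session
        return session


opened = []


def connect(location):
    opened.append(location)
    return {"location": location}


registry = SessionRegistry(connect)
first = registry.session_for("repo-b")
second = registry.session_for("repo-b")  # reuses the existing session
other = registry.session_for("repo-c")   # a different repository, new session
```

So the cost of establishing a session is paid once per remote repository, not once per proxy.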

> As far as I remember sessions are not the most performant thing to establish. Do I remember that correct and has this changed? Same question goes for the start of a repository. Is this a quick operation or are there a lot of preparation steps that make startup rather slow?

One nice thing about MagmaSessions is that they're persistent with the
image.  Save the image with GUI screens showing persistent objects.
Sessions are connected - even with open transactions - and, when the
image is later restarted, the sessions are reconnected transparently,
the persistent view updated, and that transaction can even then be
committed.

Note that the repositories may continue to have been heavily updated
by other sessions while these sessions were offline.  When the image
restarts, any or all of the thousands of persistent objects in the
image, some being shown on the GUI's, COULD have been updated while
the image was hibernating.

Magma tries to be smart about handling this situation.  First, it
checks which commitNumber the client-session is at vs. where the
server is at.  If it is not a great difference, then the client simply
downloads those few commit-log records from the server and applies
only the updates to objects that are present in the image.

However, if there were more commits by other sessions since the image
save than there are cached objects in the session, then it would be
faster to refresh all of those objects (even if some of them didn't
change) instead of replaying all of those commit-logs.
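
As a rough illustration of that decision, a Python sketch follows.  The function name, parameter names and returned labels are assumptions made for the example, and the exact threshold Magma uses may differ from this simple comparison.

```python
def catch_up_strategy(client_commit_number, server_commit_number,
                      cached_object_count):
    """Pick how a reconnecting session brings its cached objects up to date."""
    commits_behind = server_commit_number - client_commit_number
    if commits_behind <= 0:
        return "up-to-date"
    if commits_behind > cached_object_count:
        # More commits to replay than objects held: refreshing every
        # cached object is the cheaper path.
        return "refresh-all"
    # Only a few commits behind: replay just those commit-log records.
    return "replay-commit-log"


few_behind = catch_up_strategy(100, 105, 2000)
far_behind = catch_up_strategy(100, 5000, 2000)
```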

So the reality is that the convenience of resuming the image state
exactly where it left off comes with a possible brief pause while it
brings the objects up to current state.  If sessions do not have a lot
of cached objects, or if the app can open a new session, then that can
be considerably faster.

> I hope these are not too much dumb questions. I'm just thinking about possibilities to what would be worth to try out. With amazon you can have multiple machines attached to a shared block storage. That can share all the repositories over an arbitrary amount of machines. But at the moment I can see how to get a lot of benefit from that particular feature regarding magma. If forwarding proxies are not too expensive that would be still enable some things.
>
> thanks in advance,
>
> Norbert