[Seaside] How do you handle constraints on persisted collections?

Tue May 19 05:49:51 UTC 2009

> I'm trying to make the leap from RDBMS/ORM thinking to pure objects,
> and am running into some problems.  The first is how to handle
> constraints on collections, when the objects inside of that collection
> are mutable.  Quick example, ensuring that only one user exists with a
> certain username.  It's trivial to check that a username is available
> before adding the user to a UserRepository.  If a user can update his
> own username though, it'll bypass any check on the UserRepository and
> invalidate that constraint.  A couple options...
> 
> 1. Have a ChangeUsername command that reserves that username in the
> repo, tells the user object to change its username, and then tells the
> repo to free the original username
> 2. When user username: is called, publish a UsernameChanged event that
> the repository subscribes to.  The repo can choose to reject the
> change if it would violate the repo's constraints.
> 
> The first is just a bit ugly and procedural.  The second requires the
> User object to publish events whenever something happens - which is
> easy to forget to do, or you're left with determining which method
> calls warrant the publishing of events.  Also you'd have to roll back
> the change if the event is rejected.
> 
> You could handle that stuff by writing some framework code to manage
> the tedious stuff...I don't love either approach though.  How do you
> guys tackle this sort of problem?
> 
> Pat

The short answer is you don't.

The longer answer is you're still thinking relationally; thinking in sets.
OODBs and RDBMSs are different paradigms and have vastly different strengths
and weaknesses.  RDBMSs are much better at guaranteeing data integrity,
through things such as constraints and keys rather than collections and
pointers.  There's a theoretical foundation for this; OODBs have no such
mathematical rigor.  This is a weakness of the OODB.  There is no one
provably right way to do it.

On the other hand, RDBMSs have to force everything into a single fixed
schema data structure to accomplish this.  The problem is this isn't the
data structure most programs actually use.  RDBMSs don't speak objects, the
primary method programs actually use to model their domain, and every
attempt to bridge that layer by automatically mapping objects to tables
leads to very serious constraints on the object model.  It becomes infected
with compromises to performance or compromises to the flexibility of the
model in order to make it fit.  Last but not least, you're stuck with a whole
new layer of code to maintain, the mapping layer, that wouldn't even be
necessary with an OODB and prior to Rails was often more code than the
domain model itself.  If you really enjoy programming with objects,
relational databases will never quite fit right; there's just an impedance
mismatch that as far as I know no one has ever solved cleanly.

So in exchange for a query optimizer, declarative constraints and indexing,
you have to do a whole lot more programming.  A fair trade to some for the
rigor they require, but those who choose the OODB are willing to trade the
rigor for the pragmatism of doing a ton less work and actually getting
things done that work well enough.  I can't tell you how many RDBMSs
I've run across in production that didn't have a single foreign key in
place or use constraints at all (often for performance reasons) and chug
along just fine making plenty of money for the gold owner.  You're
experienced with Rails, so you're aware that most Rails apps don't use
foreign keys at all.  Rails treats the db like a big persistent hash table,
preferring to keep the logic in the model.  This is a very object-oriented
way of doing things.  It's not perfect, but it works well enough in practice.

Look at what Rails has to do to try and fake polymorphic collections,
something that is so utterly trivial in an OODBMS that you don't even
realize it's anything special to do.  Toss a bunch of objects into a
collection and commit; done.  The flexibility you gain in modeling things
any way you wish using any data structure you want more than makes up for
the things you lose, i.e. simple declarative constraints and a query
optimizer that lets you just slap on indexes after the fact.

With an OODB you have to think more up front and structure your domain in
such a way that your primary use cases are primarily handled by pointer
navigation.  If you're building a blog, you index the posts by the URL in a
dictionary, you put the comments into a collection in the post.  Maybe you
hang the posts off the user who wrote them as well.  When it's all set up,
there's very little if any querying going on.  Now OODBs do support indexed
collections when you need queries, but if you're writing a behavioral app
rather than a reporting engine, more often than not you don't want queries,
you want trees of related objects that actually do things.
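
As a rough illustration of that blog layout, here is a minimal sketch in
Python (standing in for Smalltalk, and ignoring persistence entirely).  The
class and method names (Blog, Post, Comment, User, publish, post_at) are made
up for the example, not taken from any framework.

```python
class Comment:
    def __init__(self, author, text):
        self.author = author
        self.text = text


class User:
    def __init__(self, username):
        self.username = username
        self.posts = []  # posts hang off the user who wrote them


class Post:
    def __init__(self, url, title, author):
        self.url = url
        self.title = title
        self.author = author
        self.comments = []  # comments live in a collection inside the post

    def add_comment(self, comment):
        self.comments.append(comment)


class Blog:
    def __init__(self):
        self.posts_by_url = {}  # the dictionary acts as the "index"

    def publish(self, post):
        # The dictionary key doubles as a uniqueness check on the URL.
        if post.url in self.posts_by_url:
            raise ValueError("URL already in use: " + post.url)
        self.posts_by_url[post.url] = post
        post.author.posts.append(post)

    def post_at(self, url):
        # Lookup is plain navigation, no query involved.
        return self.posts_by_url[url]


blog = Blog()
ramon = User("ramon")
blog.publish(Post("/oodb-thinking", "Thinking in objects", ramon))
blog.post_at("/oodb-thinking").add_comment(Comment("pat", "Thanks!"))
```

Everything a page needs (the post, its comments, its author, the author's
other posts) is reachable by following references from the one dictionary
lookup.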

If you need to use some special high performance custom written collection
to index your model just the way you want it, an OODB lets you; it's just
another object after all.  If you need collections of objects where every
instance could have wildly different data because you allow the user to add
custom fields, the OODB has no problem with this.  It doesn't care about the
structure of your objects.  The price you pay for that flexibility is you're
on your own: no declarative constraints or query optimizer.  On the other
hand, you're not limited in what you want to do.  A relational db just can't
do user generated fields without converting those fields into rows and
losing its queryability and performance.
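
To make the custom-fields point concrete, here is a small sketch in Python
(again standing in for Smalltalk, with no persistence).  Contact, set_field,
field and with_field are hypothetical names invented for the example.

```python
class Contact:
    def __init__(self, name):
        self.name = name
        self.custom_fields = {}  # user-defined fields, different per instance

    def set_field(self, key, value):
        self.custom_fields[key] = value

    def field(self, key, default=None):
        return self.custom_fields.get(key, default)


def with_field(contacts, key):
    # No optimizer here: "querying" is a hand-written scan of the collection.
    return [c for c in contacts if key in c.custom_fields]


alice = Contact("Alice")
alice.set_field("shoe size", 38)

bob = Contact("Bob")
bob.set_field("favorite editor", "Pharo")
bob.set_field("shoe size", 44)

carol = Contact("Carol")  # no custom fields at all

contacts = [alice, bob, carol]
sized = with_field(contacts, "shoe size")
```

The three instances sit in one collection even though each carries different
data.  The with_field scan is the "you're on your own" part: nothing indexes
those fields unless you write that yourself.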

Sorry, that turned into a longer post than I'd intended and I'm sure you
already know much of if not all of this, but others who are considering
OODBs might find this useful.  It's a different paradigm and requires a
different style of thinking but pragmatically has a lot of benefits if being
productive is the goal.

Anyway, my approach to your particular issue would be to simply make the
name field immutable after being set the first time.  A simple guard clause
in the accessor that threw an exception if someone tried to set the field
when it already had a value and the new value was different would protect
against any unintended aliasing bugs.  After all, an object should
encapsulate its own business rules, and if the field were unique I'd have
modeled that by hashing it in a dictionary by that field's value.  It's not
the rigorous guarantee that a constraint on a set is, but objects aren't
about sets, they're about instances, and in any real program it's pragmatic
enough to get the job done.
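
A minimal sketch of that guard clause plus dictionary, in Python rather than
Smalltalk and with no persistence.  UsernameAlreadySet and the add/find
methods on UserRepository are names assumed for the example.

```python
class UsernameAlreadySet(Exception):
    pass


class User:
    def __init__(self, username=None):
        self._username = None
        if username is not None:
            self.username = username

    @property
    def username(self):
        return self._username

    @username.setter
    def username(self, value):
        # Guard clause: once set, the name may not change to a different value.
        if self._username is not None and value != self._username:
            raise UsernameAlreadySet(
                "username is already %r" % self._username)
        self._username = value


class UserRepository:
    def __init__(self):
        self._users_by_name = {}  # uniqueness comes from the dictionary key

    def add(self, user):
        if user.username is None:
            raise ValueError("user has no username yet")
        if user.username in self._users_by_name:
            raise ValueError("username taken: " + user.username)
        self._users_by_name[user.username] = user

    def find(self, username):
        return self._users_by_name.get(username)


repo = UserRepository()
pat = User("pat")
repo.add(pat)
```

Because the key can never drift away from the object's own username, the
dictionary stays consistent without events or a separate command object.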

Ramon Leon
http://onsmalltalk.com