I use Magma, but yes, Data-quality management applies to any persistent model, and is a little more involved than at first it seems, because it's tempting to just spank-out a workspace script which enumerates appropriate objects in your DB and fixes/upgrades them.
But doing only that is loaded with caveats. After running a repair script, can one _really_ feel comfortable the model is "all fixed" and ready for production? Did the script really work? Did it catch all cases or was there possibly a bug in it?
To address these concerns, Magma employs a first-class DataRepair which is used to perform data repairs/migrations on Magma models in a controlled and reliable fashion. It is based on the repair process we used at Florida Power and Light for patching/upgrading the models in its GemStone databases.
The requirements of a data-repair are:
- Enumerate all objects needing repair.
- Of the enumerated objects, #check, #count or #identify the ones that are in need of repair.
- Of the enumerated objects, #improve or #repair the ones that are in need of repair.
- Before committing #repair, _verify_ whether the repaired objects are, in fact, repaired by using the same check as for #check, #count and #identify. If they are, commit, if not, abort.
- Connect a brand-new session, run a final #check and report whether the repair was successful or failed.
- (Optional) If successful, you might wish to persist the DataRepair object itself somewhere in your database, so it has a history of what was done to it.
- Output report of for each of the above actions.
Here is an example of Magma's DataRepair constructor:
(MagmaDataRepair
at: (MagmaRemoteLocation host: 'prod1' port: 51199)
enumerate: [ : session : repair | session root accountsDo: [ : eachAccount | repair check: eachAccount ] ]
check: [ : eachAccount : repair | eachAccount importFilters anySatisfy: [ : each | each class = MaxImportFilter ] ])
repair: [ : eachAccount : repair | eachAccount importFilters withIndexDo: [ : eachFilter : index | eachAccount importFilters at: index put: eachFilter asKeywordFilter ] ]
To this object, I can send any of the 5 actions: #check, #count, #identify, #improve or #repair. The first three are read-only operations. The 4th will commit the repairs as-it-goes, the last one only commits after a successful pre-verification.
Although you the user must specify the enumerate:, check: and repair: blocks, the DataRepair object encapsulates the _process_ of repairing models in a controlled and reliable fashion, by using those user-specified, model-specific blocks.
- Chris
_______________________________________________