[squeak-dev] sustainable Monticello
asqueaker at gmail.com
Tue Mar 8 21:33:19 UTC 2011
This will probably be a long post, but I would like to tell you about
the Monticello upgrades I'm about to move to the trunk.
Monticello has several repository types:
MCRepository #('creationTemplate' 'storeDiffs')
MCDictionaryRepository #('description' 'dict')
MCFileBasedRepository #('cache' 'allFileNames')
MCCacheRepository #('packageCaches' 'seenFiles')
MCFtpRepository #('host' 'directory' 'user' 'password' 'connection')
MCHttpRepository #('location' 'user' 'password' 'readerCache')
MCGOODSRepository #('hostname' 'port' 'connection')
MCSMReleaseRepository #('packageName' 'user' 'password')
but MCFileBasedRepository is the one that has been given all of the
focus; the other repository types have been ignored over the years.
MCHttpRepository, the one that interfaces with SqueakSource, and
MCDirectoryRepository are pretty much the only types being used.
I know this because the external users of the MCRepository API, like
the Repository-browser tools, MC-Configurations and Installer, are all
using APIs that are specific to MCFileBasedRepository and not
generally understood by the other repository types or the abstract API.
This is worthy of concern because of the access limitations of an
MCFileBasedRepository. Unlike an MCGOODSRepository, for example, a
file-system-based repository cannot efficiently meet the demands of
being an MCRepository without, at some points, needing to enumerate
ALL version names (files) in its file-system location.
As the number of versions in a repository reaches 1 million and
beyond, performance will grind to a halt due to the number of files
that must be constantly downloaded into RAM (another area of
unscalability and unsustainability related to file-based repositories).
A purging of old versions could be done, but a philosophy of
Monticello, from the outset, has been that repositories are intended
to contain "all" of the version history.
I have therefore reworked the MCRepository APIs and external tools to
talk using only an API that is understood by any repository that
implements the methods identified as #subclassResponsibility in
MCRepository. This minimally-required API is now:
#allPackageNames - answer a list of package names in this repository.
#basicStoreVersion: - add a Version to this repository.
#includesVersionNamed: - does a version with this name exist in this
repository?
#versionNamed: - answer the first Version object with the given name.
#versionNamesForPackageNamed: - answer the version names for the
given package name.
#versionWithInfo:ifAbsent: - answer the Version object with the
given unique VersionInfo
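
As a rough illustration of the minimal API listed above, here is a
small Python model. It is not Monticello code; Python is used only so
that the sketch runs standalone. The names `Repository`,
`DictRepository`, `Version` and `newest_name`, the snake_case method
names, and the assumption that version names end in `.number` are all
hypothetical stand-ins for the selectors in the list.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Hypothetical stand-in for a Version; 'info' plays the role of the
# unique VersionInfo mentioned in the list above.
Version = namedtuple("Version", ["name", "package", "info"])


class Repository(ABC):
    """Model of the minimal API: one abstract method per selector."""

    @abstractmethod
    def all_package_names(self): ...                     # allPackageNames

    @abstractmethod
    def basic_store_version(self, version): ...          # basicStoreVersion:

    @abstractmethod
    def includes_version_named(self, name): ...          # includesVersionNamed:

    @abstractmethod
    def version_named(self, name): ...                   # versionNamed:

    @abstractmethod
    def version_names_for_package_named(self, package): ...  # versionNamesForPackageNamed:

    @abstractmethod
    def version_with_info(self, info, if_absent): ...    # versionWithInfo:ifAbsent:


class DictRepository(Repository):
    """In-memory repository used only to exercise the model."""

    def __init__(self):
        self._versions = []  # kept in insertion order

    def all_package_names(self):
        return sorted({v.package for v in self._versions})

    def basic_store_version(self, version):
        self._versions.append(version)

    def includes_version_named(self, name):
        return any(v.name == name for v in self._versions)

    def version_named(self, name):
        # Answer the first version with the given name, or None.
        return next((v for v in self._versions if v.name == name), None)

    def version_names_for_package_named(self, package):
        return [v.name for v in self._versions if v.package == package]

    def version_with_info(self, info, if_absent):
        for v in self._versions:
            if v.info == info:
                return v
        return if_absent()


def newest_name(repo, package):
    """A 'tool' that needs only names, never whole Version objects."""
    names = repo.version_names_for_package_named(package)
    # Assumes every name ends in '.<number>'.
    return max(names, key=lambda n: int(n.rsplit(".", 1)[1]), default=None)


repo = DictRepository()
repo.basic_store_version(Version("Foo-abc.1", "Foo", "id-1"))
repo.basic_store_version(Version("Foo-abc.2", "Foo", "id-2"))
repo.basic_store_version(Version("Bar-xyz.7", "Bar", "id-3"))
```

Because `newest_name` talks only to the abstract methods, it would
work unchanged against any other subclass of the model `Repository`,
which is the property the reworked tools are meant to have.
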
In deference to the limitations of file-based repositories, we only
ask for the _names_ of things rather than the whole object, because
the names are all that is needed to satisfy tool requirements, except
in cases where we need a single Version object (like loading).
File-based repositories cannot access the Version objects quickly,
just the (file)names (incl. author & version-number).
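
To make the point about names concrete: a file-based repository can
answer name-level queries from a directory listing alone, but it has
to visit every file name to do so. The sketch below is a Python
model, not Monticello code. It assumes version files are named
`Package-author.number.mcz`; the function names and the sample file
names are made up for illustration.

```python
import re
from collections import namedtuple

VersionName = namedtuple("VersionName", ["package", "author", "number"])

# Assumed convention: <package>-<author>.<number>.mcz. The package part
# may itself contain hyphens, so the greedy first group leaves the
# author as whatever follows the LAST hyphen.
_PATTERN = re.compile(
    r"^(?P<package>.+)-(?P<author>[^-.]+)\.(?P<number>\d+)\.mcz$"
)


def parse_file_name(file_name):
    """Split one file name into package, author and number, or None."""
    match = _PATTERN.match(file_name)
    if match is None:
        return None
    return VersionName(match["package"], match["author"], int(match["number"]))


def all_package_names(file_names):
    """Every file name is visited to collect the distinct packages."""
    parsed = (parse_file_name(n) for n in file_names)
    return sorted({p.package for p in parsed if p is not None})


def version_names_for_package(file_names, package):
    """Also a full scan: keep the names whose parsed package matches."""
    result = []
    for name in file_names:
        parsed = parse_file_name(name)
        if parsed is not None and parsed.package == package:
            result.append(name[: -len(".mcz")])
    return result


# Made-up directory listing.
listing = [
    "Monticello-cmm.432.mcz",
    "Kernel-Tests-ul.12.mcz",
    "Monticello-bf.431.mcz",
    "README.txt",
]
```

Both queries are linear in the number of files in the location, which
is why the cost grows with the full version history rather than with
the size of the answer.
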
During the process of this refactoring, I was able to significantly
improve the coherence of the code. It was really, really bad in some
places.
I've also verified the viability of this API by updating
MCMagmaRepository and demonstrating Magma as a totally sustainable
and scalable MC repository. Employing a Magma-based repository also
affords some additional benefits, which I will describe in a separate
follow-up mail.
I think SqueakSource will eventually have to change to something more
scalable. At least now we have a viable alternative, and with much
cleaner MC code in the process.
Please load my latest versions of Monticello,
MonticelloConfigurations, Installer and Tests from the Inbox and let
me know if you experience any issues. You should not see any
difference in day-to-day operations.