I've done a little experiment with the VMMaker repo. I've constructed an archive format that stores all the mcz's in a repo, keeping each unique MCDefinition only once and deduplicating the MCVersionInfos.
Instead of a package cache with 2725 items taking 6.1 GB, I now have an archive of 123 MB (70.8 MB compressed). Deduplicating the MCDefinitions alone reduced the size to 1.2 GB.
Stephan
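For a sense of scale, the figures above can be turned into reduction factors. This is a small Python calculation that uses only the numbers reported in the message. It assumes decimal units (1 GB = 1000 MB); with binary units the ratios shift by about 2%.

```python
# Sizes reported in the thread, converted to megabytes.
# Assumption: decimal units (1 GB = 1000 MB).
MB_PER_GB = 1000

package_cache_mb = 6.1 * MB_PER_GB   # 2725 .mcz files in the package cache
defs_deduped_mb = 1.2 * MB_PER_GB    # after deduplicating MCDefinitions only
archive_mb = 123.0                   # full archive (definitions and version infos deduplicated)
archive_compressed_mb = 70.8         # the same archive, compressed


def ratio(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    print(f"definition dedup alone: {ratio(package_cache_mb, defs_deduped_mb):.1f}x")        # 5.1x
    print(f"full archive: {ratio(package_cache_mb, archive_mb):.1f}x")                       # 49.6x
    print(f"compressed archive: {ratio(package_cache_mb, archive_compressed_mb):.1f}x")      # 86.2x
    print(f"average .mcz size: {package_cache_mb / 2725:.2f} MB")                            # 2.24 MB
```

The step from 1.2 GB to 123 MB is a further factor of roughly 10 on top of the definition deduplication.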
On Wed, Jan 6, 2016 at 4:04 AM, Stephan Eggermont stephan@stack.nl wrote:
Wow. Even in this day of cheap storage, that is impressive.
cheers -ben
Hi Stephan,
On Tue, Jan 5, 2016 at 12:04 PM, Stephan Eggermont stephan@stack.nl wrote:
This is great news. Thanks! A few questions:
- can you describe where and in what form the MCDefinitions are now stored?
- is the conversion process from one version to another automated yet?
- is there any reason not to update Monticello to use your scheme immediately?
_,,,^..^,,,_ best, Eliot
On 06/01/16 01:39, Eliot Miranda wrote:
This is great news. Thanks! A few questions:
- can you describe where and in what form the MCDefinitions are now stored?
The definitions are stored in a set at the (SqueakSource) project level. Each definition in an MCVersion's snapshot is looked up in that set, so a definition shared by many versions is stored only once.
My deduplication of the MCVersionInfos was too radical: for now I've ignored the difficult cases of ancestry reconstruction.
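The answer above describes a project-level set of definitions in which each version's snapshot entries are looked up. A minimal Python sketch of that idea follows, assuming definitions can be compared by value. `Definition`, `ProjectArchive`, and the version names are hypothetical stand-ins, not Monticello classes, and the sketch leaves out MCVersionInfo deduplication and ancestry entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-in for a Monticello MCDefinition: a kind, a key and the
# source text. Frozen, so instances are hashable and compared by value.
@dataclass(frozen=True)
class Definition:
    kind: str    # e.g. "method" or "class"
    key: str     # e.g. "Foo>>bar"
    source: str


@dataclass
class ProjectArchive:
    """Hypothetical project-level archive: each unique definition is stored
    once in a shared pool; each version keeps only integer references."""
    pool: list[Definition] = field(default_factory=list)
    index: dict[Definition, int] = field(default_factory=dict)
    versions: dict[str, list[int]] = field(default_factory=dict)

    def intern(self, definition: Definition) -> int:
        # Look the definition up in the project-level set; add it only if new.
        ref = self.index.get(definition)
        if ref is None:
            ref = len(self.pool)
            self.pool.append(definition)
            self.index[definition] = ref
        return ref

    def add_version(self, name: str, snapshot: list[Definition]) -> None:
        self.versions[name] = [self.intern(d) for d in snapshot]

    def snapshot(self, name: str) -> list[Definition]:
        # Rebuild a version's snapshot from its references into the pool.
        return [self.pool[ref] for ref in self.versions[name]]


# Two hypothetical versions that share one unchanged method.
archive = ProjectArchive()
a = Definition("method", "Foo>>bar", "bar ^1")
b = Definition("method", "Foo>>baz", "baz ^2")
b2 = Definition("method", "Foo>>baz", "baz ^3")
archive.add_version("VMMaker-demo.1", [a, b])
archive.add_version("VMMaker-demo.2", [a, b2])
print(len(archive.pool))  # 3 unique definitions for 4 snapshot entries
```

In this sketch an unchanged method costs one small reference per version instead of a full copy of its source.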
- is the conversion process from one version to another automated yet?
No.
- is there any reason not to update Monticello to use your scheme immediately?
The first goal was an archive format for code analysis, and for that it is a nice experiment. Loading and saving a whole archive takes a while, much longer than loading or saving a single mcz.
Stephan
vm-dev@lists.squeakfoundation.org