[squeak-dev] The Inbox: Monticello-mva.667.mcz

Milan Vavra vavra_milan at yahoo.com
Fri Apr 7 11:38:51 UTC 2017


Hi Tobias,

Thanks for asking.

So what is this Monticello-mva.667.mcz good for?

The idea is... 
... let's make mcd files smaller.
As small as they can be.

Only containing information relevant to the diff.

No more. No less.

And they can be. A lot smaller.

An order of magnitude smaller compared with the in-trunk version.
Think 8K instead of 80K for an mcd with one-line changes.

If you have followed the instructions in
http://lists.squeakfoundation.org/pipermail/squeak-dev/2017-April/194029.html
to get a current squeak6.0 alpha, you will have seen files like
Collections-eem.743(ul.742).mcd in your package-cache directory.

If you look at their sizes, you will notice that they are much smaller than
regular mcz files.

For example:
A standard snapshot mcz, Collections-ul.742.mcz, is 485K.
A diff mcd, Collections-eem.743(ul.742).mcd, is only 84K.

How do you create such files?

Select a version in a Repository Browser, click Diff, select the
version against which the diff should be made, and you get a 'diffy version'.
If you now click 'Copy' and copy it to a different directory repository,
an mcd file will be stored there instead of an mcz file.

Alternatively, yellow-click a directory repository in the Monticello Browser
and select 'store diffs'. From then on, whenever you select a version in a
Repository Browser and click 'Copy' to copy it to that directory repository,
the version will be stored there as an mcd file.


But what if I told you that Collections-eem.743(ul.742).mcd could have
been even smaller? A lot smaller. An order of magnitude smaller.
Not 84K. Only 4.9K.
With no loss of information.
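
To put the sizes quoted in this message side by side, here is a small
Python sketch. It only does arithmetic on the three figures above. The
assumption, labeled in the code as well, is that the whole difference
between 84K and 4.9K is redundant version history.

```python
# Sizes quoted in this message, in kilobytes.
SNAPSHOT_MCZ_KB = 485.0   # Collections-ul.742.mcz
DIFF_MCD_KB = 84.0        # Collections-eem.743(ul.742).mcd, in-trunk writer
TRIMMED_MCD_KB = 4.9      # same diff, written with trimmed version info


def ratio(bigger, smaller):
    """How many times larger 'bigger' is than 'smaller'."""
    return bigger / smaller


def redundant_fraction(full, trimmed):
    """Share of 'full' that disappears after trimming.

    Assumption: the entire size difference is redundant history.
    """
    return (full - trimmed) / full


if __name__ == "__main__":
    print(f"mcz vs. mcd:         {ratio(SNAPSHOT_MCZ_KB, DIFF_MCD_KB):.1f}x")
    print(f"mcd vs. trimmed mcd: {ratio(DIFF_MCD_KB, TRIMMED_MCD_KB):.1f}x")
    print(f"share trimmed away:  {redundant_fraction(DIFF_MCD_KB, TRIMMED_MCD_KB):.0%}")
```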

I have that converted version sitting on my disk right now.

It was written out with this modification to Monticello:
http://forum.world.st/The-Inbox-Monticello-mva-667-mcz-tt4941466.html
http://source.squeak.org/inbox/Monticello-mva.667.mcz

How could it be so small?

Well, by trimming the information stored in the 'version' file in the mcd
zip archive.
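
If you want to check this on your own files, an mcd is a zip archive, so
any zip tool can show how much of it the 'version' member takes up. Below
is a minimal sketch in Python using only the standard zipfile module. The
function names member_sizes and report are made up for this example and
are not part of Monticello.

```python
import zipfile


def member_sizes(archive_path):
    """Return {member name: (uncompressed bytes, compressed bytes)}."""
    with zipfile.ZipFile(archive_path) as archive:
        return {
            info.filename: (info.file_size, info.compress_size)
            for info in archive.infolist()
        }


def report(archive_path):
    """Print one line per archive member, sorted by member name."""
    for name, (raw, packed) in sorted(member_sizes(archive_path).items()):
        print(f"{name:30} {raw:>10} bytes raw {packed:>10} bytes in archive")
```

For example, report("Collections-eem.743(ul.742).mcd"), run from the
package-cache directory, would list the 'version' member next to the
other members of the archive.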

This information grows over time as new versions are added and commit
comments are written. If not trimmed, it will gradually take up a
significant portion of the file's size, especially for small changes such
as one-liners. And there's no real need to store it all in each mcd file.


No information is lost, because the information that is trimmed is
readily available in the Monticello version against which the diff mcd was
made.

So the trick is to trim on writing, and to attach the history back from
the base version on reading.
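
The trim-and-reattach idea can be sketched roughly as follows. This is a
simplified model in Python, not the code in Monticello-mva.667: the
VersionInfo class, its fields, the stub flag, and the functions trim and
reattach are all hypothetical names chosen for illustration. It also
assumes a version name is enough to identify the base version.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical stand-in for a version's history record."""
    name: str
    message: str = ""
    ancestors: tuple = ()
    stub: bool = False  # True when the history below this point was trimmed


def trim(info, base_name):
    """On writing: cut the history off at the base version.

    The base version is kept only as a name, without message or ancestors.
    """
    if info.name == base_name:
        return VersionInfo(name=info.name, stub=True)
    trimmed = tuple(trim(a, base_name) for a in info.ancestors)
    return replace(info, ancestors=trimmed)


def reattach(info, base):
    """On reading: put the full base history back where the stub sits."""
    if info.stub and info.name == base.name:
        return base
    restored = tuple(reattach(a, base) for a in info.ancestors)
    return replace(info, ancestors=restored)


# Example with made-up commit messages.
older = VersionInfo("Collections-ul.741", "earlier work")
base = VersionInfo("Collections-ul.742", "more work", (older,))
new = VersionInfo("Collections-eem.743", "one-line fix", (base,))

written = trim(new, base.name)      # what would go into the mcd
read_back = reattach(written, base)  # what the reader sees again
```

In this model read_back compares equal to new, which is the sense in
which nothing is lost: everything that trim drops can be recovered from
the base version the reader needs anyway to apply the diff.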

Disk space and network bandwidth are saved.

The version's history appears the same as before, when all that redundant
information was saved in the mcd archive.

So you can write out much smaller mcd files with this modification.

Read them back in and the system will not know the difference.

And if you have old mcd files sitting around with the full version info
history, they are read in as before with no surprises.

Does this make sense, or is there a part that needs a better explanation?

Best Regards,

Milan Vavra
