A little ditty to move toward sustainable ancestry. After selecting "flush cached versions" from the menu, the ancestry-tree will now be like this:
aMCVersionInfo.27
  'ancestry' = anArray
    1 = aMCVersionInfo.26
      'ancestry' = anArray
        1 = aMCVersionInfo.25
          'ancestry' = anArray
            1 = aMCInfoProxy(trimmed 'info', 'repository' to re-retrieve it)
Truncating the ancestry hierarchies this way recovers about 2.5MB of image size.
Special notes:
- It keeps the 10 most recent versions and snips off the ancestry starting 10 versions ago, replacing it with an MCInfoProxy. Almost any operation that uses ancestry will cause the original full MCVersionInfo tree to be re-retrieved.
- This assumes the Info of 10 versions ago exists in the same repository as the current version. In practice, it normally will.
- When a new version is saved after recovering the Info tree from ANOTHER FILE (e.g., the one 10 versions ago), the result is an ancestry tree built from multiple files. But it's the same tree, so this should be no problem.
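For readers who want the truncation in code form, here is a minimal sketch in Python rather than Smalltalk. It is not the Monticello implementation: `VersionInfo`, `InfoProxy`, `stub_ancestry` and `make_chain` are hypothetical names, and the repository is modelled as a plain dictionary from version name to full info.

```python
from dataclasses import dataclass, field


@dataclass
class VersionInfo:
    name: str
    ancestors: list = field(default_factory=list)


@dataclass
class InfoProxy:
    """Placeholder for a trimmed VersionInfo; it only knows where to re-fetch it."""
    name: str
    repository: dict = field(repr=False)  # hypothetical: version name -> full VersionInfo

    def resolve(self):
        # Raises KeyError if the repository does not hold this version.
        return self.repository[self.name]


def stub_ancestry(info, repository, keep=10):
    """Copy `info`, keeping `keep` generations and replacing older ones with proxies."""
    def trim(node, depth):
        if depth == keep:
            return InfoProxy(node.name, repository)
        return VersionInfo(node.name, [trim(a, depth + 1) for a in node.ancestors])
    return trim(info, 0)


def make_chain(package, count):
    """Build a linear history package.1 .. package.count; return (head, repository)."""
    repository, head = {}, None
    for number in range(1, count + 1):
        head = VersionInfo(f"{package}.{number}", [head] if head is not None else [])
        repository[head.name] = head
    return head, repository


head, repo = make_chain("Pkg", 27)
trimmed = stub_ancestry(head, repo, keep=3)
proxy = trimmed.ancestors[0].ancestors[0].ancestors[0]
print(type(proxy).__name__, proxy.name)   # the fourth level is a proxy for Pkg.24
print(len(proxy.resolve().ancestors))     # resolving hands back the full subtree
```

With `keep=3` the result has the same shape as the inspector output above: three real infos, then a proxy. The `KeyError` in `resolve` corresponds to the second note: the sketch simply fails when the older info is not in the repository it was told about.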
I had a different idea to solve this issue:
Unroll the ancestry tree to a list. Create a modified MCScanner, which can read a list of versions. A list item would look like the current tree nodes, but the ancestry and stepChildren lists would just be references to the actual ancestors/stepChildren in the list. This would enable partial parsing of the ancestry info. A reference could contain the position of the referenced list item in the version list, so we wouldn't have to parse the intermediate elements.
For backwards compatibility this new version list would be stored in a separate file in the .mcz files. This way old versions of MC could still load the package, but newer versions with the new scanner could read them much faster.
Levente
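A rough sketch of the unrolled-list idea, under stated assumptions: it is written in Python, uses one JSON record per line instead of the real MCScanner format, and stores byte offsets as the "positions". The function names are hypothetical and nothing here reflects an existing Monticello file layout.

```python
import io
import json


def write_version_list(versions):
    """versions: list of (name, [ancestor names]), oldest first.

    Returns (data, head_offset). Each record stores the byte offsets of its
    ancestors, so a reader can jump straight to them.
    """
    buffer = io.BytesIO()
    offsets = {}
    for name, ancestor_names in versions:
        offsets[name] = buffer.tell()
        record = {"name": name, "ancestors": [offsets[a] for a in ancestor_names]}
        buffer.write(json.dumps(record).encode("utf-8") + b"\n")
    return buffer.getvalue(), offsets[versions[-1][0]]


def read_record(data, offset):
    """Parse exactly one record, starting at `offset`."""
    stream = io.BytesIO(data)
    stream.seek(offset)
    return json.loads(stream.readline())


def recent_names(data, head_offset, limit):
    """Follow first-ancestor references from the head, parsing at most `limit` records."""
    names, offset = [], head_offset
    while offset is not None and len(names) < limit:
        record = read_record(data, offset)
        names.append(record["name"])
        offset = record["ancestors"][0] if record["ancestors"] else None
    return names


history = [
    ("Pkg.1", []),
    ("Pkg.2", ["Pkg.1"]),
    ("Pkg.3", ["Pkg.2"]),
    ("Pkg.4", ["Pkg.3", "Pkg.2"]),  # a merge with two ancestors
]
data, head_offset = write_version_list(history)
print(recent_names(data, head_offset, 2))
```

Writing the oldest version first means every ancestor offset is already known when a record is written. Reading the two most recent names touches two records and leaves the rest of the list unparsed, which is the partial parsing the proposal is after.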
When you say it would "enable partial parsing of the ancestry info" I didn't quite understand how you achieve that. That's what the new Scanner does?
If you want to put it to Inbox I'm sure it'll make more sense and we can evaluate both approaches.
I like what you seem to be saying about trying to trim it on _load_ so we can always just have a "right-sized" image. It might be a pipe-dream for either implementation though, which is why, for now, I have it as part of the flush-all-caches operation. So the sizes are "large and fast" or "as small as possible," which fit two use-cases, development and deployment, respectively.
The only way to have something in-between those two is to flush-caches in your dev image and keep developing. After a big release of all of them, only the projects that are worked on enough to invoke their history will be put back in the image. So that's one way to "right-size" between the big and small.
Let me tell you the last step I'd planned for my Proxy implementation. The only ancestry access in MCAncestry is pretty much allAncestorsDo: type stuff that ends up traversing the whole tree. We could instead have a Preference of some kind (pragma-based, of course), which defines the size of what should be considered "recentHistory". Like, something between 10 and 100.
MCWorkingCopy>>#stubAncestry would be updated to stub everything older than the preference setting.
Finally, all the operations which today are using allAncestorsDo: would change to use "recentAncestorsDo:" so that the Proxy would never be hit. The preference could be adjusted to balance between development and deployment interests.
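The bounded traversal could look roughly like the following. This is a Python sketch with hypothetical names (`RECENT_HISTORY`, `recent_ancestors`, `make_chain`), not the proposed Smalltalk `recentAncestorsDo:` itself; the preference value of 10 is simply the low end of the range mentioned above.

```python
from collections import deque
from dataclasses import dataclass, field

RECENT_HISTORY = 10  # hypothetical preference value, somewhere between 10 and 100


@dataclass
class VersionInfo:
    name: str
    ancestors: list = field(default_factory=list)


def recent_ancestors(info, limit=RECENT_HISTORY):
    """Yield ancestors breadth-first, going at most `limit` generations back."""
    seen = set()
    queue = deque((ancestor, 1) for ancestor in info.ancestors)
    while queue:
        node, depth = queue.popleft()
        if node.name in seen:
            continue  # merges can reach the same ancestor twice
        seen.add(node.name)
        yield node
        if depth < limit:
            # Older generations are never enqueued, so a stub placed
            # beyond the limit would never be touched.
            queue.extend((ancestor, depth + 1) for ancestor in node.ancestors)


def make_chain(package, count):
    """Build a linear history and return its newest VersionInfo."""
    head = None
    for number in range(1, count + 1):
        head = VersionInfo(f"{package}.{number}", [head] if head is not None else [])
    return head


head = make_chain("Pkg", 30)
print([info.name for info in recent_ancestors(head, limit=3)])
```

As long as the stubbing depth is at least the traversal limit, callers of the bounded traversal only ever see real infos.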
Would this more complex "sizing" capability stop me from just doing a flush-all-caches before deployment, or make me care during development? Probably not. So that's why I wonder whether attempting this is useful...
I'm sure you're already thinking about this, but we need to be careful to maintain compatibility with old MCZs. In particular, we need to be able to consume MCZs with full histories, and have images without this enhancement still work with the new style MCZs. It might well be worth asking some Pharo folks what they think, and maybe coordinate a bit across dialects.
frank
On 8/15/13, Frank Shearar frank.shearar@gmail.com wrote:
I'm sure you're already thinking about this, but we need to be careful to maintain compatibility with old MCZs. In particular, we need to be able to consume MCZs with full histories, and have images without this enhancement still work with the new style MCZs.
+1
It might well be worth asking some Pharo folks what they think, and maybe coordinate a bit across dialects.
frank
On Thu, Aug 15, 2013 at 2:46 AM, Frank Shearar frank.shearar@gmail.com wrote:
I'm sure you're already thinking about this, but we need to be careful to maintain compatibility with old MCZs. In particular, we need to be able to consume MCZs with full histories, and have images without this enhancement still work with the new style MCZs. It might well be worth asking some Pharo folks what they think, and maybe coordinate a bit across dialects.
The persistent state of the mcz files is unchanged. Only the in-memory state is slimmer.
It's not quite bullet-proof yet: log of DNU attached. It fails to find a certain info, which causes all kinds of problems.
Also, something apparently tries to materialize infos in the background, possibly updating MC browsers; I'm not sure. This leads to very strange and hard-to-get-rid-of notifiers:
(these updating bars used to be very rare, like once per session, recently they pop up multiple times for many operations, but I've never before had 2 on the screen at the same time)
- Bert -
Ok, I'll look at it today. One thing is that all ancestry SHOULD be in the same repository -- but I agree, the system needs to handle that as gracefully as possible if it isn't.
Can you tell me how to reproduce the issue?
Thanks.
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The more I think about it, the less convinced I am that this space optimization is worth introducing such a fragile machinery. MC is designed to have all ancestry info available at all times - just opening any repository will cause the proxies to materialize again, because the highlighting looks at which version names are in the ancestry of the working copy.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
- Bert -
On Thu, Aug 15, 2013 at 11:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The original MC documentation says ALL versions are intended to be contained by repositories. I have no idea what "design" you're talking about.
The more I think about it, the less convinced I am that this space optimization is worth introducing such a fragile
You uncovered one issue and you're calling it "fragile?" Proxy is a well-documented, proven pattern that has stood the test of time.
machinery. MC is designed to have all ancestry info available at all times - just opening any repository will cause the proxies to materialize again, because the highlighting looks at which version names are in the ancestry of the working copy.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
It's not just about smaller images. It's about sustainability of the ancestry.
- Bert -
On Thu, Aug 15, 2013 at 9:48 AM, Chris Muller ma.chris.m@gmail.com wrote:
On Thu, Aug 15, 2013 at 11:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The original MC documentation says ALL versions are intended to be contained by repositories. I have no idea what "design" you're talking about.
So, all versions are in a repository somewhere: maybe in SqueakSource (now read only?), SmalltalkHub, GitHub, SS3, or even in my local file directory repository. Not all of those are shared, or constantly attached to the image you are working with. The machinery needs to work with missing versions, which Monticello does: it just goes farther back until it can find a common ancestor.
I'm probably missing something, but couldn't the proxy resolve itself with the data contained in the .mcz should it need to? The data is there in the first place, and should be easier to get at than web calls to repositories (or directory scans, for that matter).
-Chris
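To make "goes farther back until it can find a common ancestor" concrete, here is one way such a fallback could be sketched. It is an illustration in Python under assumptions, not Monticello's merge code: ancestry is a dictionary from version name to parent names, `available` is the set of versions some attached repository can actually supply, and both function names are hypothetical.

```python
from collections import deque


def all_ancestor_names(ancestry, start):
    """Collect the names of every ancestor reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        name = queue.popleft()
        for parent in ancestry.get(name, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def nearest_available_common_ancestor(ancestry, a, b, available):
    """Walk back from `a`; return the first version that is also in the
    history of `b` and can be supplied by a repository, or None."""
    candidates = all_ancestor_names(ancestry, b) | {b}
    seen, queue = {a}, deque([a])
    while queue:
        name = queue.popleft()
        if name in candidates and name in available:
            return name
        for parent in ancestry.get(name, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


ancestry = {
    "Pkg-x.5": ["Pkg.4"],
    "Pkg-y.5": ["Pkg.4"],
    "Pkg.4": ["Pkg.3"],
    "Pkg.3": ["Pkg.2"],
    "Pkg.2": [],
}
# Pkg.4 is the closest common ancestor, but no attached repository has it.
print(nearest_available_common_ancestor(ancestry, "Pkg-x.5", "Pkg-y.5", {"Pkg.3", "Pkg.2"}))
```

Note that the walk itself needs only names and parent links; the version contents are fetched once, for the ancestor that is finally chosen.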
On 15.08.2013, at 18:48, Chris Muller ma.chris.m@gmail.com wrote:
On Thu, Aug 15, 2013 at 11:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The original MC documentation says ALL versions are intended to be contained by repositories. I have no idea what "design" you're talking about.
The idea is that versions are self-contained. When I merge two versions, I only need to share and upload the result. That means that you will not find that other version in your repo. But the merged version has all ancestry data in it (you know that).
The more I think about it, the less convinced I am that this space optimization is worth introducing such a fragile
You uncovered one issue and you're calling it "fragile?" Proxy is a well-documented, proven pattern that has stood the test of time.
I'm not aware of any current use of proxies in Squeak trunk.
machinery. MC is designed to have all ancestry info available at all times - just opening any repository will cause the proxies to materialize again, because the highlighting looks at which version names are in the ancestry of the working copy.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
It's not just about smaller images. It's about sustainability of the ancestry.
But as soon as you use MC it needs the ancestry anyway.
- Bert -
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The original MC documentation says ALL versions are intended to be contained by repositories. I have no idea what "design" you're talking about.
The idea is that versions are self-contained. When I merge two versions, I only need to share and upload the result. That means that you will not find that other version in your repo. But the merged version has all ancestry data in it (you know that).
Did you notice that I uploaded ALL interim versions of Monticello-cmm.[552-557]? Why would I do that when technically I only needed to upload 557?
Because MC functions depend on the ancestry model matching what's in the repositories. Keeping all versions supports incremental development and rollback. Besides that we should just maintain an MC model that is "whole" and operational rather than broken. Are you concerned about disk space?
I'm not aware of any current use of proxies in Squeak trunk.
Great then it's high time that Squeak has a working example of this well-known design pattern in the image.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
It's not just about smaller images. It's about sustainability of the ancestry.
But as soon as you use MC it needs the ancestry anyway.
Not all of it. We're up to version 600+ of Morphic, when was the last time version 1 of Morphic was needed? But we continue to carry that around, in and out of the system, forever. It's a gradual decline, unsustainable.
Levente and I are interested in addressing this.
(From the other note)
Well, I didn't complain right away ;)
< 24 hours dude. ;)
It sounded like a neat idea at first, but then I remembered that one of the things I really like about Monticello is its clarity and simplicity. Everything is very concrete, whereas proxies are very meta by nature.
We haven't lost clarity or simplicity. That's the nice thing about this solution, it changes _nothing_ about the MC model. It's very transient, all-in-memory. There's no disaster scenario.
I've used much more complicated MagmaProxies, millions of them, every day for years. They work. The issue you experienced was predicted in my "Special Notes". Please don't surrender yet.
On 2013-08-15, at 21:24, Chris Muller asqueaker@gmail.com wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The original MC documentation says ALL versions are intended to be contained by repositories. I have no idea what "design" you're talking about.
The idea is that versions are self-contained. When I merge two versions, I only need to share and upload the result. That means that you will not find that other version in your repo. But the merged version has all ancestry data in it (you know that).
Did you notice that I uploaded ALL interim versions of Monticello-cmm.[552-557]? Why would I do that when technically I only needed to upload 557?
We try to have a continuous "trunk" of versions in the trunk repository. We named it that way, even. But we do not store copies of all branches, because MC doesn't need them, and we don't need them. So versions that got merged into trunk do not need to be in trunk themselves, and for sure not their ancestors.
Because MC functions depend on the ancestry model matching what's in the repositories. Keeping all versions supports incremental development and rollback. Besides that we should just maintain an MC model that is "whole" and operational rather than broken. Are you concerned about disk space?
No, I am concerned about putting even more restrictions onto Monticello. We have gradually moved from a system with very few assumptions, over a period of non-enforced conventions, to a rigidly enforced one. Version names are an example of that. And now you're adding a requirement to have an internet connection all the time because MC can unpredictably request an ancient version. I do not see that as a good idea.
I'm not aware of any current use of proxies in Squeak trunk.
Great then it's high time that Squeak has a working example of this well-known design pattern in the image.
Perhaps you can find a better example. And if not, then maybe it's not as essential as you think. Just because we *can* do something does not mean we *have* to.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
It's not just about smaller images. It's about sustainability of the ancestry.
But as soon as you use MC it needs the ancestry anyway.
Not all of it. We're up to version 600+ of Morphic, when was the last time version 1 of Morphic was needed? But we continue to carry that around, in and out of the system, forever.
It does not need to load these old versions, but it often needs their version names, and sometimes the UUID, and having the commit message is useful too at times.
You're not doing anything about that need. You're just hiding it out of sight. That's not a solution.
It's a gradual decline, unsustainable. Levente and I are interested in addressing this.
A noble goal, and I agree we need to work on it, but you're not addressing it.
(From the other note)
Well, I didn't complain right away ;)
< 24 hours dude. ;)
I'm quick ;) I did test it, and thought about it.
It sounded like a neat idea at first, but then I remembered that one of the things I really like about Monticello is its clarity and simplicity. Everything is very concrete, whereas proxies are very meta by nature.
We haven't lost clarity or simplicity. That's the nice thing about this solution, it changes _nothing_ about the MC model. It's very transient, all-in-memory. There's no disaster scenario.
Wrong. Now just about anything you do can cause a file read or network access because MC is trying to materialize a proxy that shouldn't have been stubbed out in the first place. Before, each working copy could access its full ancestry data. That is a very serious change of behavior, in my book.
I've used much more complicated MagmaProxies, millions of them, every day for years. They work. The issue you experienced was predicted in my "Special Notes". Please don't surrender yet.
Because in Magma there is a real need for proxies. In MC, there isn't.
- Bert -
Did you notice that I uploaded ALL interim versions of Monticello-cmm.[552-557]? Why would I do that when technically I only needed to upload 557?
We try to have a continuous "trunk" of versions in the trunk repository. We named it that way, even. But we do not store copies of all branches, because MC doesn't need them, and we don't need them. So versions that got merged into trunk do not need to be in trunk themselves, and for sure not their ancestors.
Bottom line -- if you want to find the diffs between two old versions in the ancestry, you'll need them both. For you to assert "for sure not their ancestors" is wrong -- you CAN'T be sure. No one knows what might be needed in the future.
Because MC functions depend on the ancestry model matching what's in the repositories. Keeping all versions supports incremental development and rollback. Besides that we should just maintain an MC model that is "whole" and operational rather than broken. Are you concerned about disk space?
No, I am concerned about putting even more restrictions onto Monticello. We have gradually moved from a system with very few assumptions, over a period of non-enforced conventions, to a rigidly enforced one. Version names are an example of that. And now you're adding a requirement to have an internet connection all the time because MC can unpredictably request an ancient version. I do not see that as a good idea.
You know what was restrictive about the version names before? It was that they were dumb Strings being treated as a multi-field object, from 10 different places in the code, all similar but slightly different, and none commented. It caused paralysis because changes could not be made safely. It's why it took weeks for me to dissect and do the surgery necessary to reify that crap.
Did you know, Bert, that before I did that work, we were "restricted" to using only FileBasedRepositories? Now we have a unified API across all repository types.
Or, we DID, until recently when you and Eliot slapped that branch-name in it. At least it's no longer hidden like it was before MCVersionName, but MC has no notion of branches anywhere in its domain. Guess what? Projects using your feature are now stuck back on only FileBasedRepositories once again.
But as soon as you use MC it needs the ancestry anyway.
Not all of it. We're up to version 600+ of Morphic, when was the last time version 1 of Morphic was needed? But we continue to carry that around, in and out of the system, forever.
It does not need to load these old versions, but it often needs their version names, and sometimes the UUID, and having the commit message is useful too at times.
Dodge. Please explain the use-case where Morphic.1 would need to be consumed by a human or the system.
You're not doing anything about that need. You're just hiding it out of sight. That's not a solution.
What need? Hiding what? Huh?
It's a gradual decline, unsustainable. Levente and I are interested in addressing this.
A noble goal, and I agree we need to work on it, but you're not addressing it.
You obviously didn't read my note to Levente in this thread which explained the next-step I want to take with this.
We haven't lost clarity or simplicity. That's the nice thing about this solution, it changes _nothing_ about the MC model. It's very transient, all-in-memory. There's no disaster scenario.
Wrong. Now just about anything you do can cause a file read or network access because MC is trying to materialize a proxy that shouldn't have been stubbed out in the first place. Before, each working copy could access its full ancestry data. That is a very serious change of behavior, in my book.
Look, I'm glad you at least agree it's a noble _goal_. So please give us a solution, won't you? Please share your wildest imagination about how it would be possible to achieve this goal without needing to be connected to a repository?
Levente has an alternate solution that does not employ proxies. I personally like the Proxy solution because it's just a simple "one off" solution that makes no changes to the MC model. But realizing the goal is more important to me than using Proxies. Perhaps, Bert, you would approve of Levente's solution or propose one yourself.
Until then, I'll make the purging of ancestry a separate menu item, so you don't have to select it and you can stay happy.
On 2013-08-16, at 17:00, Chris Muller asqueaker@gmail.com wrote:
Did you notice that I uploaded ALL interim versions of Monticello-cmm.[552-557]? Why would I do that when technically I only needed to upload 557?
We try to have a continuous "trunk" of versions in the trunk repository. We named it that way, even. But we do not store copies of all branches, because MC doesn't need them, and we don't need them. So versions that got merged into trunk do not need to be in trunk themselves, and for sure not their ancestors.
Bottom line -- if you want to find the diffs between two old versions in the ancestry, you'll need them both. For you to assert "for sure not their ancestors" is wrong -- you CAN'T be sure. No one knows what might be needed in the future.
We only store trunk versions in trunk, not the non-trunk ancestors of merged versions. Seems reasonable to me.
Because MC functions depend on the ancestry model matching what's in the repositories. Keeping all versions supports incremental development and rollback. Besides that we should just maintain an MC model that is "whole" and operational rather than broken. Are you concerned about disk space?
No, I am concerned about putting even more restrictions onto Monticello. We have gradually moved from a system with very few assumptions, over a period of non-enforced conventions, to a rigidly enforced one. Version names are an example of that. And now you're adding a requirement to have an internet connection all the time because MC can unpredictably request an ancient version. I do not see that as a good idea.
You know what was restrictive about the version names before? It was that they were dumb Strings being treated as a multi-field object, from 10 different places in the code, all similar but slightly different, and none commented. It caused paralysis because changes could not be made safely. It's why it took weeks for me to dissect and do the surgery necessary to reify that crap.
Actually the MC code base is very careful to not assign any meaning to a version name. Only the UI would try to parse it to present multiple versions in a useful way to the user. A version name *is* just a string, nothing more. Everything meaningful in MC had its own class, but version names were just that, dumb labels, intentionally. Now that you have "reified that crap" people tend to misuse it for all sorts of things.
Did you know, Bert, that before I did that work, we were "restricted" to using only FileBasedRepositories? Now we have a unified API across all repository types.
I did not know that. But I also don't think MCVersionName would have been necessary to achieve that goal, because, again, it's supposed to be strictly a UI thing.
Or, we DID, until recently when you and Eliot slapped that branch-name in it. At least it's no longer hidden like it was before MCVersionName, but MC has no notion of branches anywhere in its domain. Guess what? Projects using your feature are now stuck back on only FileBasedRepositories once again.
Branches have been supported by file naming conventions since the inception of Monticello. Ask Colin.
But as soon as you use MC it needs the ancestry anyway.
Not all of it. We're up to version 600+ of Morphic, when was the last time version 1 of Morphic was needed? But we continue to carry that around, in and out of the system, forever.
It does not need to load these old versions, but it often needs their version names, and sometimes the UUID, and having the commit message is useful too at times.
Dodge. Please explain the use-case where Morphic.1 would need to be consumed by a human or the system.
Select Morphic in the MC browser. Open the trunk repo. Done.
Actually, I couldn't try it because even in a fully updated image I get a proxy error doing just that. To make sure it's not just my image I did the same using a trunk image from the build server. Same error.
(using MC-cmm.560 in both cases)
You're not doing anything about that need. You're just hiding it out of sight. That's not a solution.
What need? Hiding what? Huh?
I thought the actual issue was that accessing the trunk repo feels slow. Okay, you're not hiding that, I take it back. (I had a mental image of hiding problems behind a proxy, but it's not that easy to verbalize).
It's a gradual decline, unsustainable. Levente and I are interested in addressing this.
A noble goal, and I agree we need to work on it, but you're not addressing it.
You obviously didn't read my note to Levente in this thread which explained the next-step I want to take with this.
I only saw you proposing to reduce the need for materializing your proxies by ignoring older meta data. Which has nothing to do with the actual issues, cf above.
We haven't lost clarity or simplicity. That's the nice thing about this solution, it changes _nothing_ about the MC model. It's very transient, all-in-memory. There's no disaster scenario.
Wrong. Now just about anything you do can cause a file read or network access because MC is trying to materialize a proxy that shouldn't have been stubbed out in the first place. Before, each working copy could access its full ancestry data. That is a very serious change of behavior, in my book.
Look, I'm glad you at least agree it's a noble _goal_. So please give us a solution, won't you? Please share your wildest imagination about how it would be possible to achieve this goal without needing to be connected to a repository?
I don't have a solution for that, but then I also don't see the ancestry data in the image as a big problem. We could talk about inefficiencies with the squeaksource server, but that would be a different topic.
Levente has an alternate solution that does not employ proxies. I personally like the Proxy solution because it's just a simple "one off" solution that makes no changes to the MC model. But realizing the goal is more important to me than using Proxies. Perhaps, Bert, you would approve of Levente's solution or propose one yourself.
Levente's idea was very different. He did not suggest purging anything from memory that would then have to be loaded separately on demand. He is looking for a more efficient way to store the ancestry data.
Until then, I'll make the purging of ancestry a separate menu item, so you don't have to select it and you can stay happy.
That's a good idea, to avoid running into the problem by accident. Or perhaps a preference, then you wouldn't even need a menu entry. Also useful would be a menu item (or do-it) that would restore the full meta data without going through the proxy machinery (which also could get triggered when you turn off the preference).
- Bert -
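For readers following along, Levente's alternative from earlier in the thread (unroll the ancestry tree into a list whose entries refer to their ancestors by position) can be sketched roughly as below. This is only an illustration in Python, not Monticello code and not Levente's actual design: the dict layout and the names `flatten` and `names_within` are assumptions made for the example.

```python
from collections import deque


def flatten(tip):
    """Unroll an ancestry graph into a list; ancestors become list positions."""
    index = {tip["name"]: 0}          # version name -> position in the list
    queue = deque([tip])
    records = []
    while queue:
        info = queue.popleft()
        refs = []
        for ancestor in info["ancestors"]:
            if ancestor["name"] not in index:
                index[ancestor["name"]] = len(index)
                queue.append(ancestor)
            refs.append(index[ancestor["name"]])
        records.append({"name": info["name"], "ancestors": refs})
    return records


def names_within(records, depth):
    """Names reachable from the tip in at most `depth` ancestry steps."""
    seen, frontier = {0}, [0]
    for _ in range(depth):
        frontier = [a for i in frontier
                    for a in records[i]["ancestors"] if a not in seen]
        seen.update(frontier)
    return [records[i]["name"] for i in sorted(seen)]


# Hypothetical package history with one merge (Pkg.3 merges a branch).
base = {"name": "Pkg.1", "ancestors": []}
trunk = {"name": "Pkg.2", "ancestors": [base]}
branch = {"name": "Pkg-branch.1", "ancestors": [base]}
tip = {"name": "Pkg.3", "ancestors": [trunk, branch]}

records = flatten(tip)
recent = names_within(records, 1)
```

Because each entry points at its ancestors by position, a reader only has to touch the entries it follows. Nothing is dropped from the data, which is the difference from the proxy approach.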
Bottom line -- if you want to find the diffs between two old versions in the ancestry, you'll need them both. For you to assert "for sure not their ancestors" is wrong -- you CAN'T be sure. No one knows what might be needed in the future.
We only store trunk versions in trunk, not the non-trunk ancestors of merged versions. Seems reasonable to me.
I always considered that once something is merged into the ancestry tree, it's part of trunk. It sounds like what you're saying is that diffing with previous versions existing within the trunk repository is good enough. However, that still means that diffing from the Ancestry list could result in a debugger.
You know what was restrictive about the version names before? It was that they were dumb Strings being treated as a multi-field object, from 10 different places in the code, all similar but slightly different, and none commented. It caused paralysis because changes could not be made safely. It's why it took weeks for me to dissect and do the surgery necessary to reify that crap.
Actually the MC code base is very careful to not assign any meaning to a version name. Only the UI would try to parse it to present multiple versions in a useful way to the user. A version name *is* just a string, nothing more. Everything meaningful in MC had its own class, but version names were just that, dumb labels, intentionally. Now that you have "reified that crap" people tend to misuse it for all sorts of things.
To say, the UI would "try to parse" in a context of being "useful", in itself gives away that there are desired behaviors here based on structure in a version-name. Other parsing had found its way into our systems, scattered about, to accommodate our SCM processes long before I reified it. I don't know how willy-nilly-naming would ever be helpful for anything, but this is OT.
But as soon as you use MC it needs the ancestry anyway.
Not all of it. We're up to version 600+ of Morphic, when was the last time version 1 of Morphic was needed? But we continue to carry that around, in and out of the system, forever.
It does not need to load these old versions, but it often needs their version names, and sometimes the UUID, and having the commit message is useful too at times.
Dodge. Please explain the use-case where Morphic.1 would need to be consumed by a human or the system.
Select Morphic in the MC browser. Open the trunk repo. Done.
I said *need* to be consumed, not consumed. Right now, the system consumes it needlessly. What I have so far doesn't change that except on a per-package basis for now.
Actually, I couldn't try it because even in a fully updated image I get a proxy error doing just that. To make sure it's not just my image I did the same using a trunk image from the build server. Same error.
(using MC-cmm.560 in both cases)
I guess it's because you had merged versions in your image. That's fixed in MC.561.
You obviously didn't read my note to Levente in this thread which explained the next-step I want to take with this.
I only saw you proposing to reduce the need for materializing your proxies by ignoring older meta data. Which has nothing to do with the actual issues, cf above.
Ignoring Morphic.1 seems like a safe thing to do. By setting the history-size preference to 999999 one could easily regain access to Morphic.1.
And activating it via purge (vs. load) is an approach that matches up against the use-cases very well.
Look, I'm glad you at least agree it's a noble _goal_. So please give us a solution, won't you? Please share your wildest imagination about how it would be possible to achieve this goal without needing to be connected to a repository?
I don't have a solution for that, but then I also don't see the ancestry data in the image as a big problem. We could talk about inefficiencies with the squeaksource server, but that would be a different topic.
Not a big problem, but a problem. And a "noble" goal. :)
That's a good idea, to avoid running into the problem by accident. Or perhaps a preference, then you wouldn't even need a menu entry. Also useful would be a menu item (or do-it) that would restore the full meta data without going through the proxy machinery (which also could get triggered when you turn off the preference).
I chose a separate menu option in the interests of simplicity.
You know, you guys are really tough! For many years y'all have been complaining about the size of the image. When I complained to Frank about committing new versions of stuff with one character deleted from a comment, he said, "if our systems can't withstand lots of updates then we need to fix our systems."
That's what I've spent my time and energy working toward here. I thought you would be pleased, but you don't want to even give it a chance..
On Thu, Aug 15, 2013 at 11:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The more I think about it, the less convinced I am that this space optimization is worth introducing such fragile machinery. MC is designed to have all ancestry info available at all times - just opening any repository will cause the proxies to materialize again, because the highlighting looks at which version names are in the ancestry of the working copy.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
- Bert -
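To make the mechanism under discussion concrete, here is a minimal Python sketch of ancestry truncation with a lazy proxy. It is not the Monticello implementation: `VersionInfo`, `InfoProxy`, `Repository`, `truncate` and `ancestry_names` are hypothetical stand-ins, and the repository is just an in-memory dict. It shows the two behaviours raised above: a plain scan over ancestry names (as the repository highlighting does) forces the proxy to materialize, and the lookup fails when the trimmed version is not in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class VersionInfo:
    """Stand-in for an MCVersionInfo: a name plus its direct ancestors."""
    name: str
    ancestors: list = field(default_factory=list)


class Repository:
    """Hypothetical repository: an in-memory map from name to VersionInfo."""

    def __init__(self, versions):
        self.versions = {v.name: v for v in versions}
        self.fetches = []                 # records every materialization

    def fetch(self, name):
        self.fetches.append(name)
        return self.versions[name]        # KeyError if the version is missing


class InfoProxy:
    """Placeholder for trimmed ancestry; fetches the real info on first use."""

    def __init__(self, name, repository):
        self.name = name
        self._repository = repository
        self._target = None

    @property
    def ancestors(self):
        if self._target is None:
            # This is the file read / network access being debated.
            self._target = self._repository.fetch(self.name)
        return self._target.ancestors


def build_chain(n):
    """Return [Pkg.1, ..., Pkg.n], each with its predecessor as ancestor."""
    chain = []
    for i in range(1, n + 1):
        chain.append(VersionInfo(f"Pkg.{i}", chain[-1:]))
    return chain


def truncate(info, keep, repository):
    """Keep `keep` versions along each ancestry path; proxy everything older."""
    if isinstance(info, InfoProxy):
        return
    if keep <= 1:
        info.ancestors = [a if isinstance(a, InfoProxy)
                          else InfoProxy(a.name, repository)
                          for a in info.ancestors]
        return
    for ancestor in info.ancestors:
        truncate(ancestor, keep - 1, repository)


def ancestry_names(info):
    """Collect every version name in the ancestry, as a highlighting scan might."""
    names, stack = set(), [info]
    while stack:
        node = stack.pop()
        if node.name not in names:
            names.add(node.name)
            stack.extend(node.ancestors)
    return names


# Case 1: the repository holds the full history.
repo = Repository(build_chain(5))
working = build_chain(5)[-1]
truncate(working, keep=3, repository=repo)
fetches_after_truncate = list(repo.fetches)   # truncating fetches nothing
names = ancestry_names(working)               # the scan materializes the proxy
fetches_after_scan = list(repo.fetches)

# Case 2: the trimmed version is absent from the repository.
partial_repo = Repository(build_chain(5)[2:])
stranded = build_chain(5)[-1]
truncate(stranded, keep=3, repository=partial_repo)
try:
    ancestry_names(stranded)
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In this toy model the `keep` argument plays the role of the history size (10 in the original announcement). A very large value leaves the ancestry untouched.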
On 2013-08-15, at 18:26, Chris Muller asqueaker@gmail.com wrote:
Ok, I'll look at it today. One thing is that all ancestry SHOULD be in the same repository -- but I agree, the system needs to handle that as gracefully as possible if it isn't.
Can you tell me how to reproduce the issue?
Thanks.
On Thu, Aug 15, 2013 at 10:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It's not quite bullet-proof yet: log of DNU attached. It fails to find a certain info, which causes all kinds of problems.
Also, something apparently tries to materialize infos in the background. Possibly updating MC browsers, not sure. This leads to very strange and hard to get-rid-of notifiers:
(these updating bars used to be very rare, like once per session, recently they pop up multiple times for many operations, but I've never before had 2 on the screen at the same time)
- Bert -
On 15 August 2013 17:51, Chris Muller ma.chris.m@gmail.com wrote:
You know, you guys are really tough! For many years y'all have been complaining about the size of the image. When I complained to Frank about committing new versions of stuff with one character deleted from a comment, he said, "if our systems can't withstand lots of updates then we need to fix our systems."
That's what I've spent my time and energy working toward here. I thought you would be pleased, but you don't want to even give it a chance..
If it's any consolation, I've felt like that from time to time, here :)
frank
On 15.08.2013, at 18:51, Chris Muller ma.chris.m@gmail.com wrote:
You know, you guys are really tough! For many years y'all have been complaining about the size of the image.
IMHO making the image more modular is not about size in the first place, but about managing complexity. Having clear dependencies between packages also is useful for image shrinking, granted, but much more importantly it makes the system cleaner and simpler to understand.
When I complained to Frank about committing new versions of stuff with one character deleted from a comment, he said, "if our systems can't withstand lots of updates then we need to fix our systems."
That was about the way we store MCZs, each of which has a full snapshot of the code. I have not heard complaints about the size of MCVersionInfos in the image.
That's what I've spent my time and energy working toward here. I thought you would be pleased, but you don't want to even give it a chance.
Well, I didn't complain right away ;) It sounded like a neat idea at first, but then I remembered that one of the things I really like about Monticello is its clarity and simplicity. Everything is very concrete, whereas proxies are very meta by nature.
- Bert -
On Thu, Aug 15, 2013 at 11:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It happens to not find "XML-Parser-Alexandre_Bergel.20". No idea why it's trying to look for that. Not all merged versions are in trunk, by design.
The more I think about it, the less convinced I am that this space optimization is worth introducing such fragile machinery. MC is designed to have all ancestry info available at all times - just opening any repository will cause the proxies to materialize again, because the highlighting looks at which version names are in the ancestry of the working copy.
I'd rather revert this whole thing, to be honest. If you're trying to build a minimal image for deploying an application you would be better off unloading MC altogether.
- Bert -
On 2013-08-15, at 18:26, Chris Muller asqueaker@gmail.com wrote:
Ok, I'll look at it today. One thing is that all ancestry SHOULD be in the same repository -- but I agree, the system needs to handle that as gracefully as possible if it isn't.
Can you tell me how to reproduce the issue?
Thanks.
On Thu, Aug 15, 2013 at 10:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It's not quite bullet-proof yet: log of DNU attached. It fails to find a certain info, which causes all kinds of problems.
Also, something apparently tries to materialize infos in the background. Possibly updating MC browsers, not sure. This leads to very strange and hard to get-rid-of notifiers:
(these updating bars used to be very rare, like once per session, recently they pop up multiple times for many operations, but I've never before had 2 on the screen at the same time)
- Bert -
On Thu, 15 Aug 2013, Chris Muller wrote:
Ok, I'll look at it today. One thing is that all ancestry SHOULD be in the same repository -- but I agree, the system needs to handle that as gracefully as possible if it isn't.
Is it a requirement from now on that all the ancestry has to be present (in a single repository) if I want to use MC? Or did I misunderstand something?
Levente
Can you tell me how to reproduce the issue?
Thanks.
On Thu, Aug 15, 2013 at 10:39 AM, Bert Freudenberg bert@freudenbergs.de wrote:
It's not quite bullet-proof yet: log of DNU attached. It fails to find a certain info, which causes all kinds of problems.
Also, something apparently tries to materialize infos in the background. Possibly updating MC browsers, not sure. This leads to very strange and hard to get-rid-of notifiers:
(these updating bars used to be very rare, like once per session, recently they pop up multiple times for many operations, but I've never before had 2 on the screen at the same time)
- Bert -
On 2013-08-15, at 17:39, Bert Freudenberg bert@freudenbergs.de wrote:
It's not quite bullet-proof yet: log of DNU attached. It fails to find a certain info, which causes all kinds of problems.
Also, something apparently tries to materialize infos in the background. Possibly updating MC browsers, not sure. This leads to very strange and hard to get-rid-of notifiers:
<PastedGraphic-2.png>
(these updating bars used to be very rare, like once per session, recently they pop up multiple times for many operations, but I've never before had 2 on the screen at the same time)
- Bert -
<SqueakDebug.log>
I tried to rescue my image that got broken by choosing "flush cached versions". Manually downloaded Monticello-cmm.560.mcz, tried to merge in using a file list. Didn't help, because even merging needs access to the infos. And it's a pain in the neck to debug because just opening a debugger tries to materialize the proxy again which results in an error again.
This is what I mean by "fragile" and "unneeded complexity".
- Bert -
I tried to rescue my image that got broken by choosing "flush cached versions". Manually downloaded Monticello-cmm.560.mcz, tried to merge in using a file list. Didn't help, because even merging needs access to the infos. And it's a pain in the neck to debug because just opening a debugger tries to materialize the proxy again which results in an error again.
This is what I mean by "fragile" and "unneeded complexity".
Ok. Since I've become comfortable debugging proxy issues in Magma for so long, I had trouble understanding this at first. I understand your feelings now.
As I said yesterday, the issue you encountered was not only easy to identify from your SqueakDebug.log, it was the issue I predicted could happen in my "Special Notes". Although I think all ancestry should be there, the reality is it didn't take long at all for you to find a case where it wasn't -- and so it MUST handle that.
The improvement you suggested yesterday eliminates the expectation that an older MCVersion need be in the repository. It was a great idea, I think it took care of that. :) Are you experiencing any issues at all with MC.560?
And, I'll add the separate menu item so you'll be able to continue to flush Versions from cache and still keep in-memory Ancestry without being required to have a network connection.
PS - It sounds like Levente's solution avoids proxies, but if it changes the file format, there'd need to be a "commitment" because it'd be harder to go back. The Proxy way occurs just in memory, so we can evaluate this for a while and easily upgrade or change to something entirely different with no side effects. My app is creating hundreds of images, so I care about size right now.
squeak-dev@lists.squeakfoundation.org