[squeak-dev] Package commit granularity vs. Project commit granularity (was: FileStreams Limit)

Jakob Reschke jakres+squeak at gmail.com
Sun Feb 20 19:05:44 UTC 2022


Hi Jörg,

Bernhard was faster than me in compiling an answer and, as usual for
me, his is much more concise. ;-) I have worked with ENVY in VAST in
the past, and from what you described I assume that the developer
interactions with Store are similar to those with ENVY at the level
of package versions and configuration map/bundle versions.

I will try to keep it shorter, too:
- Bernhard is right that Git does not do fully-fledged configuration
management. (And git submodules do not seem to do it well.)
- Since each commit is like a snapshot of the whole project, each
single commit is implicitly a configuration of the packages that exist
in this commit.
- To get bundle or configuration map functionality, we combine Git
with Metacello. In a Metacello baseline you declare which packages
belong together, what the dependencies between the packages are, and
which external dependencies there are and from which repositories to
get them. The baseline can also take care of conditional loading
(e.g. depending on the Squeak version or whether this is in fact a
Pharo or GemStone image). The Metacello baseline is usually versioned
along with the packages, as a separate artifact next to them but with
a shared Git history.
- To get a particular version of the project, or a package of it, you
identify a particular commit and check out what you need from there.
Use Metacello and the baseline to get dependencies resolved and things
loaded in the correct order. Example below this list.
- The state of a single package may be the same in numerous Git
commits if only other parts of the project have changed. This is not
an issue, though: you can load the package from any of these commits
and get the same result.
- To identify some commits as released versions of the baseline and
the packages along with it, you can tag commits. You could use this to
reintroduce the concept of bundle version names.
- If you want to create new commits of a package without staying up to
date with new commits that address another package, you must use
branches in Git. In effect the same happens in a Store repository:
you get parallel versions of the packages which are not published in
a bundle together. You just do not have to tell Store that you are
diverging from your coworker's work; it just happens. I think it is
good to make that divergence explicit, so I believe the use of
branches is not worse but just different, and that it is an
acceptable extra step to take.
- If you want to release a new configuration of packages that have
previously diverged, you must merge branches, just as in Store you
may end up integrating newer package versions or even merging
parallel versions of the same package.
- Git only stores the history of commits; it does not maintain an
extra history per file or directory. Therefore, to get at the history
of a single package, use git log <directory of the package>, which
filters the commit history and shows only commits that changed this
part of the directory tree.
- The difference in how history is written also becomes apparent when
you need to revert a package to an earlier version (e.g. set back P2
from V2.8 to V2.7 due to a regression and publish a new bundle version
with P2 V2.7). Since the history only goes forward, and in Git the
history of the baseline does not progress separately from the history
of the packages, you must make a commit that undoes some earlier
changes to P2. This revert will of course appear in the history
filtered to P2, whereas in Store you probably would not create a P2
V2.9 that equals V2.7. To fix the problem in P2, you would in both
systems create a fixed version of P2 that descends from V2.7 (in Git:
from a commit with that state), then update the bundle with the fixed
version (in Git: merge that fix). In Git one extra step is needed
that is not required with proper configurations: once you want to
publish the fix, you have to undo the revert of P2 on the main
branch.
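
The Git side of the points above can be sketched with plain Git
commands in a throwaway repository. This is only an illustration under
assumptions: the directory layout src/P1 and src/P2, the file names,
and the commit messages are made up, and a real project would contain
the package directories written by the Git tools instead.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# First snapshot: both packages exist in the same commit.
mkdir -p src/P1 src/P2
echo "P1 state a" > src/P1/code.st
echo "P2 state a" > src/P2/code.st
git add src
git commit -q -m "Add P1 and P2"

# A change to P2 only (this one will turn out to be a regression).
echo "P2 state b" > src/P2/code.st
git commit -q -am "Change P2"
bad=$(git rev-parse HEAD)

# A change to P1 only.
echo "P1 state b" > src/P1/code.st
git commit -q -am "Change P1"

# P2 was not touched by the last commit, so its tree object is the
# same in both snapshots: loading P2 from either gives the same state.
p2_tree_new=$(git rev-parse HEAD:src/P2)
p2_tree_old=$(git rev-parse HEAD~1:src/P2)

# Mark this snapshot as a released configuration of both packages.
git tag B_V1.0

# History of one package: count only the commits that touched src/P2.
p2_commits=$(git rev-list --count HEAD -- src/P2)

# Set P2 back by adding a new commit that undoes the regression.
git revert --no-edit "$bad" >/dev/null
p2_after_revert=$(cat src/P2/code.st)
p2_commits_after=$(git rev-list --count HEAD -- src/P2)

# The revert shows up in the history filtered to P2.
git log --oneline -- src/P2
```

The tag B_V1.0 keeps pointing at the snapshot from before the revert,
so that configuration stays loadable even though the main branch has
moved on.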

To load your example project from a Git repository, which has a tag
called "B_V1.0" as an equivalent to V1.0 of that bundle B, you would
use this:

    Metacello new
        repository: 'github://username/repository:B_V1.0/src';
        baseline: 'B';
        load.

Where the repository string says B_V1.0, you put the commit
identifier, which can be a commit SHA-1 hash, a tag name, or a branch
name (but a branch name is not a stable reference).
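
To illustrate why a branch name is not a stable reference, here is a
small sketch with plain Git in a throwaway repository. The file name
and commit messages are made up for illustration; only the tag name
B_V1.0 is taken from the example above. It resolves the tag and the
branch to commit hashes before and after one more commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository; file name and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

echo "v1" > file.txt
git add file.txt
git commit -q -m "First release"
git tag B_V1.0

# Resolve the tag and the branch to commit hashes.
tag_before=$(git rev-parse "B_V1.0^{commit}")
branch_before=$(git rev-parse main)

# One more commit on the branch.
echo "v2" > file.txt
git commit -q -am "Later work"

# Resolve both names again.
tag_after=$(git rev-parse "B_V1.0^{commit}")
branch_after=$(git rev-parse main)

echo "tag:    $tag_before -> $tag_after"
echo "branch: $branch_before -> $branch_after"
```

The tag resolves to the same commit both times, while the branch name
resolves to the newer commit after the second commit has been made.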

To load only "Tools-Package" from a repository with "App-Bundle" (I
have now verified that this works):

    Metacello new
        repository: 'github://username/repository:App_V4.2/src';
        baseline: 'App-Bundle';
        load: 'Tools-Package'.

(If Tools-Package depended on another package according to the
baseline, that other package would be loaded as well, from the same
commit.)


You are right that tool integration in Squeak is lacking compared to
Store/ENVY in VW/VAST. Neither Metacello nor the Git tools are part of
the Squeak Trunk; only Monticello is. And not even Monticello is
integrated into the class browser. Every version control system is
like an add-on that exists alongside but not inside the core tools.
Metacello does not even have GUI tools. People seem to have been doing
well enough with writing the Smalltalk code that makes up a baseline
by hand.

Kind regards,
Jakob

On Sun, 20 Feb 2022 at 16:33, Bernhard Pieber
<bernhard at pieber.com> wrote:
>
> Hi Jörg,
>
> Thanks for the detailed description about how Store works. In fact, it reminds me very much of ENVY. What you describe as a bundle is called ConfigurationMap in ENVY, which I think is quite a fitting name. Packages are called Applications and SubApplications, which are very bad names. In addition there are ConfigurationExpressions which are useful for conditional loading on different platforms and dialects.
>
> I agree that this system works very well. Bundles/ConfigurationMaps are essential in addition to packages IMO. As far as I understand it, Git completely misses this functionality. This is intentional, as Git is just a versioning system not a configuration management system. That means you need additional tools to solve that problem.
>
> For dependency management there are two possible solutions:
> a) Each package specifies (stores) the dependencies it needs, either a specific version or a range of versions or just the name of packages.
> b) The dependencies between the packages are specified (stored) outside the package itself in separate artefacts, which themselves need to be versioned, i.e. bundles/ConfigurationMaps/configurations.
>
> I think it is confusing to mix both approaches and according to my experience the second approach is much better. The first approach is used far more often, though. Monticello dependencies and Metacello are examples.
>
> Monticello configurations on the other hand are exactly the same thing as bundles/ConfigurationMaps IMO. The tooling is not good enough, which I think is one of the reasons why it is not used more often.
>
> But I think you should be able to achieve the working style you described with Monticello and Monticello configurations.
>
> Cheers,
> Bernhard
>
>
> > On 19 Feb 2022 at 19:49, Jörg Belger <unique75 at web.de> wrote:
> >
> > Oh yes, you are right, Jakob, I sent it only to you; that was a mistake. So I am sending it to the mailing list now; maybe others will find it interesting too. I hope so; I don't want to be punished for my clear statements :-)
> >
> > I can imagine that we could also create issues on GitHub for ideas, proposals, and wishes about the most important things to do. People can then discuss and prioritize them there, and the discussion is preserved and well documented for the future. Everyone who feels called to do so can then pick up such "proposal issues" and try to implement them. Discussing these things via email is bad, because the whole discussion is forgotten within a month and nobody searches their emails for such things. What we need is a notification system in GitHub, so that everybody who wants to be informed by email can be. I guess GitHub has something like this, but I am not sure. It would be funny if the master of versioning systems did not have such a feature, where people can subscribe themselves to notifications. Maybe a GitHub expert is reading this; I am just a newbie with GitHub.
> >
> > Jörg
> >
> >> On 19 Feb 2022 at 17:26, Jakob Reschke <jakres+squeak at gmail.com> wrote:
> >>
> >> Hi Jörg,
> >>
> >> Thank you for this practical explanation! Did you intentionally send
> >> it only to me and not to the list?
> >>
> >> Kind regards,
> >> Jakob
> >>
> >> On Sat, 19 Feb 2022 at 14:05, Jörg Belger <unique75 at web.de> wrote:
> >>>
> >>> Hi Jakob,
> >>>
> >>> Please let me give some insight into my workflow at work with Cincom VisualWorks + Store, and into why I have the feeling that Squeak versioning with Monticello and GitHub is overcomplicated and user-unfriendly.
> >>>
> >>> Store breaks the world into bundles and packages. A bundle can contain other bundles and packages, whereas a package contains code. You can of course have multiple repositories, but only one active repository, into which all your publishes/commits go. Store is fully integrated into the class browser: you see the bundles/packages there as first-class objects and can right-click one and publish it to the active connected repository. You can also add class extension methods to a package, so that you have a view of all the methods that you added to system classes.
> >>>
> >>> Let's assume I now create a new package P1, put some code inside, and publish it as V1.0. I then create a second package P2 and publish it as V1.0. Let's assume I made more changes and now have "package P1 V1.2" and "package P2 V2.7" in the image and the repository. I can now create a bundle B, add the two packages P1 and P2 to it, and publish the bundle as V1.0. Meanwhile another person works on "package P1" and publishes V1.3. Every time you now load "bundle B V1.0", you will get "package P1 V1.2" and "package P2 V2.7" in the image. The changes in "package P1 V1.3" do not matter, because you have fixed/linked specific package versions into bundle version V1.0. We often do that to combine tested versions of different subprojects. From time to time somebody goes through the bundle, looks for newer package versions, and creates a new tested release in which all the new package versions are fixed/linked together again. Bundle B is then published as V1.1.
> >>>
> >>> You can also set "package P1" as a prerequisite for "package P2". Every time you load P2 you will get a dialog to select the right version of P1. Prerequisite loading is only done within the same active repository. Depending on the settings, a prerequisite can also be loaded from the installation path as a parcel. But there is no way to load prerequisites from repositories other than the active one.
> >>>
> >>> Let's assume we allowed multi-repository prerequisites. The Core team adds "package P2" to its supported things, because it is a wonderful tool, and puts it into the Core repository R2. But this package P2 depends on package P1, which is in repository R1. Repository R1 is maintained by one private person, and one year later the repository is deleted or moved to another location, or the package P1 there is broken and unloadable, and so on. There are many reasons for that, e.g. that person has a different working style, different quality standards, etc.
> >>>
> >>> What is the end of this story? You have a package P2 in the Core repository R2, which is supported... but it does not work, because side dependencies are out of control and no longer work. This is exactly the situation I often ran into when I tried to load things. The URLs in the configuration of supported things no longer worked, or the versions no longer matched; I always got loading-hell problems, and that is just frustrating. You want to load something and everything you try fails.
> >>>
> >>> I do not see a good way to work with GitHub like I do with Store, where I can independently publish different packages with different versions, have prerequisites, or fix/link bundle versions. And I think it is not the right way to create a GitHub repository for every single package. A package is only a logical unit for dividing something into logical pieces; it is not a full project or a full ecosystem like Squeak. I think Monticello is the one that looks similar to Store to me. But it needs a better prerequisite system and not the messy configurations. When I browse a repository version, that browser looks to me like it should be the class browser, because I see my package structure and all the extension methods there.
> >>>
> >>> Surely I am a newbie to Monticello, and all I have learned so far is that I load a "ConfigurationOfXXX" from a repository, but then all I know is to execute some workspace code like "ConfigurationOfXXX project latestVersion load". Maybe there is a better way that avoids always entering workspace code. I simply want to load a package with a one-click action in a repository browser, and I do not want to think about broken side-reference URLs and other things. I have often searched around the internet to find a working version of some specific prerequisite, and it is always a mess. Crazy things happened: I wanted to load something that depends on FFI, but the FFI loaded through the configuration was outdated. I then found a newer FFI on the internet, loaded that first, and the other package worked better. This is not what a newbie wants to deal with when just evaluating the new Squeak world.
> >>>
> >>> One other topic I want to mention about versioning: if we have an issue tracking system, we normally publish a bundle/package with the issue number inside the version string. That means somebody finds a bug, creates issue 4711, and puts it into the public pool. Another person can pick up issue 4711, solve the problem, and publish the changes as "package P1 V1.1 + issue 4711 1". The last number "1" is incremented with every publish. This is very helpful, because others see immediately that this version branch is off-limits and "work in progress". When the issue is finished, it is delegated to a reviewer or to a pool, where others can pick it up and do the review. The reviewer loads the "issue 4711" branch of the package, integrates it, and publishes a new trunk version of the package; in this case it would be "package P1 V1.2". The advantage is that every time you ask yourself why a specific change was made to a method, you can see the "issue 4711" branch in the version history and then look up in the issue tracker what the reasons were.
> >>>
> >>> I do not see how we can work together in such a professional way when we are using emails. I often see emails flying around with review requests, but emails can be forgotten. The packages contain only some magic name, but no issue number, so it is not traceable which issue the changes came from, or why and when. And believe me, it happens often that you ask two years later why a specific change is in there. It is very helpful then to have the documentation history in the issue showing what the person thought two years before.
> >>>
> >>> I think the optimal solution would be if everybody could subscribe in GitHub to specific "labels", "projects", or whatever. Then everybody can decide for themselves whether they want email notifications or not. But I do not know if GitHub can do this.
> >>>
> >>> Jörg
> >>>
> >>>
> >>>> On 19 Feb 2022 at 12:07, Jakob Reschke <jakres+squeak at gmail.com> wrote:
> >>>>
> >>>> Dear Jörg, dear all,
> >>>>
> >>>> On Sat, 19 Feb 2022 at 10:54, Jörg Belger <unique75 at web.de> wrote:
> >>>>>
> >>>>> If somebody is now thinking of using GitHub as the source code repository: I like the idea only a little bit. Pharo uses it. It has the advantage that nobody needs to run their own server at home :-)
> >>>>> But the big disadvantage is that you can no longer think in packages with their own version history like in Monticello. You need to think in repositories and branches, and if you change two packages that are linked to one GitHub repository and you commit the first package, then I think you commit the second package too under the same version, because in reality you do not commit the package, you commit your current repository from disk to the server. You lose the possibility to version each package differently. That makes it impossible to later load packageA with version 1.5 and packageB with version 2.7.
> >>>>>
> >>>>
> >>>> To be honest, I do not see the problem here. Smalltalk version control
> >>>> systems almost all offer the feature of configuration maps because it
> >>>> is often required to specify which package versions work together
> >>>> because the packages in fact belong together. In Git and similarly
> >>>> organized version control systems, simply every commit is a
> >>>> configuration of the packages contained therein. If you do not change
> >>>> packageB on your new commit in which you want to change packageA,
> >>>> packageB in that new commit will be equal to packageB in the parent
> >>>> commit. It will not make a difference for packageB from which of the
> >>>> two commits you load it, so why care?
> >>>>
> >>>> If you want to grab a change to one package from a commit but not a
> >>>> change to another package in the same commit -- for whatever reason
> >>>> that the maintainer apparently did not have in mind -- simply make a
> >>>> selective load/checkout: deselect the changes to the other package
> >>>> that you do not want to apply to the image, just as you would deselect
> >>>> changes that you do not want to commit when you create a new commit.
> >>>> Or the other way around: browse the commit from which you need some
> >>>> changes, select the package/category/class which you want, and load
> >>>> just that.
> >>>>
> >>>> If the two packages are really not connected to each other and it is
> >>>> supported to freely combine different versions of them, then there is
> >>>> no point to develop them on the same branch or even in the same
> >>>> repository.
> >>>>
> >>>>> Does anybody know if we can have a GitHub repository where we can think like a Smalltalker and commit single packages?
> >>>>
> >>>> It is just a matter of organizing it as such. Create a new GitHub
> >>>> repository for each package and commit only one package to each
> >>>> repository. The repositories are already organized under your
> >>>> username. For everything beyond a single person, there are GitHub
> >>>> organizations like https://github.com/squeak-smalltalk/. If you want
> >>>> to stay within a single Git repository for some reason, create a
> >>>> separate branch for each package and only commit a single package on
> >>>> each branch.
> >>>>
> >>>> Kind regards,
> >>>> Jakob
> >>>
> >
> >
>
>

