[Vm-dev] the purpose of CI
btc at openinworld.com
Thu Jun 1 04:05:06 UTC 2017
On Thu, Jun 1, 2017 at 2:27 AM, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:
> 2017-05-31 17:31 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>> Hi All,
>> > On May 31, 2017, at 1:54 AM, K K Subbu <kksubbu.ml at gmail.com> wrote:
>> > On Wednesday 31 May 2017 12:35 PM, Esteban Lorenzano wrote:
>> >>>> On 31 May 2017, at 09:01, K K Subbu <kksubbu.ml at gmail.com> wrote:
>> >>>> On Wednesday 31 May 2017 12:18 PM, Esteban Lorenzano wrote:
>> >>>> 1) We need a stable branch, let’s say it is Cog 2) We also need a
>> >>>> development branch, let’s call it CogDev
>> >>> IMHO, three branches are required as a minimum - stable,
>> >>> integration and development because there are multiple primary
>> >>> developers in core Pharo working on different OS platforms.
>> >> but nobody will do the integration step so let’s keep it simple:
>> >> integration is handled responsibly by anyone who contributes, as it is
>> >> done now.
>> > I proposed only three *branches*, not people. Splitting development
>> into two branches and builds will help in isolating faster (separation of
>> concerns). If all issues get cleared in dev branch itself, then integration
>> branch will still be useful in catching regressions.
>> I don't believe this. Since the chain is VMMaker.oscog =>
>> opensmalltalk/vm => CI, clumping commits together when pushing from, say,
>> CogDev to Cog doesn't help in identifying where things broke in VMMaker.
>> This is why Esteban has implemented a complete autobuild path run on each
>> VMMaker.oscog commit.
>> But, while this is a good thing, it isn't adequate because
>> a) important changes are made to opensmalltalk/vm code independent of
>> b) sometimes one /has/ to break things to have them properly tested (e.g.
>> the new compactor). i.e. there has to be a way of getting some
>> experimental half-baked thing through the build pipeline so brave souls can
>> test them
>> > I will defer to your experience. I do understand the difference between
>> logical and practical in these matters.
>> Let's take a step back and instead of discussing implementation, discuss
>> For me, a VM is good not when someone says it is, not when it builds on
>> all platforms, but when extensive testing finds no faults in it. For me
>> this implies tagging versions in opensmalltalk/vm (which by design index
>> the corresponding VMMaker.oscog because generated source is stamped with
>> VMMaker.oscog version info) rather than using branches.
>> Further, novel bugs are found in VMs that are considered good, and these
>> bugs should, if possible, be added to a test suite. This points to a major
>> deficiency in our ability to test VMs. We have no way to test the UI
>> automatically. We have to use humans to produce mouse clicks and
>> keystrokes. For me this implies tagging releases, and the ability to state
>> that a given VM supersedes a previous known good VM.
>> And the previous paragraph applies equally to performance improvements,
>> and functionality enhancements, not just bugs.
>> Test suites and build chains catch regressions. Regressions in
>> functionality and in performance are _useful information_ for developers
>> trying to improve things, not necessarily an evil to be avoided at all
>> costs. The system must allow pushing an experiment through the build and
>> test pipeline to learn of a piece of development's impact. An experiment
>> may have to last for several months (for several reasons; the new compactor
>> is a good example: some bugs show up in unusual circumstances; some bugs
>> are hard to fix).
>> Another requirement is to provide a stable point for someone to begin new
>> work. They need to know that their starting point is not an experiment in
>> progress. They need to understand that the cost of working on what is
>> effectively a branch from the trunk is an integration step(s) into trunk
>> later on, and this can't be just at the opensmalltalk/vm level using git to
>> assist the merge, but also at the VMMaker.oscog level using Monticello to
>> merge. Both are good at supporting merges because both support identifying
>> the set of changes. Both are poor at supporting merges because they don't
>> understand refactoring and currently only humans can massage a set of
>> changes forwards applying refactorings to a set of changes. This is what
>> real merges are, and the reason why git only eases the trivial cases and
>> why real programmers use a lot more tools to merge than just a vcs.
>> Can others add additional requirements, or critique the above
>> requirements? (Try not to mention git or ci implementations when you do).
>> With the above said what seems lacking to me is the testing framework for
>> completed VMs. A build bot can identify commits that fail a build and also
>> produce a VM for subsequent packaging and/or testing. Separating the steps
>> is very useful here. A long pipeline with a single red or green light at
>> the end is much less useful than a series of short pipelines, each with a
>> separate red or green light. Reading through a bot log to identify
>> precisely where things broke is both tedious and, more importantly, not
>> useful in an automated system because that identification is manual.
>> Separate short pipelines can be used to inform an automatic system (right
>> Bob? Bob Westergaard built the testing system at Cadence and that is
>> constructed from lots of small steps and it isolates faults nicely;
>> something that an end-to-end system like Esteban's doesn't do as well).
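
To make the "series of short pipelines, each with a separate red or green light" idea concrete, here is a minimal sketch. It is not how Esteban's chain or the Cadence system is built; the stage names and the `run_pipeline` / `first_failure` helpers are hypothetical, and each stage is assumed to be a callable that returns True on success.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StageResult:
    name: str
    passed: bool


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[StageResult]:
    """Run stages in order and record one red/green light per stage.

    Stops at the first red stage, so later stages are never reported
    as failures they did not cause.
    """
    results: List[StageResult] = []
    for name, step in stages:
        try:
            ok = bool(step())
        except Exception:
            ok = False  # a crashing stage counts as red
        results.append(StageResult(name, ok))
        if not ok:
            break
    return results


def first_failure(results: List[StageResult]) -> Optional[str]:
    """Return the name of the first red stage, or None if all are green."""
    for result in results:
        if not result.passed:
            return result.name
    return None


# Hypothetical stages; real ones would invoke the generator, compiler, etc.
stages = [
    ("generate", lambda: True),
    ("build", lambda: True),
    ("package", lambda: False),
    ("test", lambda: True),
]
results = run_pipeline(stages)
print(first_failure(results))
```

Because each stage reports its own light, an automated system can read the failing stage name directly instead of someone reading through a bot log.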
>> Now, if we have a long sequence of nicely separated generate, build,
>> package, test steps how many separate pipelines do we need to be able to
>> collaborate? Is it enough to be able to tag an upstream artifact as having
>> passed some or all of its downstream tests or do we need to be able to
>> duplicate the pipeline so people can run independent experiments?
>> For me, I see two modes of development; new development and maintenance.
>> New development is fine in a fork in some subset of the full build chain.
>> e.g. when working on Spur I forked within VMMaker.oscog (and,
>> unfortunately, in part because we didn't have opensmalltalk/vm or many of
>> the above requirements discussed, let alone met, I would break V3 for much
>> of the time). e.g. the new compactor was forked in VMMaker.oscog without
>> breaking Esteban's chain by my using a special generation step controlled
>> by a switch I set in my branch. I tested in my own sandbox until the new
>> compactor needed testing by a wider audience.
>> Maintenance is some relatively quick fix one (thinks one) can safely
>> apply to either VMMaker.oscog or opensmalltalk/vm trunk to address some
>> Forking is fine for new development if
>> a) people understand and are prepared to pay the cost of merging, or,
>> b) they can use switches to include their work as optional in trunk
>> There are lots of switches:
>> A switch between versions in VMMaker.oscog, e.g. Spur memory manager vs
>> V3, or the new Spur compactor vs the old, or the Sista JIT vs the standard,
>> A switch between a vm configuration, e.g. pharo.cog.spur vs
>> squeak.cog.spur in a build directory, which can do any of
>> - select a generated source tree (e.g. spursrc vs spur64src)
>> - use #ifdef's to select code in the C source
>> - use plugins.int & plugins.ext to select a set of plugins
>> A switch between dialects (Pharo vs Squeak vs Newspeak)
>> A switch between platforms (Mac OS X vs win32, Linux x64 vs Linux ARM)
>> I get the above distinctions and know how to navigate amongst them
>> upstream, but don't understand very well the downstream (how to clone the
>> build/test CI pipeline so I can cheaply fork, work on the branch and then
>> merge). So I'm happier using switches to try and hide new work in trunk to
>> avoid derailing people. And so I prefer the notion of a single pipeline
>> that tags specific versions as good.
>> Is one of the requirements that people want to clearly separate
>> maintenance from new development?
>> Is one of the requirements that people want to clearly identify which
>> commit caused a specific bug? (Big discussion here about major, e.g. V3 =>
>> Spur transitions vs small grain changes; you can identify the latter, but
>> not necessarily the former).
>> I suppose what I'm asking is what's the benefit of an all green build?
>> For me a tested, versioned and named artifact is more useful than an all
>> green build. An all red build is a red flag. A mostly green build is
>> simply a failure to segregate production from in development artifacts.
> Hi Eliot,
> the main advantage of github is the social thing:
> - lower barrier of contributing via a better integration of tools
> (not only vcs, but issue tracker, wiki, continuous integration, code
> review/comments and pull request - even if we underuse most of these)
> - and ease integration of many small contributions back.
> For this to work well, such work MUST happen in separate branches.
> in this context, there is an obvious benefit of green build: quickly
> estimate if we can merge a pull request or not.
> when red, we have no information about possible regressions, and have to
> go through the tedious part: go down into the console log of both builds,
> try to understand and compare... There is already enough work involved in
> reviewing source code.
I see that my opening argument was simplistic. However Nicolas' point
above is probably more significant.
If we want to encourage new contributors, we need:
* to show that the CI builds are cared for
* allow newcomers to be confident that the tip they are working from is
green before they start. When they submit their PR and the CI tests fail,
they should be able to zero in on the failures *they* caused and *as*a*newbie*
not have to sort through the confounding factors from others' failures.
* act timely to integrate, to encourage further contributions. If someone
contributes a good fix, a green CI test may make you inclined to quickly
review and integrate. But when the CI shows failure, how will you feel
about looking into it? Further, when the mainline returns to green, the
existing PRs don't automatically retest, and no-one seems to be manually
managing them, so such PRs seem to end up in limbo which is *really*
discouraging for potential contributors.
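
As a rough illustration of "zeroing in" on the failures a PR itself caused, the sketch below compares per-check results of a PR against those of the mainline tip it was based on. The check names and the `new_failures` / `inherited_failures` helpers are hypothetical, and a check missing from the baseline is assumed to have been passing there.

```python
from typing import Dict, List


def new_failures(baseline: Dict[str, bool], pr: Dict[str, bool]) -> List[str]:
    """Checks that are red on the PR but were green (or absent) on the baseline."""
    return sorted(
        name for name, ok in pr.items()
        if not ok and baseline.get(name, True)
    )


def inherited_failures(baseline: Dict[str, bool], pr: Dict[str, bool]) -> List[str]:
    """Checks that are red on the PR and were already red on the baseline."""
    return sorted(
        name for name, ok in pr.items()
        if not ok and not baseline.get(name, True)
    )


# Hypothetical check names and results (True = green, False = red).
baseline = {"build-linux": True, "build-win32": False, "test-suite": True}
pr = {"build-linux": True, "build-win32": False, "test-suite": False}

print(new_failures(baseline, pr))        # failures the contributor should look at
print(inherited_failures(baseline, pr))  # failures that were already on mainline
```

With a green baseline the second list is empty, which is the situation a newcomer would ideally start from.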
> I tend to agree on your view for mid/long term changes:
> Say developer A works on a new garbage collector, developer B on 64-bit
> compatibility, developer C on the lowcode extension and developer D on sista
> (though maybe there is a single developer touching 3 of these).
> Since each of these devs is going to take months, and touch many core
> methods scattered in interpreter/jit/object memory or CCodeGenerator, then
> it's going to be very difficult to merge (way too many conflicts).
> If on different branches, there is the option to rebase or merge with
> other branches. But it doesn't scale with N branches touching same core
> methods: N developers would have to rebase on N-1 concurrent branches,
> resolve the exact same conflicts etc... Obviously, concurrent work would
> have to be integrated back ASAP in a master branch.
> So, a good branch is a short branch, if possible covering a minimal
> feature set.
> And long devs you describe must not be handled by branches, but by
> This gives you a chance to inspect the impact of your own refactoring on
> your coworkers.
> In this model, yes, you have a license to break your own artifact (say
> generationalScavenger, win64, lowcode, sista).
> But you must be informed if ever you broke the production VM, and/or
> concurrent artifacts. You have to maintain a minimal set of features
> working, otherwise you prevent others from working. In the scavenger case, you
> used a branch for a short period, and that worked quite well.
> In this context, I agree, a single green light is not enough.
> We need a sort of status board tracing the regressions individually.
>> > Regards .. Subbu