[Vm-dev] the purpose of CI

Wed May 31 15:31:01 UTC 2017

Hi All,

> On May 31, 2017, at 1:54 AM, K K Subbu <kksubbu.ml at gmail.com> wrote:
> 
> On Wednesday 31 May 2017 12:35 PM, Esteban Lorenzano wrote:
>>>> On 31 May 2017, at 09:01, K K Subbu <kksubbu.ml at gmail.com> wrote:
>>>> 
>>>> On Wednesday 31 May 2017 12:18 PM, Esteban Lorenzano wrote:
>>>> 1) We need a stable branch, let’s say is Cog 2) We also need a
>>>> development branch, let’s call it CogDev
>>> IMHO, three branches are required as a minimum - stable,
>>> integration and development because there are multiple primary
>>> developers in core Pharo working on different OS platforms.
>> but nobody will do the integration step so let’s keep it simple:
>> integration is made responsibly for anyone who contributes, as it is
>> done now.
> 
> I proposed only three *branches*, not people. Splitting development into two branches and builds will help in isolating faster (separation of concerns). If all issues get cleared in dev branch itself, then integration branch will still be useful in catching regressions.

I don't believe this.  Since the chain is VMMaker.oscog => opensmalltalk/vm => CI, clumping commits together when pushing from, say, CogDev to Cog doesn't help in identifying where things broke in VMMaker.  This is why Esteban has implemented a complete autobuild path run on each VMMaker.oscog commit.

But, while this is a good thing, it isn't adequate because
a) important changes are made to opensmalltalk/vm code independent of VMMaker.oscog
b) sometimes one /has/ to break things to have them properly tested (e.g. the new compactor), i.e. there has to be a way of getting some experimental half-baked thing through the build pipeline so brave souls can test it

> I will defer to your experience. I do understand the difference between logical and practical in these matters.

Let's take a step back and instead of discussing implementation, discuss design.

For me, a VM is good not when someone says it is, not when it builds on all platforms, but when extensive testing finds no faults in it.  For me this implies tagging versions in opensmalltalk/vm (which by design index the corresponding VMMaker.oscog because generated source is stamped with VMMaker.oscog version info) rather than using branches.

Further, novel bugs are found in VMs that are considered good, and these bugs should, if possible, be added to a test suite.  This points to a major deficiency in our ability to test VMs.  We have no way to test the UI automatically.  We have to use humans to produce mouse clicks and keystrokes.  For me this implies tagging releases, and the ability to state that a given VM supersedes a previous known good VM.

And the previous paragraph applies equally to performance improvements, and functionality enhancements, not just bugs.
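
As a minimal sketch of "tag versions, and record that one VM supersedes a previous known good VM": the class names, the tag strings and the suite names below are all hypothetical, chosen only for illustration, and nothing here reflects an existing opensmalltalk/vm tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class VMRecord:
    """One tagged VM version (all names here are illustrative assumptions)."""
    tag: str
    passed: Set[str] = field(default_factory=set)  # test suites that passed
    supersedes: Optional[str] = None               # tag of the VM it replaces


class VMRegistry:
    def __init__(self, required_suites: Iterable[str]) -> None:
        self.required = set(required_suites)
        self.records: Dict[str, VMRecord] = {}

    def add(self, tag: str, supersedes: Optional[str] = None) -> None:
        self.records[tag] = VMRecord(tag, supersedes=supersedes)

    def mark_passed(self, tag: str, suite: str) -> None:
        self.records[tag].passed.add(suite)

    def is_good(self, tag: str) -> bool:
        # "Good" means every required suite passed, not merely "it built".
        return self.required <= self.records[tag].passed

    def latest_good(self, tag: str) -> Optional[str]:
        # Walk back along the supersedes chain to the nearest good VM.
        current: Optional[str] = tag
        while current is not None:
            if self.is_good(current):
                return current
            current = self.records[current].supersedes
        return None


registry = VMRegistry(required_suites=["functional", "performance"])
registry.add("vm-a")
registry.mark_passed("vm-a", "functional")
registry.mark_passed("vm-a", "performance")
registry.add("vm-b", supersedes="vm-a")    # an experiment still in progress
registry.mark_passed("vm-b", "functional")
stable_start = registry.latest_good("vm-b")
```

Here `stable_start` falls back to the superseded tag because the newer one has not yet passed every required suite, which is the stable starting point asked for further down.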

Test suites and build chains catch regressions.  Regressions in functionality and in performance are _useful information_ for developers trying to improve things, not necessarily an evil to be avoided at all costs. The system must allow pushing an experiment through the build and test pipeline to learn of a piece of development's impact.  An experiment may have to last for several months (for several reasons; the new compactor is a good example: some bugs show up in unusual circumstances; some bugs are hard to fix).

Another requirement is to provide a stable point for someone to begin new work.  They need to know that their starting point is not an experiment in progress.  They need to understand that the cost of working on what is effectively a branch from the trunk is one or more integration steps into trunk later on, and that this has to happen not just at the opensmalltalk/vm level using git to assist the merge, but also at the VMMaker.oscog level using Monticello to merge.  Both are good at supporting merges because both support identifying the set of changes.  Both are poor at supporting merges because they don't understand refactoring, and currently only humans can massage a set of changes forwards, applying refactorings to it.  This is what real merges are, and the reason why git only eases the trivial cases and why real programmers use a lot more tools to merge than just a vcs.

Can others add additional requirements, or critique the above requirements?  (Try not to mention git or ci implementations when you do).
======

With the above said, what seems lacking to me is the testing framework for completed VMs.  A build bot can identify commits that fail a build and also produce a VM for subsequent packaging and/or testing.  Separating the steps is very useful here.  A long pipeline with a single red or green light at the end is much less useful than a series of short pipelines, each with a separate red or green light.  Reading through a bot log to identify precisely where things broke is both tedious and, more importantly, not useful in an automated system because that identification is manual.  Separate short pipelines can be used to inform an automatic system (right Bob? Bob Westergaard built the testing system at Cadence and that is constructed from lots of small steps and it isolates faults nicely; something that an end-to-end system like Esteban's doesn't do as well).
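
A minimal sketch of the "series of short pipelines, each with a separate red or green light" idea follows.  The stage names are borrowed from the generate, build, package, test sequence; the stage bodies are stand-in lambdas and every function name is an assumption made for illustration, not part of any existing CI setup.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StageResult:
    name: str
    passed: bool  # True is a green light, False is a red one


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[StageResult]:
    """Run stages in order, recording one light per stage.

    Stops at the first red light, since later stages depend on earlier ones.
    """
    results: List[StageResult] = []
    for name, step in stages:
        try:
            ok = bool(step())
        except Exception:
            ok = False  # a crashing stage counts as red
        results.append(StageResult(name, ok))
        if not ok:
            break
    return results


def first_failure(results: List[StageResult]) -> Optional[str]:
    """Name of the first red stage, or None when every stage run was green."""
    for result in results:
        if not result.passed:
            return result.name
    return None


# Hypothetical stage bodies; real ones would invoke the actual tools.
stages = [
    ("generate", lambda: True),
    ("build", lambda: True),
    ("package", lambda: False),
    ("test", lambda: True),
]
results = run_pipeline(stages)
broken_stage = first_failure(results)
```

The point of the design is that `broken_stage` is available to an automated system directly, with no human reading through a bot log.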

Now, if we have a long sequence of nicely separated generate, build, package, test steps how many separate pipelines do we need to be able to collaborate?  Is it enough to be able to tag an upstream artifact as having passed some or all of its downstream tests or do we need to be able to duplicate the pipeline so people can run independent experiments?

For me, I see two modes of development; new development and maintenance.  New development is fine in a fork in some subset of the full build chain.  e.g. when working on Spur I forked within VMMaker.oscog (and, unfortunately, in part because we didn't have opensmalltalk/vm or many of the above requirements discussed, let alone met, I would break V3 for much of the time). e.g. the new compactor was forked in VMMaker.oscog without breaking Esteban's chain by my using a special generation step controlled by a switch I set in my branch.  I tested in my own sandbox until the new compactor needed testing by a wider audience.

Maintenance is some relatively quick fix one (thinks one) can safely apply to either VMMaker.oscog or opensmalltalk/vm trunk to address some issue.

Forking is fine for new development if
a) people understand and are prepared to pay the cost of merging, or, better,
b) they can use switches to include their work as optional in trunk
There are lots of switches:
A switch between versions in VMMaker.oscog, e.g. Spur memory manager vs V3, or the new Spur compactor vs the old, or the Sista JIT vs the standard, etc
A switch between vm configurations, e.g. pharo.cog.spur vs squeak.cog.spur in a build directory, which can do any of
- select a generated source tree (e.g. spursrc vs spur64src)
- use #ifdef's to select code in the C source
- use plugins.int & plugins.ext to select a set of plugins
A switch between dialects (Pharo vs Squeak vs Newspeak)
A switch between platforms (Mac OS X vs win32, Linux x64 vs Linux ARM)
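
To illustrate option b), here is a small sketch of new work hidden behind a switch that defaults to the established behaviour.  The switch values echo the examples listed above, but the record type, its field names and its defaults are hypothetical and do not describe how VMMaker.oscog or the build directories actually encode their options.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class VMSwitches:
    """Illustrative switch set; field names and defaults are assumptions."""
    memory_manager: str = "Spur"   # vs "V3"
    compactor: str = "old"         # vs "new" (the experiment)
    jit: str = "standard"          # vs "Sista"
    dialect: str = "Squeak"        # vs "Pharo", "Newspeak"
    platform: str = "Linux x64"    # vs "Linux ARM", "win32", "Mac OS X"


def differing_switches(a: VMSwitches, b: VMSwitches) -> Dict[str, Tuple[str, str]]:
    """Report which switches differ between two configurations."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


trunk = VMSwitches()                          # what everyone gets by default
experiment = replace(trunk, compactor="new")  # opt in to the new work only
delta = differing_switches(trunk, experiment)
```

Because the experiment differs from trunk in exactly one switch, anyone who does not opt in is unaffected, and a failure in the experimental build points at that one switch.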

I get the above distinctions and know how to navigate amongst them upstream, but don't understand very well the downstream (how to clone the build/test CI pipeline so I can cheaply fork, work on the branch and then merge). So I'm happier using switches to try and hide new work in trunk to avoid derailing people.  And so I prefer the notion of a single pipeline that tags specific versions as good.

Is one of the requirements that people want to clearly separate maintenance from new development?

Is one of the requirements that people want to clearly identify which commit caused a specific bug? (Big discussion here about major, e.g. V3 => Spur transitions vs small grain changes; you can identify the latter, but not necessarily the former).

I suppose what I'm asking is what's the benefit of an all green build?  For me a tested, versioned and named artifact is more useful than an all green build.  An all red build is a red flag.  A mostly green build is simply a failure to segregate production from in-development artifacts.

> Regards .. Subbu