[Vm-dev] the purpose of CI

Fabio Niephaus lists at fniephaus.com
Thu Jun 1 18:58:55 UTC 2017


On Thu, Jun 1, 2017 at 7:40 PM Ben Coman <btc at openinworld.com> wrote:

> On Thu, Jun 1, 2017 at 10:38 PM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Tim,
>>
>>
>> On May 31, 2017, at 11:53 PM, Tim Felgentreff <timfelgentreff at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> Just re the discussion of dev and stable branch, the original idea was
>> that Cog is dev and master is stable. We never expected that people would
>> use or recommend the Cog bintray builds for anything other than development.
>>
>> But "master" ended up so far behind, recent changes would not get much
> testing leading up to Pharo release...
> $ git log master
> 17 Aug 2016
>

That should be equivalent to 201608171728 [1], which is the first stable
release we did on GitHub. We recently had to swap VMs in a Squeak
bundle, which is why there is 201701281910 [2]. But I didn't want to flag
this release as stable.

As Tim F. already mentioned, the initial idea was to use Cog as the dev
branch and master for stable releases. Unfortunately, we don't have a
working procedure for releasing new OpenSmalltalk VMs. I only dared to create
releases when we "had to", and after asking Eliot for his recommendation.
But it would be much nicer to have more frequent releases, which our CI
infrastructure would allow us to do.

The CI already runs a selection of unit tests on some Squeak and Pharo
VMs, and bootstraps Newspeak on some Newspeak VMs. The entry point is
at [3], and an example build is at [4]. If this is not enough, we need to
extend the testing.
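
To make this concrete, here is a rough sketch of what such a test step could
look like. This is not the actual .travis_test.sh; the VM path, image name and
headless flag below are only illustrative assumptions:

  #!/usr/bin/env bash
  # Sketch: run a known image's test suite against the VM we just built and
  # fail the CI job if any test fails.
  set -e
  VM="./products/sqcogspurlinuxht/squeak"   # hypothetical path to the built VM
  IMAGE="TrunkTests.image"                  # hypothetical image containing the tests
  # run-tests.st is assumed to run the suite and quit with a non-zero exit
  # code when a test fails.
  "$VM" -vm-display-null "$IMAGE" run-tests.st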

Currently, I would suggest getting the Cog branch green again by allowing
all experimental VMs (e.g. Sista) to fail on Travis. Then we could automate
a merge from Cog to master whenever everything is green on Cog. Additionally,
we could automate GitHub releases, for example one release per month,
considering that the more frequently updated VMs live on Bintray.
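
Such an automated merge could be a small job that only runs after a fully
green build. A minimal sketch (where the push credentials come from, and
where this hook would live, is left open):

  #!/usr/bin/env bash
  # Sketch: fast-forward master to Cog once everything on Cog is green.
  set -e
  git fetch origin Cog master
  git checkout master
  # --ff-only refuses anything that would need manual conflict resolution,
  # so real merges are still done by a human.
  git merge --ff-only origin/Cog
  git push origin master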

Best,
Fabio

[1]
https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/201608171728
[2]
https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/201701281910
[3]
https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/.travis_test.sh
[4]
https://travis-ci.org/OpenSmalltalk/opensmalltalk-vm/jobs/237766261#L5256



> You want to *encourage* people to use more recent VMs to get more feedback
> earlier.
>
>
>> I feel the only problem is that we need someone who merges to master when
>> it is green. I think we have already protected the master branch in the way
>> Ben suggested, i.e., you can only open a PR and merge it if the Travis
>> build is all green.
>>
>>
>> I can do this.  Ideally it would be either automatic or prompted.  What I
>> mean is that there should be a set of relevant tests run on images
>> using the production subset of the VMs built from the Cog branch. Whenever
>> the tests are all green, either I get sent an email prompting me to
>> push to master, or a push to master occurs.
>>
>
> I think it would be useful to differentiate between different levels of
> stability depending on application and personal perspective.
>    A. personal-stable -- stable "enough" for developer/student to use
> personally on their desktop -- might be optimistic if it passes all tests
>    B. production-stable -- "bulletproof" for operating machinery and
> business systems -- maybe when A has been in wide use for a while
>
> btw, it occurs to me that "master" is a bit ambiguous.  Perhaps for (B.)
> "master" could be renamed "production"
> https://stevebennett.me/2014/02/26/git-what-they-didnt-tell-you/
>
> and for (A.) maybe introduce "stable", or alternatively "validated", to mean
> all tests passed, without the strong implication that it is "stable".
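>
> As a rough sketch, the rename itself is cheap; updating CI configs and the
> default branch on GitHub is the real work and is left out here:
>
>   git branch -m master production        # rename the local branch
>   git push origin production             # publish it under the new name
>   git push origin --delete master        # optionally retire the old name
>
>   git branch stable production           # "stable": a movable all-tests-passed pointer
>   git push origin stable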
>
>
>>
>> Maybe I can get into the habit of checking the status of the build a few
>> hours after a commit.  But a generated email would compensate for my, um,
>> it's on the tip of my tongue, um, my, my memory!  And an automated push
>> would allow me to resume walking in front of buses.
>>
>
>> The bintray deployment should not be taken as a source of stable builds.
>> It is meant to be used by what Eliot calls brave souls who want to help to
>> test the latest and possibly unstable changes.
>>
>> Good.  This makes perfect sense to me.  Are there places in the
>> configuration to add a brief overview text to the Bintray download
>> pages explaining this?  It would be great to have a short paragraph that says
>> these are development versions and directs people to the master builds.
>>
>
>> P.S. for master builds Gilad has noticed that there is no .msi for the
>> newspeak builds (and I suspect there may be no .dmg).  In e.g.
>> build.win32x86/newspeak.cog.spur/installer is code to make the .msi for
>> a newspeak vm.  And the corresponding thing exists for making the Mac OS
>> .dmg.  Do any brave souls feel up to trying to get these built?
>>
>> Just my 2c
>>
>> Tim
>>
>> On Thu, 1 Jun 2017, 06:05 Ben Coman, <btc at openinworld.com> wrote:
>>
>>> On Thu, Jun 1, 2017 at 2:27 AM, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> 2017-05-31 17:31 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> > On May 31, 2017, at 1:54 AM, K K Subbu <kksubbu.ml at gmail.com> wrote:
>>>>> >
>>>>> > On Wednesday 31 May 2017 12:35 PM, Esteban Lorenzano wrote:
>>>>> >>>> On 31 May 2017, at 09:01, K K Subbu <kksubbu.ml at gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> On Wednesday 31 May 2017 12:18 PM, Esteban Lorenzano wrote:
>>>>> >>>> 1) We need a stable branch, let’s say it is Cog. 2) We also need a
>>>>> >>>> development branch, let’s call it CogDev.
>>>>> >>> IMHO, three branches are required as a minimum - stable,
>>>>> >>> integration and development because there are multiple primary
>>>>> >>> developers in core Pharo working on different OS platforms.
>>>>> >> but nobody will do the integration step so let’s keep it simple:
>>>>> >> integration is the responsibility of anyone who contributes, as it is
>>>>> >> done now.
>>>>> >
>>>>> > I proposed only three *branches*, not people. Splitting development
>>>>> > into two branches and builds will help isolate problems faster (separation
>>>>> > of concerns). If all issues get cleared in the dev branch itself, then the
>>>>> > integration branch will still be useful in catching regressions.
>>>>>
>>>>> I don't believe this.  Since the chain is VMMaker.oscog =>
>>>>> opensmalltalk/vm => CI, clumping commits together when pushing from, say,
>>>>> CogDev to Cog doesn't help in identifying where things broke in VMMaker.
>>>>> This is why Esteban has implemented a complete autobuild path run on each
>>>>> VMMaker.oscog commit.
>>>>>
>>>>> But, while this is a good thing, it isn't adequate because
>>>>> a) important changes are made to opensmalltalk/vm code independent of
>>>>> VMMaker.oscog
>>>>> b) sometimes one /has/ to break things to have them properly tested
>>>>> (e.g. the new compactor).  i.e. there has to be a way of getting some
>>>>> experimental half-baked thing through the build pipeline so brave souls can
>>>>> test it
>>>>>
>>>>> > I will defer to your experience. I do understand the difference
>>>>> between logical and practical in these matters.
>>>>>
>>>>> Let's take a step back and instead of discussing implementation,
>>>>> discuss design.
>>>>>
>>>>> For me, a VM is good not when someone says it is, not when it builds
>>>>> on all platforms, but when extensive testing finds no faults in it.  For me
>>>>> this implies tagging versions in opensmalltalk/vm (which by design index
>>>>> the corresponding VMMaker.oscog because generated source is stamped with
>>>>> VMMaker.oscog version info) rather than using branches.
>>>>>
>>>>> Further, novel bugs are found in VMs that are considered good, and
>>>>> these bugs should, if possible, be added to a test suite.  This points to a
>>>>> major deficiency in our ability to test VMs.  We have no way to test the
>>>>> UI automatically.  We have to use humans to produce mouse clicks and
>>>>> keystrokes.  For me this implies tagging releases, and the ability to state
>>>>> that a given VM supersedes a previous known good VM.
>>>>
> I just want to interject to check everyone's understanding of git
> branches.  Although I haven't used SVN, what I've read indicates SVN
> concepts can be hard to shake, and git branching is conceptually very
> different from SVN.
>
> The key thing is "a branch in Git is simply a lightweight movable pointer
> to a commit."
> The following article seems particularly useful to help frame our
> discussion...
> https://git-scm.com/book/en/v1/Git-Branching-What-a-Branch-Is
>
> A branch is much the same as a tag; they are both references to a
> particular commit, except:
> * branches are mutable references
> * tags are immutable references
> http://alblue.bandlem.com/2011/04/git-tip-of-week-tags.html
>
> So if you want a moveable "good-vm" tag, maybe what you need is a branch
> reference.
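>
> For example (a minimal sketch; "some-release" and "good-vm" are just
> illustrative names):
>
>   git tag -a some-release -m "known-good VM"   # a tag stays put on its commit
>   git branch good-vm some-release              # a branch starts out pointing at the same commit
>   git branch -f good-vm origin/Cog             # ...but can be moved to whatever newer commit proves good
>   git push --force origin good-vm              # moving a published branch needs a forced push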
>
>
>>>>> And the previous paragraph applies equally to performance
>>>>> improvements, and functionality enhancements, not just bugs.
>>>>>
>>>>> Test suites and build chains catch regressions.  Regressions in
>>>>> functionality and in performance are _useful information_ for developers
>>>>> trying to improve things, not necessarily an evil to be avoided at all
>>>>> costs.
>>>>
> Agreed. But you want to deal with your own regressions, not other
> people's.  It seems harder to apply an "if you break it, you fix it"
> philosophy if it's always broken.
>
>
>> The system must allow pushing an experiment through the build and test
>>>>> pipeline to learn the impact of a piece of development.
>>>>
> IIUC, experimental branches (and PRs!!) can run through the CI pipeline
> identically to the Cog branch (except maybe the deployment step).  There seems
> to be no benefit in committing directly to the Cog branch when a
> PR-commit would work the same.
>
>
>> An experiment may have to last for several months (for several reasons;
>>>>> the new compactor is a good example: some bugs show up in unusual
>>>>> circumstances; some bugs are hard to fix).
>>>>>
>>>>> Another requirement is to provide a stable point for someone to begin
>>>>> new work.  They need to know that their starting point is not an experiment
>>>>> in progress. They need to understand that the cost of working on what is
>>>>> effectively a branch from the trunk is one or more integration steps into
>>>>> trunk later on, and this can't be just at the opensmalltalk/vm level using
>>>>> git to assist the merge, but also at the VMMaker.oscog level using Monticello
>>>>> to merge.  Both are good at supporting merges because both support identifying
>>>>> the set of changes.  Both are poor at supporting merges because they don't
>>>>> understand refactoring, and currently only humans can massage a set of
>>>>> changes forward, applying refactorings to them.  This is what
>>>>> real merges are, and the reason why git only eases the trivial cases and
>>>>> why real programmers use a lot more tools to merge than just a vcs.
>>>>>
>>>>> Can others add additional requirements, or critique the above
>>>>> requirements?  (Try not to mention git or CI implementations when you do.)
>>>>> ======
>>>>>
>>>>> With the above said, what seems lacking to me is the testing framework
>>>>> for completed VMs.  A build bot can identify commits that fail a build and
>>>>> also produce a VM for subsequent packaging and/or testing.  Separating the
>>>>> steps is very useful here.  A long pipeline with a single red or green
>>>>> light at the end is much less useful than a series of short pipelines, each
>>>>> with a separate red or green light.  Reading through a bot log to identify
>>>>> precisely where things broke is both tedious and, more importantly, not
>>>>> useful in an automated system because that identification is manual.
>>>>> Separate short pipelines can be used to inform an automatic system (right
>>>>> Bob? Bob Westergaard built the testing system at Cadence and that is
>>>>> constructed from lots of small steps and it isolates faults nicely;
>>>>> something that an end-to-end system like Esteban's doesn't do as well).
>>>>>
>>>>> Now, if we have a long sequence of nicely separated generate, build,
>>>>> package, and test steps, how many separate pipelines do we need to be able to
>>>>> collaborate?  Is it enough to be able to tag an upstream artifact as having
>>>>> passed some or all of its downstream tests or do we need to be able to
>>>>> duplicate the pipeline so people can run independent experiments?
>>>>>
>>>>> For me, I see two modes of development: new development and
>>>>> maintenance.  New development is fine in a fork in some subset of the full
>>>>> build chain.  e.g. when working on Spur I forked within VMMaker.oscog (and,
>>>>> unfortunately, in part because we didn't have opensmalltalk/vm or many of
>>>>> the above requirements discussed, let alone met, I would break V3 for much
>>>>> of the time). e.g. the new compactor was forked in VMMaker.oscog without
>>>>> breaking Esteban's chain by my using a special generation step controlled
>>>>> by a switch I set in my branch.  I tested in my own sandbox until the new
>>>>> compactor needed testing by a wider audience.
>>>>>
>>>>> Maintenance is some relatively quick fix one (thinks one) can safely
>>>>> apply to either VMMaker.oscog or opensmalltalk/vm trunk to address some
>>>>> issue.
>>>>>
>>>>> Forking is fine for new development if
>>>>> a) people understand and are prepared to pay the cost of merging, or,
>>>>> better,
>>>>> b) they can use switches to include their work as optional in trunk
>>>>> There are lots of switches:
>>>>> A switch between versions in VMMaker.oscog, e.g. Spur memory manager
>>>>> vs V3, or the new Spur compactor vs the old, or the Sista JIT vs the
>>>>> standard, etc
>>>>> A switch between a vm configuration, e.g. pharo.cog.spur vs
>>>>> squeak.cog.spur in a build directory, which can do any of
>>>>> - select a generated source tree (e.g. spursrc vs spur64src)
>>>>> - use #ifdef's to select code in the C source
>>>>> - use plugins.int & plugins.ext to select a set of plugins
>>>>> A switch between dialects (Pharo vs Squeak vs Newspeak)
>>>>> A switch between platforms (Mac OS X vs win32, Linux x64 vs Linux ARM)
>>>>>
>>>>>
>>>>> I get the above distinctions and know how to navigate amongst them
>>>>> upstream, but don't understand very well the downstream (how to clone the
>>>>> build/test CI pipeline so I can cheaply fork, work on the branch and then
>>>>> merge). So I'm happier using switches to try and hide new work in trunk to
>>>>> avoid derailing people.  And so I prefer the notion of a single pipeline
>>>>> that tags specific versions as good.
>>>>>
>>>>> Is one of the requirements that people want to clearly separate
>>>>> maintenance from new development?
>>>>>
>>>>
> This diagram may be a good reference for discussing how maintenance
> hotfixes can relate to development branches.
>
> http://1.bp.blogspot.com/-ct9MmWf5gJk/U2Pe9V8A5GI/AAAAAAAAAT0/0Y-XvAb9RB8/s1600/gitflow-orig-diagram.png
>
> cheers -ben
>
>>
>>>>> Is one of the requirements that people want to clearly identify which
>>>>> commit caused a specific bug? (Big discussion here about major, e.g. V3 =>
>>>>> Spur transitions vs small grain changes; you can identify the latter, but
>>>>> not necessarily the former).
>>>>
>>>>> I suppose what I'm asking is: what's the benefit of an all-green build?
>>>>> For me a tested, versioned and named artefact is more useful than an
>>>>> all-green build.  An all-red build is a red flag.  A mostly green build is
>>>>> simply a failure to segregate production from in-development artefacts.
>>>>>
>>>>>
>>>>>
>>>> Hi Eliot,
>>>> the main advantage of GitHub is the social thing:
>>>> - a lower barrier to contributing via better integration of tools
>>>>  (not only the vcs, but the issue tracker, wiki, continuous integration, code
>>>> review/comments and pull requests - even if we underuse most of these
>>>> tools),
>>>> - and easier integration of many small contributions back.
>>>> For this to work well, such work MUST happen in separate branches.
>>>> In this context, there is an obvious benefit to a green build: quickly
>>>> estimating whether we can merge a pull request or not.
>>>> When red, we have no information about possible regressions, and have
>>>> to go through the tedious part: go down into the console log of both
>>>> builds, try to understand and compare... There is already enough work
>>>> involved in reviewing source code.
>>>>
>>> I see that my opening argument was simplistic.  However Nicolas' point
>>> above is probably more significant.
>>> If we want to encourage new contributors, we need:
>>>
>>> * to show that the CI builds are cared for
>>>
>>> * to allow newcomers to be confident that the tip they are working from is
>>> green before they start.  When they submit their PR and the CI tests fail,
>>> they should be able to zero in on the failures *they* caused and, *as a newbie*,
>>> not have to sort through the confounding factors from others' failures.
>>>
>>> * to integrate in a timely fashion, to encourage further contributions.  If
>>> someone contributes a good fix, a green CI test may make you inclined to
>>> review and integrate it quickly. But when the CI shows a failure, how will you
>>> feel about looking into it? Further, when the mainline returns to green,
>>> the existing PRs don't automatically retest, and no-one seems to be
>>> manually managing them, so such PRs seem to end up in limbo, which is
>>> *really* discouraging for potential contributors.
>>>
>>> cheers -ben
>>>
>>>
>>>> I tend to agree with your view for mid/long-term changes:
>>>> Say developer A works on a new garbage collector, developer B on
>>>> 64-bit compatibility, developer C on the Lowcode extension and developer D on
>>>> Sista (though maybe a single developer is touching three of these).
>>>> Since each of these developments is going to take months, and touch many core
>>>> methods scattered in the interpreter/JIT/object memory or CCodeGenerator, then
>>>> it's going to be very difficult to merge (way too many conflicts).
>>>>
>>>> If on different branches, there is the option to rebase or merge with
>>>> other branches. But it doesn't scale with N branches touching the same core
>>>> methods: N developers would have to rebase on N-1 concurrent branches,
>>>> resolve the exact same conflicts, etc... Obviously, concurrent work would
>>>> have to be integrated back ASAP into a master branch.
>>>>
>>>> So, a good branch is a short branch, if possible covering a minimal
>>>> feature set.
>>>> And the long developments you describe must not be handled by branches, but
>>>> by switches.
>>>> This gives you a chance to inspect the impact of your own refactorings
>>>> on your coworkers' work.
>>>>
>>>> In this model, yes, you have a license to break your own artifact (say
>>>> generationalScavenger, win64, lowcode, sista).
>>>> But you must be informed if you ever break the production VM and/or
>>>> concurrent artifacts. You have to keep a minimal set of features
>>>> working, otherwise you prevent others from working. In the scavenger case, you
>>>> used a branch for a short period, and that worked quite well.
>>>>
>>>> In this context, I agree, a single green light is not enough.
>>>> We need a sort of status board tracking the regressions individually.
>>>>
>>>>
>>>>
>>>>
>>>>> > Regards .. Subbu
>>>>>
>>>>
>>>>
>>>>