[Vm-dev] the purpose of CI

Eliot Miranda eliot.miranda at gmail.com
Thu Jun 1 23:41:40 UTC 2017


Hi Fabio,

On Thu, Jun 1, 2017 at 11:58 AM, Fabio Niephaus <lists at fniephaus.com> wrote:

>
> On Thu, Jun 1, 2017 at 7:40 PM Ben Coman <btc at openinworld.com> wrote:
>
>> On Thu, Jun 1, 2017 at 10:38 PM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>>
>>> Hi Tim,
>>>
>>>
>>> On May 31, 2017, at 11:53 PM, Tim Felgentreff <timfelgentreff at gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> Just re the discussion of dev and stable branch, the original idea was
>>> that Cog is dev and master is stable. We never expected that people would
>>> use or recommend the Cog bintray builds for anything other than development.
>>>
>>> But "master" ended up so far behind that recent changes would not get much
>> testing leading up to a Pharo release...
>> $ git log master
>> 17 Aug 2016
>>
>
> That should be equivalent to 201608171728 [1], which is the first stable
> release we did on GitHub. We recently had to swap VMs in a Squeak
> bundle, which is why there is 201701281910 [2]. But I didn't want to flag
> this release as stable.
>
> As Tim F. already mentioned, the initial idea was to use Cog as the dev
> branch, and the master for stable releases. Unfortunately, we don't have a
> working procedure for releasing new OpenSmalltalk VMs. I dared to create
> releases when we "had to" and after I asked Eliot for his recommendation.
> But it would be much nicer to have more frequent releases, which our CI
> infrastructure would allow us to do.
>
> The CI already runs a selection of unit tests on some Squeak and Pharo
> VMs, and bootstraps Newspeak on some Newspeak VMs. The entry point is
> at [3], an example build at [4]. If this is not enough, we need to extend
> testing.
>
> Currently, I would suggest getting the Cog branch green again by allowing
> all experimental VMs (e.g. sista) to fail on Travis. Then we could automate
> a merge from Cog to master when everything is green on Cog.
>

+1.  This should happen asap.
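As a sketch of what that automated promotion could look like — run here against a throwaway repo for illustration (in practice it would run in a clone of opensmalltalk-vm after a green Cog build), using a fast-forward-only merge so the step fails loudly rather than silently creating a merge commit if master has diverged:

```shell
# Throwaway demo repo standing in for opensmalltalk-vm.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q vm && cd vm
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m "last stable"
git branch -M master                 # make sure the stable branch is named master
git checkout -q -b Cog
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m "green build on Cog"
git checkout -q master
git merge -q --ff-only Cog           # exits non-zero instead of merging if master diverged
# git push origin master             # the real pipeline would push the result
```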


> Additionally, we
> could also automate GitHub releases, for example one release per month,
> considering that the more frequently updated VMs live on Bintray.
>

+1
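A rough sketch of the tagging half of such a monthly release, assuming the same timestamp naming as the existing GitHub releases (the commented-out push and curl call against the GitHub releases API are illustrative, not a tested pipeline; the token variable is a placeholder):

```shell
# Build a release tag in the 201608171728 style (UTC year month day hour minute).
tag=$(date -u +%Y%m%d%H%M)
echo "release tag would be: $tag"
# In the real pipeline, something like:
# git tag -a "$tag" -m "monthly release $tag"
# git push origin "$tag"
# curl -H "Authorization: token $GITHUB_TOKEN" \
#   -d "{\"tag_name\": \"$tag\", \"name\": \"$tag\"}" \
#   https://api.github.com/repos/OpenSmalltalk/opensmalltalk-vm/releases
```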


>
> Best,
> Fabio
>
> [1] https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/201608171728
> [2] https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/201701281910
> [3] https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/.travis_test.sh
> [4] https://travis-ci.org/OpenSmalltalk/opensmalltalk-vm/jobs/237766261#L5256
>
>
>
>> You want to *encourage* people to use more recent VMs to get more
>> feedback earlier.
>>
>>
>>> I feel the only problem is that we need someone who merges to master
>>> when it is green. I think we have already protected the master branch in
>>> the way Ben suggested, i.e., you can only open a PR and merge it if the
>>> Travis build is all green.
>>>
>>>
>>> I can do this.  Ideally it would be either automatic or prompted.  What
>>> I mean is that there should be a set of relevant tests, run on images
>>> using the production subset of the VMs built from the Cog branch. Whenever
>>> the tests are all green then either I get sent an email prompting me to
>>> push to master, or a push to master occurs.
>>>
>>
>> I think it would be useful to differentiate between different levels of
>> stability depending on application and personal perspective.
>>    A. personal-stable -- stable "enough" for developer/student to use
>> personally on their desktop -- might be optimistic if it passes all tests
>>    B. production-stable -- "bulletproof" for operating machinery and
>> business systems - maybe when A has been in wide use for a while
>>
>> btw, it occurs to me that "master" is a bit ambiguous.  Perhaps for (B.)
>> "master" could be renamed "production"
>> https://stevebennett.me/2014/02/26/git-what-they-didnt-tell-you/
>>
>> and for (A.) maybe introduce "stable" or alternatively "validated" to
>> mean all tests passed without the strong implication it is "stable".
>>
>>
>>>
>>> Maybe I can get into the habit of checking the status of the build a few
>>> hours after a commit.  But a generated email would compensate for my, um,
>>> it's on the tip of my tongue, um, my, my memory!  And an automated push
>>> would allow me to resume walking in front of buses.
>>>
>>
>>> The bintray deployment should not be taken as a source of stable builds.
>>> It is meant to be used by what Eliot calls brave souls who want to help to
>>> test the latest and possibly unstable changes.
>>>
>>> Good.  This makes perfect sense to me.  Are there places in the
>>> configuration to add brief overview texts explaining this to the bintray
>>> download pages?  It would be great to have a short paragraph that says
>>> these are development versions and directs to the master builds.
>>>
>>
>>> P.S. for master builds Gilad has noticed that there is no .msi for the
>>> newspeak builds (and I suspect there may be no .dmg).  In e.g.
>>> build.win32x86/newspeak.cog.spur/installer is code to make the .msi for
>>> a newspeak vm.  And the corresponding thing exists for making the Mac OS
>>> .dmg.  Any brave souls feel up to trying to get them built?
>>>
>>> Just my 2c
>>>
>>> Tim
>>>
>>> On Thu, 1 Jun 2017, 06:05 Ben Coman, <btc at openinworld.com> wrote:
>>>
>>>> On Thu, Jun 1, 2017 at 2:27 AM, Nicolas Cellier <
>>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2017-05-31 17:31 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>>
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> > On May 31, 2017, at 1:54 AM, K K Subbu <kksubbu.ml at gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > On Wednesday 31 May 2017 12:35 PM, Esteban Lorenzano wrote:
>>>>>> >>>> On 31 May 2017, at 09:01, K K Subbu <kksubbu.ml at gmail.com>
>>>>>> wrote:
>>>>>> >>>>
>>>>>> >>>> On Wednesday 31 May 2017 12:18 PM, Esteban Lorenzano wrote:
>>>>>> >>>> 1) We need a stable branch, let’s say is Cog 2) We also need a
>>>>>> >>>> development branch, let’s call it CogDev
>>>>>> >>> IMHO, three branches are required as a minimum - stable,
>>>>>> >>> integration and development because there are multiple primary
>>>>>> >>> developers in core Pharo working on different OS platforms.
>>>>>> >> but nobody will do the integration step so let’s keep it simple:
>>>>>> >> integration is made responsibly for anyone who contributes, as it
>>>>>> is
>>>>>> >> done now.
>>>>>> >
>>>>>> > I proposed only three *branches*, not people. Splitting development
>>>>>> into two branches and builds will help in isolating issues faster (separation of
>>>>>> concerns). If all issues get cleared in dev branch itself, then integration
>>>>>> branch will still be useful in catching regressions.
>>>>>>
>>>>>> I don't believe this.  Since the chain is VMMaker.oscog =>
>>>>>> opensmalltalk/vm => CI, clumping commits together when pushing from, say,
>>>>>> CogDev to Cog doesn't help in identifying where things broke in VMMaker.
>>>>>> This is why Esteban has implemented a complete autobuild path run on each
>>>>>> VMMaker.oscog commit.
>>>>>>
>>>>>> But, while this is a good thing, it isn't adequate because
>>>>>> a) important changes are made to opensmalltalk/vm code independent of
>>>>>> VMMaker.oscog
>>>>>> b) sometimes one /has/ to break things to have them properly tested
>>>>>> (e.g. the new compactor).  i.e. there has to be a way of getting some
>>>>>> experimental half-baked thing through the build pipeline so brave souls can
>>>>>> test them
>>>>>>
>>>>>> > I will defer to your experience. I do understand the difference
>>>>>> between logical and practical in these matters.
>>>>>>
>>>>>> Let's take a step back and instead of discussing implementation,
>>>>>> discuss design.
>>>>>>
>>>>>> For me, a VM is good not when someone says it is, not when it builds
>>>>>> on all platforms, but when extensive testing finds no faults in it.  For me
>>>>>> this implies tagging versions in opensmalltalk/vm (which by design index
>>>>>> the corresponding VMMaker.oscog because generated source is stamped with
>>>>>> VMMaker.oscog version info) rather than using branches.
>>>>>>
>>>>>> Further, novel bugs are found in VMs that are considered good, and
>>>>>> these bugs should, if possible, be added to a test suite.  This points to a
>>>>>> major deficiency in our ability to tests VMs.  We have no way to test the
>>>>>> UI automatically.  We have to use humans to produce mouse clicks and
>>>>>> keystrokes.  For me this implies tagging releases, and the ability to state
>>>>>> that a given VM supersedes a previous known good VM.
>>>>>
>>>>> I just want to interject a check on everyone's understanding of git
>> branches.  Although I haven't used SVN, what I've read indicates SVN
>> concepts can be hard to shake and git branching is conceptually very
>> different from SVN.
>>
>> The key thing is "a branch in Git is simply a lightweight movable pointer
>> to a commit."
>> The following article seems particularly useful to help frame our
>> discussion...
>> https://git-scm.com/book/en/v1/Git-Branching-What-a-Branch-Is
>>
>> A branch is much the same as a tag; they are both references to a
>> particular commit, except
>> * branches are mutable references
>> * tags are immutable references
>> http://alblue.bandlem.com/2011/04/git-tip-of-week-tags.html
>>
>> So if you want a moveable "good-vm" tag, maybe what you need is a branch
>> reference.
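To illustrate the point in a throwaway repo (names are illustrative): a tag keeps pointing at its commit, while a branch can be moved to a newer one with `git branch -f`:

```shell
# Throwaway repo showing that tags stay put while branches move.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git -c user.email=a@example.org -c user.name=demo commit -q --allow-empty -m "known good VM"
git tag -a -m "known good" good-vm-201608171728    # immutable reference
git branch good-vm                                 # movable reference
git -c user.email=a@example.org -c user.name=demo commit -q --allow-empty -m "newer good VM"
git branch -f good-vm HEAD                         # move the branch to the new commit
git rev-parse 'good-vm-201608171728^{commit}'      # still the first commit
git rev-parse good-vm                              # now the second commit
```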
>>
>>
>>>>>> And the previous paragraph applies equally to performance
>>>>>> improvements, and functionality enhancements, not just bugs.
>>>>>>
>>>>>> Test suites and build chains catch regressions.  Regressions in
>>>>>> functionality and in performance are _useful information_ for developers
>>>>>> trying to improve things, not necessarily an evil to be avoided at all
>>>>>> costs.
>>>>>
>>>>> Agreed. But you want to deal with your own regressions, not other
>> people's.  It seems harder to apply an "if you break it, you fix it"
>> philosophy if it's always broken.
>>
>>
>>> The system must allow pushing an experiment through the build and test
>>>>>> pipeline to learn of a piece of development's impact.
>>>>>
>>>>> IIUC, experimental branches (and PRs!!) can run through the CI
>> pipeline identical to the Cog branch (except maybe deployment step).  There
>> seems no benefit here for needing to commit directly to the Cog branch when
>> a PR-commit would work the same.
>>
>>
>>> An experiment may have to last for several months (for several reasons;
>>>>>> the new compactor is a good example: some bugs show up in unusual
>>>>>> circumstances; some bugs are hard to fix).
>>>>>>
>>>>>> Another requirement is to provide a stable point for someone to begin
>>>>>> new work.  They need to know that their starting point is not an experiment
>>>>>> in progress. They need to understand that the cost of working on what is
>>>>>> effectively a branch from the trunk is an integration step(s) into trunk
>>>>>> later on, and this can't be just at the opensmalltalk/vm level using git to
>>>>>> assist the merge, but also at the VMMaker.oscog level using Monticello to
>>>>>> merge.  Both are good at supporting merges because both support identifying
>>>>>> the set of changes.  Both are poor at supporting merges because they don't
>>>>>> understand refactoring and currently only humans can massage a set of
>>>>>> changes forwards applying refactorings to a set of changes.  This is what
>>>>>> real merges are, and the reason why git only eases the trivial cases and
>>>>>> why real programmers use a lot more tools to merge than just a vcs.
>>>>>>
>>>>>> Can others add additional requirements, or critique the above
>>>>>> requirements?  (Try not to mention git or ci implementations when you do).
>>>>>> ======
>>>>>>
>>>>>> With the above said what seems lacking to me is the testing framework
>>>>>> for completed VMs.  A build bot can identify commits that fail a build and
>>>>>> also produce a VM for subsequent packaging and/or testing.  Separating the
>>>>>> steps is very useful here.  A long pipeline with a single red or green
>>>>>> light at the end is much less useful than a series of short pipelines, each
>>>>>> with a separate red or green light.  Reading through a bot log to identify
>>>>>> precisely where things broke is both tedious and, more importantly, not
>>>>>> useful in an automated system because that identification is manual.
>>>>>> Separate short pipelines can be used to inform an automatic system (right
>>>>>> Bob? Bob Westergaard built the testing system at Cadence and that is
>>>>>> constructed from lots of small steps and it isolates faults nicely;
>>>>>> something that an end-to-end system like Esteban's doesn't do as well).
>>>>>>
>>>>>> Now, if we have a long sequence of nicely separated generate, build,
>>>>>> package, test steps how many separate pipelines do we need to be able to
>>>>>> collaborate?  Is it enough to be able to tag an upstream artifact as having
>>>>>> passed some or all of its downstream tests or do we need to be able to
>>>>>> duplicate the pipeline so people can run independent experiments?
>>>>>>
>>>>>> For me, I see two modes of development; new development and
>>>>>> maintenance.  New development is fine in a fork in some subset of the full
>>>>>> build chain.  e.g. when working on Spur I forked within VMMaker.oscog (and,
>>>>>> unfortunately, in part because we didn't have opensmalltalk/vm or many of
>>>>>> the above requirements discussed, let alone met, I would break V3 for much
>>>>>> of the time). e.g. the new compactor was forked in VMMaker.oscog without
>>>>>> breaking Esteban's chain by my using a special generation step controlled
>>>>>> by a switch I set in my branch.  I tested in my own sandbox until the new
>>>>>> compactor needed testing by a wider audience.
>>>>>>
>>>>>> Maintenance is some relatively quick fix one (thinks one) can safely
>>>>>> apply to either VMMaker.oscog or opensmalltalk/vm trunk to address some
>>>>>> issue.
>>>>>>
>>>>>> Forking is fine for new development if
>>>>>> a) people understand and are prepared to pay the cost of merging, or,
>>>>>> better,
>>>>>> b) they can use switches to include their work as optional in trunk
>>>>>> There are lots of switches:
>>>>>> A switch between versions in VMMaker.oscog, e.g. Spur memory manager
>>>>>> vs V3, or the new Spur compactor vs the old, or the Sista JIT vs the
>>>>>> standard, etc
>>>>>> A switch between a vm configuration, e.g. pharo.cog.spur vs
>>>>>> squeak.cog.spur in a build directory, which can do any of
>>>>>> - select a generated source tree (e.g. spursrc vs spur64src)
>>>>>> - use #ifdef's to select code in the C source
>>>>>> - use plugins.int & plugins.ext to select a set of plugins
>>>>>> A switch between dialects (Pharo vs Squeak vs Newspeak)
>>>>>> A switch between platforms (Mac OS X vs win32, Linux x64 vs Linux ARM)
>>>>>>
>>>>>>
>>>>>> I get the above distinctions and know how to navigate amongst them
>>>>>> upstream, but don't understand very well the downstream (how to clone the
>>>>>> build/test CI pipeline so I can cheaply fork, work on the branch and then
>>>>>> merge). So I'm happier using switches to try and hide new work in trunk to
>>>>>> avoid derailing people.  And so I prefer the notion of a single pipeline
>>>>>> that tags specific versions as good.
>>>>>>
>>>>>> Is one of the requirements that people want to clearly separate
>>>>>> maintenance from new development?
>>>>>>
>>>>>
>> This diagram may be a good reference for discussion, of how maintenance
>> hotfixes can relate to development branches.
>> http://1.bp.blogspot.com/-ct9MmWf5gJk/U2Pe9V8A5GI/AAAAAAAAAT0/0Y-XvAb9RB8/s1600/gitflow-orig-diagram.png
>>
>> cheers -ben
>>
>>>
>>>>>> Is one of the requirements that people want to clearly identify which
>>>>>> commit caused a specific bug? (Big discussion here about major, e.g. V3 =>
>>>>>> Spur transitions vs small grain changes; you can identify the latter, but
>>>>>> not necessarily the former).
>>>>>
>>>>> I suppose what I'm asking is what's the benefit of an all green
>>>>>> build?  For me a tested, versioned and named artefact is more useful than an
>>>>>> all green build.  An all red build is a red flag.  A mostly green build is
>>>>>> simply a failure to segregate production from in development artefacts.
>>>>>>
>>>>>>
>>>>>>
>>>>> Hi Eliot,
>>>>> the main advantage of github is the social thing:
>>>>> - lower barrier to contributing via a better integration of tools
>>>>>  (not only vcs, but issue tracker, wiki, continuous integration, code
>>>>> review/comments and pull request - even if we under use most of these
>>>>> tools),
>>>>> - and ease integration of many small contributions back.
>>>>> For this to work well, such work MUST happen in separate branches.
>>>>> in this context, there is an obvious benefit of green build: quickly
>>>>> estimate if we can merge a pull request or not.
>>>>> when red, we have no information about possible regressions, and have
>>>>> to go through the tedious part: go down into the console log of both
>>>>> builds, try to understand and compare... There is already enough work
>>>>> involved in reviewing source code.
>>>>>
>>>> I see that my opening argument was simplistic.  However Nicolas' point
>>>> above is probably more significant.
>>>> If we want to encourage new contributors, we need:
>>>>
>>>> * to show that the CI builds are cared for
>>>>
>>>> * allow newcomers to be confident that the tip they are working from is
>>>> green before they start.  When they submit their PR and the CI tests fail,
>>>> they should be able to zero in on the failures *they* caused and *as*a*newbie*
>>>> not have to sort through the confounding factors from other's failures.
>>>>
>>>> * act timely to integrate, to encourage further contributions.  If
>>>> someone contributes a good fix, a green CI test may make you inclined to
>>>> quickly review and integrate. But when the CI shows failure, how will you
>>>> feel about looking into it? Further, when the mainline returns to green,
>>>> the existing PRs don't automatically retest, and no-one seems to be
>>>> manually managing them, so such PRs seem to end up in limbo which is
>>>> *really* discouraging for potential contributors.
>>>>
>>>> cheers -ben
>>>>
>>>>
>>>>> I tend to agree on your view for mid/long term changes:
>>>>> Say developer A works on a new garbage collector, developer B on
>>>>> 64-bit compatibility, developer C on the lowcode extension and developer D on
>>>>> sista (though maybe there is a single developer touching 3 of these)
>>>>> Since each of these devs are going to take months, and touch many core
>>>>> methods scattered in interpreter/jit/object memory or CCodeGenerator, then
>>>>> it's going to be very difficult to merge (way too many conflicts).
>>>>>
>>>>> If on different branches, there is the option to rebase or merge with
>>>>> other branches. But it doesn't scale with N branches touching same core
>>>>> methods: N developers would have to rebase on N-1 concurrent branches,
>>>>> resolve the exact same conflicts etc... Obviously, concurrent work would
>>>>> have to be integrated back ASAP in a master branch.
>>>>>
>>>>> So, a good branch is a short branch, if possible covering a minimal
>>>>> feature set.
>>>>> And the long-running developments you describe must not be handled by
>>>>> branches, but by switches.
>>>>> This gives you a chance to inspect the impact of your own refactoring
>>>>> on your coworkers.
>>>>>
>>>>> In this model, yes, you have a license to break your own artifact (say
>>>>> generationalScavenger, win64, lowcode, sista).
>>>>> But you must be informed if ever you broke the production VM, and/or
>>>>> concurrent artifacts. You have to maintain a minimal set of features
>>>>> working, otherwise you prevent others from working. In the scavenger case, you
>>>>> used a branch for a short period, and that worked quite well.
>>>>>
>>>>> In this context, I agree, a single green light is not enough.
>>>>> We need a sort of status board tracing the regressions individually.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> > Regards .. Subbu
>>>>>>
>>>>>
>>>>>
>>>>>
>


-- 
_,,,^..^,,,_
best, Eliot

