On 20 February 2014 20:51, David T. Lewis lewis@mail.msen.com wrote:
Ken, thanks for the explanation. I recognize all of those issues.
I do think it is more appropriate to use the Jenkins UI to turn off the problematic jobs until the issues can be addressed, as opposed to shutting down the whole Jenkins system.
Not really. Other than the InterpreterVM and CogVM jobs (I'm running off memory here), all the other jobs use rake, shell out to run Squeak, and do fancy things to avoid hung builds. Now, I'm absolutely sure that the orphaned processes are my fault: clearly I don't understand the intricacies of shells, subshells, and process ownership. The problem is the disconnect between the people who know very well what the squeak-ci code does (me) and the people who understand the Unix process model, and process ownership in particular (not me).
Yes, some of our Jenkins jobs are wasting a lot of space. Yes, that is a fixable problem. No, I don't think that a bigger disk drive will fix it ;-)
I thought I'd take a look at ExternalPackages-Xtreams, one of the bigger jobs at 511M.
By far the biggest part of the disk usage - 221M - is simply the repository itself. This is because we store big fat blobs of binary data (images) in the repository, and every upgrade of those images just adds another blob to the history, which is simply wasteful. Maybe some serious git guru-ness might be applied to reduce this: I think there are tricks to remove both the presence and the history of large binaries, but I'd have to look it up.
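For what it's worth, the usual trick is a history rewrite. This is a sketch only, run against a scratch repository it creates itself; `TrunkImage.image` is a stand-in for the real blobs, and a rewrite like this is destructive, so every clone would have to re-fetch afterwards.

```shell
#!/bin/sh
set -e
# Build a throwaway repo containing one big binary blob and one text file.
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email ci@example.org
git config user.name ci
dd if=/dev/zero of=TrunkImage.image bs=1024 count=1024 2>/dev/null
echo 'readme' > README
git add . && git commit -qm 'image + readme'
echo more >> README && git commit -qam 'update'

# Rewrite every commit, deleting the blob from each tree, so it
# disappears from the history entirely, not just from the tip.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm --cached -q --ignore-unmatch TrunkImage.image' \
  -- --all

# Drop the backup refs and old objects so the space is actually reclaimed.
rm -rf .git/refs/original
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
```

Newer tools (BFG, git-filter-repo) do the same job faster, but the principle is identical: rewrite, then expire reflogs and repack, or the old blobs stay on disk.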
The target/ directory takes up no less than 195M. It has three VMs (like most of these jobs): each Cog VM directory takes 14M, while the Interpreter VM takes up 38M. (That one is big because we build it from source: every job can run on any agent, and that agent could have any manner of glibc, so we _build_ an Interpreter VM and memoise the artifact.) target/package-cache/ takes up 37M, presumably because jobs update the trunk image from the base CI image, save that, and then load the package under test (Xtreams, in this case).
I've started the process of making jobs depend on the binary artifacts of other jobs, which will probably remove the package-cache disk usage. So SqueakTrunk will produce a TrunkImage.image that ReleaseSqueakTrunk will take and produce a Squeak4.5.image, while ExternalPackages-Xtreams will turn the TrunkImage.image into a JUnit test result.
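As a hypothetical sketch of that artifact chain (plain directories stand in for Jenkins' archived artifacts here; in a real setup this would be the Copy Artifact plugin, or a fetch from the upstream job's lastSuccessfulBuild artifact URL):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Upstream: "SqueakTrunk" updates the image once and archives the result.
mkdir -p "$work/SqueakTrunk/artifacts"
printf 'freshly updated trunk image\n' \
  > "$work/SqueakTrunk/artifacts/TrunkImage.image"

# Downstream: "ExternalPackages-Xtreams" copies the archived image
# instead of re-running the whole trunk update itself, so its own
# package-cache never has to pull the trunk packages again.
mkdir -p "$work/ExternalPackages-Xtreams"
cp "$work/SqueakTrunk/artifacts/TrunkImage.image" \
   "$work/ExternalPackages-Xtreams/TrunkImage.image"
```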
Saving 38M per job means saving 38M x 33 jobs, or roughly 1.2G, on disk.
frank
Dave
Let me just list the issues I'm aware of; not all of these can be fixed in the same way, nor do they all require any significant overall downtime.
- Jenkins broke some of our build processes with a release months ago. Since that time we have been pinned to a specific release and have not updated. Initially the plan was to stay agile and keep up to date with Jenkins releases, but no one has found the time to figure out why the builds broke, or at least the proper way to address the problem. I know Frank tried, but he has only so much time and other fish to fry. I approached Chris C, the original instigator of our Jenkins setup, to see if he had the interest to help Frank out.
- The issue I have harped on about in the past: filling up the filesystem on box3. I'm convinced that Jenkins jobs are wasting space somewhere, or maybe there are some jobs that can simply be deleted? I'm just speculating, but a number of jobs have not succeeded in months. By the way, growth has been generally slow of late, but we are at 97% - no immediate fear, but 'vigilance!'. If build.squeak.org ultimately is as big as it has to be, then we probably need to approach the SFC and see if there is budget to upgrade the disk space on box3. That's not my first choice, however.
- The issue that Chris has referred to: we still fairly regularly get jobs stuck that have to be killed manually.
Ken
On 02/20/2014 11:11 AM, David T. Lewis wrote:
What problem are we trying to solve here?
If there are Jenkins jobs that cause problems, and if those problems cannot be addressed right away, then the appropriate thing to do is disable them using the normal Jenkins console. If an explanation is needed, just update the job description to say what is going on.
A little bit of updating of the Jenkins job descriptions would do no harm in any case. Sort of like a class comment: "I am a Jenkins job that tests the FreebleBaz package. If I stop working, please contact bilbo@baggins.org".
:)
Dave
Ken and I have been thinking of shutting down Jenkins (OK, it was my idea) for a week after 4.5 is released. The aim is to address hanging issues.
A week is a long time from a technical point of view, but it allows the people using it to take a break - mainly we're thinking of Frank here. We're thinking about upgrades, disk usage, and necessary versus unnecessary builds (if there are any of the latter). Basically, stopping that world for a week.
What do you think, Frank? If you are opposed, then we'll chuck this idea.
Chris