On 20 February 2014 20:51, David T. Lewis lewis@mail.msen.com wrote:
Ken, thanks for the explanation. I recognize all of those issues.
I do think it is more appropriate to use the Jenkins UI to turn off the problematic jobs until the issues can be addressed, as opposed to shutting down the whole Jenkins system.
Not really. Other than the InterpreterVM and CogVM jobs (I'm running off memory here), all the other jobs use rake, shell out to run Squeak, and do fancy things to avoid hung builds. Now, I'm absolutely sure that the orphaned processes are my fault: clearly I don't understand the intricacies of shells, subshells, and process ownership. The problem is the disconnect between the people who know very well what the squeak-ci code does (me) and the people who understand the Unix process model, and process ownership in particular (not me).
Yes, some of our Jenkins jobs are wasting a lot of space. Yes, that is a fixable problem. No, I don't think that a bigger disk drive will fix it ;-)
I thought I'd take a look at ExternalPackages-Xtreams, one of the bigger jobs at 511M.
By far the biggest part of the disk usage - 221M - is simply the repository itself. This is because we store big fat blobs of binary data (images) in the repository, and every upgrade of those images just adds another blob to the history, which is simply wasteful. Maybe some serious git guru-ness might be applied to reduce this: I think there are tricks to remove both the presence and the history of large binaries, but I'd have to look it up.
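For what it's worth, the usual trick is a history rewrite. This is a sketch only, run against a scratch repository it creates itself; `TrunkImage.image` is a stand-in for the real blobs, and a rewrite like this is destructive, so every clone would have to re-fetch afterwards.

```shell
#!/bin/sh
set -e
# Build a throwaway repo containing one big binary blob and one text file.
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email ci@example.org
git config user.name ci
dd if=/dev/zero of=TrunkImage.image bs=1024 count=1024 2>/dev/null
echo 'readme' > README
git add . && git commit -qm 'image + readme'
echo more >> README && git commit -qam 'update'

# Rewrite every commit, deleting the blob from each tree, so it
# disappears from the history entirely, not just from the tip.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm --cached -q --ignore-unmatch TrunkImage.image' \
  -- --all

# Drop the backup refs and old objects so the space is actually reclaimed.
rm -rf .git/refs/original
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
```

Newer tools (BFG, git-filter-repo) do the same job faster, but the principle is identical: rewrite, then expire reflogs and repack, or the old blobs stay on disk.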
The target/ directory takes up no less than 195M. It has three VMs (like most of these jobs): each Cog VM directory takes 14M, while the Interpreter VM takes up 38M. (That one is big because we build it from source: every job can run on any agent, and that agent could have any manner of glibc, so we _build_ an Interpreter VM and memoise the artifact.) target/package-cache/ takes up 37M, presumably because jobs update the trunk image from the base CI image, save that, and then load the package under test (Xtreams, in this case).
I've started the process of making jobs depend on the binary artifacts of other jobs, which will probably remove the package-cache disk usage. So SqueakTrunk will produce a TrunkImage.image that ReleaseSqueakTrunk will take and produce a Squeak4.5.image, while ExternalPackages-Xtreams will turn the TrunkImage.image into a JUnit test result.
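As a hypothetical sketch of that artifact chain (plain directories stand in for Jenkins' archived artifacts here; in a real setup this would be the Copy Artifact plugin, or a fetch from the upstream job's lastSuccessfulBuild artifact URL):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Upstream: "SqueakTrunk" updates the image once and archives the result.
mkdir -p "$work/SqueakTrunk/artifacts"
printf 'freshly updated trunk image\n' \
  > "$work/SqueakTrunk/artifacts/TrunkImage.image"

# Downstream: "ExternalPackages-Xtreams" copies the archived image
# instead of re-running the whole trunk update itself, so its own
# package-cache never has to pull the trunk packages again.
mkdir -p "$work/ExternalPackages-Xtreams"
cp "$work/SqueakTrunk/artifacts/TrunkImage.image" \
   "$work/ExternalPackages-Xtreams/TrunkImage.image"
```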
Saving 38M per job means saving 38M x 33 jobs, or roughly 1.2G, on disk.
frank
Dave
Let me just list the issues I'm aware of; not all of these can be fixed in the same way, nor do they all require any significant overall downtime.
- Jenkins broke some of our build processes with a release months ago. Since that time we have been pinned to a specific release and have not updated. Initially the plan was to stay agile and keep up to date with Jenkins releases, but no one has found the time to figure out why the builds broke, or at least the proper way to address the problem. I know Frank tried, but he has only so much time and other fish to fry. I approached Chris C, the original instigator of our Jenkins setup, to see if he had the interest to help Frank out.
- The issue I have harped on about in the past: filling up the filesystem on box3. I'm convinced that Jenkins jobs are wasting space somewhere, or maybe there are some jobs that can simply be deleted? I'm just speculating, but a number of jobs have not succeeded in months. By the way, growth has been generally slow of late, but we are at 97% - no immediate fear, but 'vigilance!'. If build.squeak.org ultimately is as big as it has to be, then we probably need to approach the SFC and see if there is budget to upgrade the disk space on box3. That's not my first choice, however.
- The issue that Chris has referred to: we still fairly regularly get jobs stuck that have to be killed manually.
Ken
On 02/20/2014 11:11 AM, David T. Lewis wrote:
What problem are we trying to solve here?
If there are Jenkins jobs that cause problems, and if those problems cannot be addressed right away, then the appropriate thing to do is disable them using the normal Jenkins console. If an explanation is needed, just update the job description to say what is going on.
A little bit of updating of the Jenkins job descriptions would do no harm in any case. Sort of like a class comment: "I am a Jenkins job that tests the FreebleBaz package. If I stop working, please contact bilbo@baggins.org".
:)
Dave
Ken and I have been thinking of shutting down Jenkins (OK, it was my idea) for a week after 4.5 is released. The aim is to address hanging issues.
A week is a long time from a technical point of view, but it allows the people using it to take a break - mainly we're thinking of Frank here. We're thinking about upgrades, disk usage, and necessary versus unnecessary builds (if there are any of the latter). Basically, stopping that world for a week.
What do you think, Frank? If you are opposed, then we'll chuck this idea.
Chris