On Mon, Nov 18, 2013 at 01:26:26PM +0000, Frank Shearar wrote:
On 18 November 2013 12:38, David T. Lewis lewis@mail.msen.com wrote:
On Mon, Nov 18, 2013 at 10:02:17AM +0000, Frank Shearar wrote:
On 17 November 2013 00:32, David T. Lewis lewis@mail.msen.com wrote:
On Sat, Nov 16, 2013 at 05:10:18PM -0600, Ken Causey wrote:
Free space on the box3 filesystem is beginning to get dangerously low. At the moment it is 94% full with 3.8GB free, and lately usage seems to increase by 1% every 2 or 3 days.
Primary offenders seem to be
/var/lib/jenkins/ 33GB
/home/ssdotcom/ 18GB
I hope one or both of you can find something to delete.
Most of the variation in disk usage is related to our Jenkins jobs. This is to be expected, but it does mean that we will need to tend to the garden and make sure that weeds do not take over. I have two suggestions:
- Every Jenkins job has a description that is set up when we configure
the job. The description should (of course) explain the purpose of the job, but it should also have some sort of tag line to identify the person who is responsible for maintaining that job. For example, the description for the InterpreterVM job includes this:
"This Jenkins project is maintained by Dave Lewis (lewis@mail.msen.com)"
- All of the jobs consume a fair amount of disk space, and it is pretty
easy to let this get out of control. I think this can usually be managed in the Jenkins project configurations, so we need to keep an eye on the high-usage jobs and fix up their settings accordingly.
Here is the current disk utilization for our Jenkins jobs:
jenkins@box3-squeak:~/workspace$ du -s *
204660  CogVM
260320  ExternalPackage-AndreasSystemProfiler
262948  ExternalPackage-Control
286952  ExternalPackage-FFI
392016  ExternalPackage-FileSystem
456600  ExternalPackage-Fuel
427560  ExternalPackage-Magma
256144  ExternalPackage-Nebraska
256184  ExternalPackage-Nutcracker
263092  ExternalPackage-OSProcess
228668  ExternalPackage-Phexample
256732  ExternalPackage-Quaternion
255692  ExternalPackage-RoelTyper
417300  ExternalPackages
414608  ExternalPackages-Metacello
255580  ExternalPackage-SqueakCheck
412592  ExternalPackages-Squeak4.3
338264  ExternalPackages-Squeak4.4
256288  ExternalPackage-Universes
255752  ExternalPackage-WebClient
256268  ExternalPackage-XML-Parser
444996  ExternalPackage-Xtreams
387160  ExternalPackage-Xtreams-FileSystem
256144  ExternalPackage-Zippers
384448  InterpreterVM
205396  LatestReleasedVM
5370500 ReleaseSqueakTrunk
127408  Squeak 64-bit image
766392  SqueakTrunk
291772  SqueakTrunkOnBleedingEdgeCog
412972  SqueakTrunkOnInterpreter
419892  SqueakTrunkPerformance
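As an aside, a quick way to keep an eye on the heavy jobs is to sort the per-directory totals numerically so the biggest offender prints last. On box3 the path would be /var/lib/jenkins/workspace; the sketch below uses a throwaway directory tree with made-up job names so the commands can be tried safely anywhere:

```shell
# Build a throwaway tree standing in for the Jenkins workspace root.
# (SmallJob/BigJob are illustrative names, not real jobs on box3.)
tmp=$(mktemp -d)
mkdir -p "$tmp/SmallJob" "$tmp/BigJob"
dd if=/dev/zero of="$tmp/BigJob/image.zip" bs=1024 count=256 2>/dev/null

# Sort per-job totals numerically; the biggest directory prints last.
du -s "$tmp"/* | sort -n

rm -rf "$tmp"
```

On the real box the one-liner would just be `du -s /var/lib/jenkins/workspace/* | sort -n`.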
At the moment, the ReleaseSqueakTrunk job is using a lot of space, and this is mostly due to saved images in its ./target directory. Most likely we can purge out some of the older images to free up some space.
With the exception of ReleaseSqueakTrunk, the CI jobs _should_ be fairly careful with disk space. One way they'd leak disk space is through target/package-cache, which will tend to accumulate MCZs over time. The SqueakTrunk* and ExternalPackage* jobs are set up to work from a blank slate, so they should always be in a position to have their workspaces wiped.
The issue is that we are running out of disk space and need to take action.
In the project configuration for ReleaseSqueakTrunk, the following option is *not* selected:
Discard all but the last successful/stable artifact to save disk space
Should this be changed?
I don't know. The job's set up to consider _all_ Squeak-*-*.zip files as part of the artifact, which is itself a problem. I suspect that if this switch is flagged, you won't be able to go to, say, ReleaseSqueakTrunk 22, and get the artifact. I _think_.
The /var/lib/jenkins/workspace/ReleaseSqueakTrunk/target directory is using the majority of the disk space for this job, and it looks to me like most of this consists of transient files that could be deleted (or compressed).
What are the transient files? The only things possibly worth keeping are Squeak-*-*.zip files. The rest of the files should be either interim build steps (TrunkImage.* and so on), XML files for test/performance coverage, and Squeak cruft like update logs, package-cache and so on. All of those can go.
I don't want to touch anything here without your approval.
You have my permission to wipe any of "my" workspaces at any time: if they break because of a wipe, I didn't do my job properly! :)
OK, if we actually get dangerously close to running out of disk space, one of us can delete /var/lib/jenkins/workspace/ReleaseSqueakTrunk/target to relieve the problem.
But we need to deal with this in a sustainable way that does not require manual intervention. In the case of ReleaseSqueakTrunk, possibly this can be done by adding one more build step to the job that cleans up files when the job is complete.
Right now you have these two build steps:
bundle install
DEBUG=1 bundle exec rake release
I don't know anything about ruby or rake but I am guessing that you might be able to add one more build step that might look something like this:
bundle exec rake cleanup
Or maybe it can just be a shell command that cleans up unneeded files in the workspace/ReleaseSqueakTrunk/target directory. I see that there is a Cog VM executable in that directory, so I'm assuming that you do not want it completely wiped out. But perhaps the rest of the files can be purged as part of the job steps.
I'll be happy to help with a shell command if you like, but I'm not familiar enough with rake and ruby to know if that is the right thing to do in this case. If there is some way that you can do this in the ruby scripts, that would probably be a better approach.
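For what it's worth, a shell build step along those lines might look like the sketch below: a whitelist delete that keeps the release zips and the VM executable and removes everything else, including package-cache. The Squeak-*-*.zip pattern comes from earlier in this thread; the VM file name ("coglinux" here) and the sample file names are my assumptions, and a throwaway directory stands in for $WORKSPACE/target so the commands can be tried safely:

```shell
# Throwaway stand-in for $WORKSPACE/target, populated with made-up examples
# of the file categories discussed above (release zip, VM tarball, interim
# image, test report, package-cache).
target=$(mktemp -d)
touch "$target/Squeak-4.4-12345.zip" "$target/coglinux.tgz" \
      "$target/TrunkImage.image" "$target/test-results.xml"
mkdir -p "$target/package-cache"

# Whitelist delete: keep the release zips and the (assumed) VM file,
# remove every other regular file at the top level of target/.
find "$target" -maxdepth 1 -type f \
    ! -name 'Squeak-*-*.zip' \
    ! -name 'coglinux*' \
    -delete
rm -rf "$target"/package-cache   # accumulated MCZs; re-fetched as needed

ls "$target"                     # only the zip and the VM file remain
rm -rf "$target"
```

In the real job the first lines would disappear and `find` would run against "$WORKSPACE/target"; the keep-patterns would need adjusting to the actual file names there.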
Thanks :-) Dave