[Box-Admins] Re: Problems with Build.squeak.org

Frank Shearar frank.shearar at gmail.com
Sun Apr 28 12:39:27 UTC 2013


On 26 April 2013 09:38, Frank Shearar <frank.shearar at gmail.com> wrote:
> On 25 April 2013 22:32, Ken Causey <ken at kencausey.com> wrote:
>>> -------- Original Message --------
>>> Subject: Re: Problems with Build.squeak.org
>>> From: Frank Shearar <frank.shearar at gmail.com>
>>> Date: Thu, April 25, 2013 4:18 pm
>>> To: Ken Causey <ken at kencausey.com>
>>> Cc: Squeak Hosting Support <box-admins at lists.squeakfoundation.org>
>>>
>>>
>>> On 24 April 2013 13:31, Ken Causey <ken at kencausey.com> wrote:
>>> > On 04/24/2013 01:28 AM, Frank Shearar wrote:
>>> >>
>>> >> On 23 April 2013 23:56, Ken Causey <ken at kencausey.com> wrote:
>>> >>>
>>> >>> I may be jumping the gun here, but I'm afraid the issue with box3 is on
>>> >>> its
>>> >>> way to repeating again.
>>> >>>
>>> >>> It seems to me that the ExternalPackages-* tasks are completely fouled
>>> >>> up.
>>> >>> I'm almost certain they take hours whereas the listed durations are on
>>> >>> the
>>> >>> order of half an hour.  More importantly the site currently claims
>>> >>> Jenkins
>>> >>> is idle yet
>>> >>
>>> >>
>>> >> They are indeed the culprits. I think the root of the problem is twofold:
>>> >> * build.squeak.org is different in some way from my own test machines,
>>> >> where these jobs all complete
>>> >> * I'm shelling out from the Rake build, and these child processes
>>> >> aren't killed by the Rake process.
>>> >>
>>> >> I think for now I should disable them.
>>> >>
>>> >> frank
>>> >
>>> >
>>> > Thanks Frank.  If there is anything I can do to help you diagnose this
>>> > please let me know.
>>>
>>> I've re-enabled the ExternalPackages job (leaving ExternalPackages-4.3
>>> and ExternalPackages-4.4 disabled). Instead of just spawning a process
>>> and failing the test after a timeout, we now spawn the process and get
>>> the pid of that process. Then we schedule a future to kill the job
>>> after the maximum test run time, and wait for the process to exit.
>>>
>>> I've kicked off a build, so fingers crossed we _won't_ see those
>>> hanging processes.
>>>
>>> frank
>>
>> Cool.  I'll plan to check it out a few times tonight while you are
>> presumably sleeping.
>
> So the build timed out after running the Nutcracker tests:
> http://build.squeak.org/job/ExternalPackages/95/console
>
> Which, while sad, is also an opportune time to look for lurking processes.
>
> But just in case it's not so much a broken build as a long-running
> build, I've upped the kill time to 1 hour. The reason these tests take
> so long compared to the in-image tests (there're about the same number
> of tests overall in each set) is that in the ExternalPackages suite we
> load all the various external packages into a copy of a trunk image,
> and then we run that image against each of Cog, CogMT and Interpreter!

So it was just a case of not enough time. The builds take around 40
minutes to run, which isn't bad considering it's 3k tests run three
times.

But more importantly, how does it look from an unkilled process
standpoint? I saw some messages in Jenkins talking about leaking file
descriptors, so I reckon we probably still have a bit of a problem?
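
For anyone following along, here is a rough sketch of the spawn, schedule
a kill, then wait approach described earlier in the thread. It is not the
actual Rake code: the helper name run_with_timeout and its return values
are made up for illustration, and putting the child in its own process
group (so that anything it spawns is killed with it) is an assumption
about what would stop processes lingering, not something confirmed on
build.squeak.org.

```ruby
# Hypothetical helper, not the real build code: run cmd (an array of
# program and arguments) and kill it if it outlives max_seconds.
# Returns :ok, :failed or :timed_out.
def run_with_timeout(cmd, max_seconds)
  # pgroup: true makes the child the leader of a new process group, so
  # processes it spawns can be signalled together with it.
  pid = Process.spawn(*cmd, pgroup: true)
  timed_out = false

  # The "future": a thread that sleeps for the maximum run time and then
  # kills the whole process group.
  killer = Thread.new do
    sleep max_seconds
    begin
      # A signal name starting with a minus sign targets the process group.
      Process.kill('-KILL', pid)
      timed_out = true
    rescue Errno::ESRCH
      # The group is already gone; nothing to kill.
    end
  end

  # Block until the child exits, whether normally or because it was killed.
  _, status = Process.wait2(pid)
  killer.kill # cancel the pending kill if the child finished in time

  return :timed_out if timed_out
  status.success? ? :ok : :failed
end
```

Something like run_with_timeout(['sleep', '30'], 1) should come back as
:timed_out after about a second. Note that this only covers processes
that stay in the child's process group; it does nothing about leaked
file descriptors.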

frank

> frank
>
>> Ken

