[Vm-dev] BFS and CFS and Cogs, Oh My

Eliot Miranda eliot.miranda at gmail.com
Fri Apr 19 16:23:29 UTC 2013

Hi Alex,

On Fri, Apr 19, 2013 at 6:16 AM, Alex Bradbury <asb at asbradbury.org> wrote:

> On 19 April 2013 08:38, Casey Ransberger <casey.obrien.r at gmail.com> wrote:
> >
> > I had a brief chat with Con Kolivas, who did BFS (which implements
> kernel stuff that will make Cog happier under Linux on machines with
> sub-supercomputing quantities of CPUs) tonight.
> >
> > It sounds like there are actually two reasons it hasn't made it into the
> mainline kernel:
> >
> > a) he doesn't have time to support it, and
> > b) the other kernel folks don't want it.
> >
> > Oh well. Since right now I'm focused on Raspbian, I sent a message
> explaining what it was, why I want it, etc on their web board. If I do get
> it in, support would have to fall to me. Yikes, right? ;)
>
> Yes, for political reasons it seems unlikely anything like BFS would
> get in to the upstream kernel. If someone can do work to actually show
> noticeable performance gains then that would make us (the Raspberry Pi
> Foundation) interested in exploring further. Real workloads that
> perform much better with an alternative scheduler would be much more
> interesting than microbenchmarks.

This isn't about workload or performance.  It is about basic functionality.
The CFS scheduler does not support multiple thread priorities for user
processes (strictly, the non-real-time scheduling policy has no thread
priorities, and the real-time scheduling policies are available only to
superuser processes).  AFAIA it is the only mainstream pthreads scheduler
that doesn't.  AFAIA BFS (what a name?!) does support multiple thread
priorities for user processes.

Within the Squeak Cog VM (and in a number of other VMs, Smalltalk and Java
VMs amongst them) there's a heartbeat which is used to cause the VM to
periodically break out of normal processing and poll for events.  A
heartbeat is both much more efficient and more regular than e.g.
decrementing a counter as part of normal processing (e.g. on frame build
when entering non-leaf methods).  Ideally the heartbeat is implemented as a
thread that loops, blocking in e.g. nanosleep and then forcing the breakout
before entering nanosleep again.  But this requires that the heartbeat
thread runs at a higher priority than the main VM thread(s).  On Linux
under the CFS this isn't possible.  The fallback is to use an interval
timer (setitimer with ITIMER_REAL) and a signal handler (for SIGALRM).
This is a poor substitute:
- system calls are interrupted, which can play havoc with external code
- when debugging, the heartbeat signal must be disabled because otherwise
one is constantly stepping into the signal handler
- certain Linux kernels have bugs with signal delivery and threads which
can cause the loss of a thread's context, ending up with two threads having
the same context.  Hence the setitimer approach works only with a strictly
single-threaded VM.  (This is a bug I found and worked around late last
year in Red Hat Enterprise Linux WS release 4 (Nahant Update 4) vintage
kernels, which alas I have customers using.)
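
The two heartbeat mechanisms described above can be sketched roughly as
follows.  This is a minimal illustration, not the Cog VM's actual code: the
names poll_requested, heartbeat_thread, start_itimer_heartbeat and the 2 ms
period are all hypothetical, and the thread variant is only useful in
practice if the heartbeat thread can outrank the VM thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical flag the interpreter would test cheaply at safe points. */
static atomic_int poll_requested;
static atomic_int heartbeat_running;

/* Assumed heartbeat period of 2 ms; the real value is a tuning choice. */
static const struct timespec heartbeat_period = { .tv_sec = 0, .tv_nsec = 2000000 };

/* Variant 1: a thread that blocks in nanosleep, then forces a breakout. */
static void *heartbeat_thread(void *unused)
{
    (void)unused;
    while (atomic_load(&heartbeat_running)) {
        struct timespec remaining = heartbeat_period;
        /* Resume the sleep if a signal cuts it short. */
        while (nanosleep(&remaining, &remaining) == -1 && errno == EINTR)
            ;
        atomic_store(&poll_requested, 1);
    }
    return NULL;
}

static int start_thread_heartbeat(pthread_t *thread)
{
    atomic_store(&heartbeat_running, 1);
    return pthread_create(thread, NULL, heartbeat_thread, NULL);
}

static int stop_thread_heartbeat(pthread_t thread)
{
    atomic_store(&heartbeat_running, 0);
    return pthread_join(thread, NULL);
}

/* Variant 2 (fallback): interval timer plus SIGALRM handler.  The handler
 * only performs an atomic store, assumed lock-free on the target. */
static void on_sigalrm(int signo)
{
    (void)signo;
    atomic_store(&poll_requested, 1);
}

static int start_itimer_heartbeat(long period_usec)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigalrm;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART restarts some interrupted system calls, but not all:
     * nanosleep, select and poll still fail with EINTR. */
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = period_usec;   /* must be below 1000000 */
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_REAL, &timer, NULL);
}

static int stop_itimer_heartbeat(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}
```

In both variants the interpreter would check poll_requested at its safe
points and clear it after polling for events.  The signal variant reaches
into every thread's system calls, which is where the problems listed above
come from.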

Either of these solutions would seem straightforward from the outside:
- make SCHED_RR and/or SCHED_FIFO available to user processes
- implement multiple priorities for SCHED_OTHER
Expecting to be able to install a VM as a setuid program is not realistic.
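
To make the restriction concrete, here is a small sketch of what a VM might
try when starting its heartbeat thread.  The helper names priority_levels
and request_realtime_policy are hypothetical.  On Linux I would expect
SCHED_OTHER to report a single static priority level, and the real-time
request to fail with EPERM for an ordinary user process unless the
administrator has granted a real-time priority limit (RLIMIT_RTPRIO).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Number of distinct static priorities a policy offers, or -1 on error. */
static int priority_levels(int policy)
{
    int lowest = sched_get_priority_min(policy);
    int highest = sched_get_priority_max(policy);
    if (lowest == -1 || highest == -1)
        return -1;
    return highest - lowest + 1;
}

/* Ask for a real-time policy at its lowest priority, which is already
 * enough to outrank every SCHED_OTHER thread.  Returns 0 on success or an
 * errno-style code such as EPERM. */
static int request_realtime_policy(pthread_t thread, int policy)
{
    struct sched_param param;
    int lowest = sched_get_priority_min(policy);
    if (lowest == -1)
        return errno;

    memset(&param, 0, sizeof param);
    param.sched_priority = lowest;
    return pthread_setschedparam(thread, policy, &param);
}
```

A VM would call request_realtime_policy(heartbeat, SCHED_FIFO) right after
creating the heartbeat thread and fall back to the setitimer approach if
the result is non-zero.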

I think you'll find that this kind of architectural issue is present in a
number of multi-media applications, not just dynamic language virtual
machines.  The restriction to a single thread priority is, frankly,
pathetic.  If you see Raspbian and the Pi as a platform for multi-media apps
then I would urge you to bring any influence you have to bear on getting
the Linux kernel community to provide multiple thread priorities.  The
lack thereof is a significant limitation.

best regards,
Eliot Miranda

> Of course the next step after that
> wouldn't be dumping the upstream scheduler and switching to BFS, but
> it would certainly justify taking a closer look.
> I'm not entirely sure why you want to fork BFS - as far as I can see
> Con Kolivas is keeping the BFS and his larger -ck patchset up to date
> with upstream releases.
> In conclusion (from a Raspberry Pi perspective): please do play with
> BFS on the pi, do something useful with it (if it solves the recently
> discussed issues with heartbeat+cogvm then swell), then let's think
> about where to go from there.
> Regards,
> Alex
