Hi Alex,

This isn't entirely true. Out of the box, unprivileged processes can't change the scheduling policy, but in kernels from 2.6.12 onward it is possible to configure your system to allow this without resorting to setuid root.
On Fri, Apr 19, 2013 at 6:16 AM, Alex Bradbury <asb@asbradbury.org> wrote:
On 19 April 2013 08:38, Casey Ransberger <casey.obrien.r@gmail.com> wrote:
>
> I had a brief chat with Con Kolivas, who did BFS (which implements kernel stuff that will make Cog happier under Linux on machines with sub-supercomputing quantities of CPUs) tonight.
>
> It sounds like there are actually two reasons it hasn't made it into the mainline kernel:
>
> a) he doesn't have time to support it, and
> b) the other kernel folks don't want it.
>
> Oh well. Since right now I'm focused on Raspbian, I sent a message explaining what it was, why I want it, etc on their web board. If I do get it in, support would have to fall to me. Yikes, right? ;)
Yes, for political reasons it seems unlikely anything like BFS would
get in to the upstream kernel. If someone can do work to actually show
noticeable performance gains then that would make us (the Raspberry Pi
Foundation) interested in exploring further. Real workloads that
perform much better with an alternative scheduler would be much more
interesting than microbenchmarks.
This isn't about workload or performance. It is about basic functionality. The CFS scheduler does not support multiple thread priorities for user processes (strictly, it does not support them under the non-real-time scheduling policy, and the real-time scheduling policies are available only to superuser processes).
The PAM limits to set are:

    * hard rtprio 1
    * soft rtprio 1

or you can add a squeakvm.conf file to /etc/security/limits.d with those same lines, e.g.

    # /etc/security/limits.d/squeakvm.conf
    * hard rtprio 1
    * soft rtprio 1

This grants this capability to unprivileged users, but you will need to log out and log in again for it to take effect, as PAM limits are applied at user login.

To restrict it to a group instead:

    # /etc/security/limits.d/squeakvm.conf
    @squeakvm hard rtprio 1
    @squeakvm soft rtprio 1

This will grant the ability only to users in the squeakvm group. The 1 in the examples above is the maximum priority. Higher levels could be used, but a level of at least 1 is necessary to enable the capability.

The group can be created with:

    sudo groupadd squeakvm

There's a handy test program on the pthread_setschedparam man page - http://linux.die.net/man/3/pthread_setschedparam - that can be used for experimentation. I've attached the source. I tried this out on an up-to-date Ubuntu Server 12.04 LTS VM running on a MacBook Pro under VMware Fusion. YMMV.

    gcc pthreads_sched_test.c -o schedtest -lpthread

The first set of tests were performed without making any changes to the PAM limits.

    ./schedtest
    Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0
    Scheduler settings in 'attr'
    policy=SCHED_OTHER, priority=0
    inheritsched is INHERIT
    Scheduler attributes of new thread
    policy=SCHED_OTHER, priority=0

Trying to change the policy and priority of the new thread the program creates gives the following

    ./schedtest -ar1 -i e
    Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0
    Scheduler settings in 'attr'
    policy=SCHED_RR, priority=1
    inheritsched is EXPLICIT
    pthread_create: Operation not permitted

Trying to change the priority of the main thread gives

    ./schedtest -mr1
    pthread_setschedparam: Operation not permitted

As Eliot described, the default configuration prevents unprivileged user processes from changing the priority or scheduling policy.

The second set of tests was run with the rtprio limits above in place:

    schedtest -ar1 -i e
    Scheduler settings of main thread
    policy=SCHED_OTHER, priority=0
    Scheduler settings in 'attr'
    policy=SCHED_RR, priority=1
    inheritsched is EXPLICIT
    Scheduler attributes of new thread
    policy=SCHED_RR, priority=1

    schedtest -mr1 -ao0 -i e
    Scheduler settings of main thread
    policy=SCHED_RR, priority=1
    Scheduler settings in 'attr'
    policy=SCHED_OTHER, priority=0
    inheritsched is EXPLICIT
    Scheduler attributes of new thread
    policy=SCHED_OTHER, priority=0

Does this give sufficient flexibility without having to patch the kernel's scheduler (whatever its name)?

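
As a cross-check on the PAM limits discussed above, a process can ask what its RLIMIT_RTPRIO ceiling is before it attempts SCHED_RR. This is only a minimal sketch, assuming Linux with glibc; rtprio_allowed is a made-up helper name and is not part of the schedtest program.

```c
#define _GNU_SOURCE
#include <sys/resource.h>

/* Hypothetical helper (not from schedtest or any VM source).
 * Returns 1 if the soft RLIMIT_RTPRIO permits requesting real-time
 * priority `prio`, 0 if it does not, and -1 if the limit cannot be read.
 * Privileged processes may be allowed to exceed the limit; this
 * function only reports what the limit itself says. */
int rtprio_allowed(int prio)
{
    struct rlimit rl;

    if (prio < 1)
        return 0;               /* real-time priorities start at 1 */
    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)prio <= rl.rlim_cur ? 1 : 0;
}
```

With the default configuration one would expect rtprio_allowed(1) to return 0 for an unprivileged user, and 1 in a session started after an rtprio limit of 1 or more has been applied.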
AFAIA CFS is the only mainstream pthreads scheduler that doesn't support them. AFAIA BFS (what a name?!) does support multiple thread priorities for user processes.
Within the Squeak Cog VM (and in a number of other VMs, Smalltalk and Java VMs amongst them) there's a heartbeat which is used to cause the VM to periodically break out of normal processing and poll for events. A heartbeat is both much more efficient and more regular than e.g. decrementing a counter as part of normal processing (e.g. frame build on entering non-leaf methods). Ideally the heartbeat is implemented as a thread spinning, blocking in e.g. nanosleep and then forcing the breakout before entering nanosleep again. But this requires that the heartbeat thread runs at a higher priority than the main VM thread(s). On Linux under the CFS this isn't possible. The fallback is to use an interval timer (setitimer with ITIMER_REAL) and a signal handler (for SIGALRM). This is a poor substitute:

- system calls are interrupted, which can play havoc with external code
- when debugging, the heartbeat signal must be disabled because otherwise one is constantly stepping into the signal handler
- certain Linux kernels have bugs with signal delivery and threads which can cause the loss of a thread's context, ending up with two threads having the same context, hence the setitimer approach works only with a strictly single-threaded VM (this is a bug I found and worked around late last year in Red Hat Enterprise Linux WS release 4 (Nahant Update 4) vintage kernels, which alas I have customers using)
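
The thread-based heartbeat described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Cog VM's code: every name (hb_start, hb_poll, hb_stop and the hb_ variables) is hypothetical, and a plain atomic flag stands in for whatever mechanism a real VM uses to force the breakout. With realtime set, the thread is requested as SCHED_RR at priority 1, which pthread_create refuses with EPERM unless privilege or an rtprio limit allows it.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* All names here are hypothetical; none are taken from the Cog VM. */
static atomic_int hb_pending;       /* set by the heartbeat, cleared by the VM */
static atomic_int hb_running;       /* cleared to ask the heartbeat to exit */
static struct timespec hb_period;
static pthread_t hb_thread;

/* Heartbeat body: block in nanosleep, then force the breakout. */
static void *hb_loop(void *unused)
{
    (void)unused;
    while (atomic_load(&hb_running)) {
        nanosleep(&hb_period, NULL);
        atomic_store(&hb_pending, 1);
    }
    return NULL;
}

/* Start the heartbeat with a period in nanoseconds (less than one second).
 * If realtime is non-zero the thread is created as SCHED_RR, priority 1.
 * Returns 0 on success or the error number from the pthread calls. */
int hb_start(long period_ns, int realtime)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 1 };
    int err = pthread_attr_init(&attr);

    if (err != 0)
        return err;
    if (realtime) {
        /* Without EXPLICIT the new thread inherits the creator's policy. */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_RR);
        pthread_attr_setschedparam(&attr, &sp);
    }
    hb_period.tv_sec = 0;
    hb_period.tv_nsec = period_ns;
    atomic_store(&hb_pending, 0);
    atomic_store(&hb_running, 1);
    err = pthread_create(&hb_thread, &attr, hb_loop, NULL);
    pthread_attr_destroy(&attr);
    if (err != 0)
        atomic_store(&hb_running, 0);
    return err;
}

/* Called from the VM's main loop: returns 1 and clears the flag if a
 * heartbeat has fired since the last call, otherwise returns 0. */
int hb_poll(void)
{
    return atomic_exchange(&hb_pending, 0);
}

/* Ask the heartbeat thread to exit and wait for it. */
int hb_stop(void)
{
    atomic_store(&hb_running, 0);
    return pthread_join(hb_thread, NULL);
}
```

A caller might try hb_start(1000000L, 1) and, on EPERM, fall back to the setitimer approach. The realtime == 0 path is there only so the sketch can be exercised without privileges; at the same priority as the VM thread it does not give the preemption guarantee the heartbeat needs.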
Either of these solutions would seem straightforward from the outside:

- make SCHED_RR and/or SCHED_FIFO available for user processes.
- implement multiple priorities for SCHED_OTHER

Expecting to be able to install a VM as a setuid program is not realistic.
I think you'll find that this kind of architectural issue is present in a number of multi-media applications, not just dynamic language virtual machines. The restriction to a single thread priority is, frankly, pathetic. If you see Raspbian and Pi as a platform for multi-media apps then I would urge you to bring any influence you have to bear on getting the Linux kernel community to provide multiple thread priorities. The lack thereof is a significant limitation.

best regards,
Eliot Miranda
Of course the next step after that
wouldn't be dumping the upstream scheduler and switching to BFS, but
it would certainly justify taking a closer look.
I'm not entirely sure why you want to fork BFS - as far as I can see
Con Kolivas is keeping the BFS and his larger -ck patchset up to date
with upstream releases.
In conclusion (from a Raspberry Pi perspective): please do play with
BFS on the pi, do something useful with it (if it solves the recently
discussed issues with heartbeat+cogvm then swell), then let's think
about where to go from there.
Regards,
Alex
--
best,
Eliot
-- You can follow me on twitter at http://twitter.com/smalltalkhacker