<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Eliot,<br>
Happy to help. Glad to hear it works on Red Hat too. I only had
access to Ubuntu when I tried it. The only reason I suggested
the group-based version was to avoid potential conflicts with
other configuration on the machine. As you say, the global version works
too; it just applies to all users, which may or may not be what one
wants.<br>
<br>
Cheers, Steve<br>
<br>
<div class="moz-cite-prefix">On 25/05/2013 00:13, Eliot Miranda
wrote:<br>
</div>
<blockquote
cite="mid:CAC20JE2GrwrxFftt3-JvqVtwB9=XZXqXUykWi30EKn5ubs_NGg@mail.gmail.com"
type="cite">
Steve,
<div><br>
</div>
<div>thank you _so much_! This works like a charm. At
least on Red Hat I didn't have to add a group, and things work fine
with the first of the <span class="Apple-style-span"
style="font-family:monospace">/etc/security/limits.d/squeakvm.conf</span><font
class="Apple-style-span" face="arial, helvetica, sans-serif">
approaches.</font><br>
<br>
<div class="gmail_quote">On Sun, Apr 21, 2013 at 5:10 AM, Steve
Rees <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:squeak-vm-dev@vimes.worldonline.co.uk"
target="_blank">squeak-vm-dev@vimes.worldonline.co.uk</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<div text="#000000" bgcolor="#FFFFFF"> Hi Eliot,<br>
<br>
<div>On 19/04/2013 17:23, Eliot Miranda wrote:<br>
</div>
<blockquote type="cite">Hi Alex,<br>
<br>
<div class="gmail_quote">On Fri, Apr 19, 2013 at 6:16
AM, Alex Bradbury <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:asb@asbradbury.org" target="_blank">asb@asbradbury.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
On 19 April 2013 08:38, Casey Ransberger &lt;<a
moz-do-not-send="true"
href="mailto:casey.obrien.r@gmail.com"
target="_blank">casey.obrien.r@gmail.com</a>&gt;
wrote:<br>
><br>
> I had a brief chat with Con Kolivas, who did
BFS (which implements kernel stuff that will make
Cog happier under Linux on machines with
sub-supercomputing quantities of CPUs) tonight.<br>
><br>
> It sounds like there are actually two reasons
it hasn't made it into the mainline kernel:<br>
><br>
> a) he doesn't have time to support it, and<br>
> b) the other kernel folks don't want it.<br>
><br>
> Oh well. Since right now I'm focused on
Raspbian, I sent a message explaining what it was,
why I want it, etc on their web board. If I do get
it in, support would have to fall to me. Yikes,
right? ;)<br>
<br>
Yes, for political reasons it seems unlikely
anything like BFS would<br>
get in to the upstream kernel. If someone can do
work to actually show<br>
noticeable performance gains then that would make us
(the Raspberry Pi<br>
Foundation) interested in exploring further. Real
workloads that<br>
perform much better with an alternative scheduler
would be much more<br>
interesting than microbenchmarks. </blockquote>
<div><br>
</div>
<div>This isn't about workload or performance. It is
about basic functionality. The CFS scheduler does
not support multiple thread priorities for user
processes (more precisely, not under the non-real-time
scheduling policy; the real-time scheduling
policies are available only to superuser processes).
<br>
</div>
</div>
</blockquote>
This isn't entirely true. Out of the box, unprivileged
processes can't change the scheduling policy, but in
kernels from 2.6.12 onward it is possible to configure your
system to allow this without resorting to setuid root.<br>
<br>
Quoting from the man page for sched_setscheduler - <a
moz-do-not-send="true"
href="http://linux.die.net/man/2/sched_setscheduler"
target="_blank">http://linux.die.net/man/2/sched_setscheduler</a>
- (the privilege restrictions are the same as for
pthread_attr_setschedpolicy), "If an unprivileged process
has a nonzero RLIMIT_RTPRIO soft limit, then it can change
its scheduling policy and priority, subject to the
restriction that the priority cannot be set to a value
higher than the maximum of its current priority and its
RLIMIT_RTPRIO soft limit."<br>
<br>
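As a rough illustration of the rule quoted above, here is a minimal
Python sketch (it is not part of the attached test program). The helper
names are hypothetical; only <tt>resource.getrlimit</tt> and
<tt>resource.RLIMIT_RTPRIO</tt> come from the Python standard library,
and the sketch assumes the soft limit is a finite number.<br>
<br>
```python
import resource


def may_change_policy(rtprio_soft_limit):
    """An unprivileged process needs a nonzero soft limit to change policy."""
    return rtprio_soft_limit != 0


def max_settable_rt_priority(current_priority, rtprio_soft_limit):
    """Highest priority an unprivileged process may request.

    Follows the quoted rule: the maximum of the current priority and the
    RLIMIT_RTPRIO soft limit. Assumes a finite soft limit.
    """
    return max(current_priority, rtprio_soft_limit)


def read_rtprio_limits():
    """Return (soft, hard) for RLIMIT_RTPRIO, or None where it is unavailable."""
    limit = getattr(resource, "RLIMIT_RTPRIO", None)  # Linux-only constant
    if limit is None:
        return None
    return resource.getrlimit(limit)


if __name__ == "__main__":
    limits = read_rtprio_limits()
    if limits is None:
        print("RLIMIT_RTPRIO is not available on this platform")
    else:
        soft, hard = limits
        print("rtprio soft limit:", soft, "hard limit:", hard)
        print("may change policy:", may_change_policy(soft))
```
<br>
With a soft limit of 0 (the usual default) the policy cannot be changed;
with a soft limit of 1, a process at priority 0 could request priority 1
at most.<br>
<br>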
Using the pam_limits.so module, one can set the
RLIMIT_RTPRIO soft limit higher than zero, which then
allows the use of the SCHED_FIFO and SCHED_RR policies
with priorities up to the soft limit.<br>
<br>
One way to achieve this is to add the following lines to
the file /etc/security/limits.conf.<br>
<blockquote><tt>*   hard   rtprio   1</tt><br>
<tt>*   soft   rtprio   1</tt><br>
</blockquote>
or you can add a squeakvm.conf file to
/etc/security/limits.d with those same lines, eg.<br>
<blockquote><tt># /etc/security/limits.d/squeakvm.conf</tt><br>
<tt>*   hard   rtprio   1</tt><br>
<tt>*   soft   rtprio   1</tt><br>
</blockquote>
This grants the capability to all unprivileged users, but you
will need to log out and log in again for it to take effect,
as PAM limits are applied at user login.<br>
<br>
The only problem with this approach is that it might
conflict with other global settings
for rtprio. An alternative is to grant the
privilege to a group (e.g. squeakvm) and then add users to
that group so that they can switch to the SCHED_FIFO
or SCHED_RR policies and change the priorities of
threads:<br>
<blockquote><tt># /etc/security/limits.d/squeakvm.conf</tt><br>
<tt>@squeakvm   hard   rtprio   1</tt><br>
<tt>@squeakvm   soft   rtprio   1</tt></blockquote>
This will grant the ability only to users in the squeakvm
group. The 1 in the examples above is the maximum
priority those users may request. Higher limits could be used, but a limit
of at least 1 is necessary to enable the capability.<br>
<br>
Of course the group needs to exist for this to take
effect.<br>
<blockquote><tt>sudo groupadd squeakvm</tt><br>
</blockquote>
There's a handy test program on the pthread_setschedparam
man page - <a moz-do-not-send="true"
href="http://linux.die.net/man/3/pthread_setschedparam"
target="_blank">http://linux.die.net/man/3/pthread_setschedparam</a>
- that can be used for experimentation. I've attached the
source. I tried this out on an up-to-date Ubuntu Server
12.04 LTS VM running on a MacBook Pro under VMware Fusion.
YMMV.<br>
<br>
pthreads_sched_test is a bit of a verbose name, so I named
the test program "schedtest" when I compiled it. Here are
the results of my tests.<br>
<br>
First, compile the program<br>
<blockquote><tt>gcc pthreads_sched_test.c -o schedtest
-lpthread</tt><br>
</blockquote>
The first set of tests was performed without making any
changes to the PAM limits.<br>
<br>
Running schedtest without arguments gives the following<br>
<blockquote><tt>./schedtest<br>
<br>
</tt>
<blockquote><tt>Scheduler settings of main thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
<br>
<tt>Scheduler settings in 'attr'</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
<tt>&nbsp;&nbsp;&nbsp;inheritsched is INHERIT</tt><br>
<br>
<tt>Scheduler attributes of new thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
</blockquote>
</blockquote>
Trying to change the policy and priority of the new thread
the program creates gives the following<br>
<blockquote><tt>./schedtest -ar1 -i e<br>
<br>
</tt>
<blockquote><tt>Scheduler settings of main thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
<br>
<tt>Scheduler settings in 'attr'</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_RR, priority=1</tt><br>
<tt>&nbsp;&nbsp;&nbsp;inheritsched is EXPLICIT</tt><br>
<br>
<tt>pthread_create: Operation not permitted</tt><br>
<tt> </tt></blockquote>
</blockquote>
Trying to change the priority of the main thread gives<br>
<blockquote><tt>./schedtest -mr1<br>
<br>
</tt>
<blockquote><tt>pthread_setschedparam: Operation not
permitted</tt><br>
</blockquote>
</blockquote>
As Eliot described, the default configuration prevents
unprivileged user processes from changing the priority or
scheduling policy.<br>
<br>
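For anyone without a C compiler to hand, the main-thread check
(<tt>./schedtest -mr1</tt>) can be approximated from Python. This is only
a sketch under assumptions: the function name is hypothetical, it relies on
the Linux-only <tt>os.sched_setscheduler</tt> family in the standard
library, and it restores the previous policy if the change succeeds.<br>
<br>
```python
import os


def try_set_round_robin(priority=1):
    """Try to switch the caller to SCHED_RR and report what happened."""
    if not hasattr(os, "sched_setscheduler"):
        return "unsupported"  # these calls are not available on every platform
    # Remember the current settings so a successful change can be undone.
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    try:
        # pid 0 means "the caller"
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except PermissionError:
        # Same failure schedtest reports as "Operation not permitted"
        return "not permitted"
    os.sched_setscheduler(0, old_policy, old_param)
    return "ok"


if __name__ == "__main__":
    print("SCHED_RR at priority 1:", try_set_round_robin(1))
```
<br>
Without any PAM limits changes an unprivileged user would be expected to
see "not permitted" here, matching the schedtest output above.<br>
<br>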
After adding the /etc/security/limits.d/squeakvm.conf file
described above, adding my user to the squeakvm group and
logging out and back in again, the tests are somewhat more
successful. Note that these are the only additional
privileges given to the squeakvm group.<br>
<blockquote><tt>schedtest -ar1 -i e<br>
<br>
</tt>
<blockquote><tt>Scheduler settings of main thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
<br>
<tt>Scheduler settings in 'attr'</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_RR, priority=1</tt><br>
<tt>&nbsp;&nbsp;&nbsp;inheritsched is EXPLICIT</tt><br>
<br>
<tt>Scheduler attributes of new thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_RR, priority=1</tt><br>
</blockquote>
<br>
<tt>schedtest -mr1 -ao0 -i e<br>
<br>
</tt>
<blockquote><tt>Scheduler settings of main thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_RR, priority=1</tt><br>
<br>
<tt>Scheduler settings in 'attr'</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
<tt>&nbsp;&nbsp;&nbsp;inheritsched is EXPLICIT</tt><br>
<br>
<tt>Scheduler attributes of new thread</tt><br>
<tt>&nbsp;&nbsp;&nbsp;policy=SCHED_OTHER, priority=0</tt><br>
</blockquote>
</blockquote>
Does this give sufficient flexibility without having to
patch the kernel's scheduler (whatever its name)?<br>
<br>
Cheers, <br>
Steve<br>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<div>AFAIA it is the only main-stream pthreads
scheduler that doesn't. AFAIA BFS (what a name?!)
does support multiple thread priorities for user
processes.</div>
<div><br>
</div>
<div>Within the Squeak Cog VM (and in a number of
other VMs, Smalltalk and Java VMs amongst them)
there's a heartbeat which is used to cause the VM to
periodically break out of normal processing and poll
for events. A heartbeat is both much more
efficient and more regular than e.g. decrementing a
counter as part of normal processing (e.g. frame
build on entering non-leaf methods). Ideally the
heartbeat is implemented as a thread that loops,
blocking in e.g. nanosleep and then forcing the
breakout before entering nanosleep again. But this
requires that the heartbeat thread runs at a higher
priority than the main VM thread(s). On Linux
under the CFS this isn't possible. The fallback is
to use an interval timer (setitimer with
ITIMER_REAL) and a signal handler (for SIGALRM).
This is a poor substitute:</div>
<div>- system calls are interrupted, which can play
havoc with external code</div>
<div>- when debugging the heartbeat signal must be
disabled because otherwise one is constantly
stepping into the signal handler</div>
<div> - certain linux kernels have bugs with signal
delivery and threads which can cause the loss of a
thread's context, ending up with two threads having
the same context, hence the setitimer approach works
only with a strictly single-threaded VM (this is a
bug I found and worked around late last year in Red
Hat Enterprise Linux WS release 4 (Nahant Update 4)
vintage kernels, which alas I have customers using)</div>
<div><br>
</div>
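<div>The thread-based heartbeat described above can be sketched
roughly as follows. This is an illustrative Python model only, not the
Cog VM's code: the class and function names are hypothetical, the
sleep uses <tt>threading.Event.wait</tt> rather than nanosleep, and no
attempt is made to raise the heartbeat thread's priority.</div>
<div><br>
</div>
```python
import threading
import time


class Heartbeat:
    """Background thread that periodically requests a breakout."""

    def __init__(self, period_s=0.002):
        self.period_s = period_s
        self.breakout_requested = threading.Event()
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Sleep, force a breakout, sleep again. Event.wait acts as an
        # interruptible sleep so that stop() returns promptly.
        while not self._stop_requested.wait(self.period_s):
            self.breakout_requested.set()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_requested.set()
        self._thread.join()


def run_interpreter(duration_s, heartbeat):
    """Stand-in for the VM's main loop; returns how often it polled for events."""
    polls = 0
    deadline = time.monotonic() + duration_s
    while deadline > time.monotonic():
        # ... normal processing would happen here ...
        if heartbeat.breakout_requested.is_set():
            heartbeat.breakout_requested.clear()
            polls += 1  # a real VM would poll for I/O and timer events here
    return polls


if __name__ == "__main__":
    hb = Heartbeat()
    hb.start()
    print("event polls in 0.1 s:", run_interpreter(0.1, hb))
    hb.stop()
```
<div>The main loop only pays for a flag check per iteration, while
the regularity of the breakouts depends on how promptly the heartbeat
thread gets scheduled, which is why its priority matters.</div>
<div><br>
</div>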
<div>Either of these solutions would seem
straight-forward from the outside:</div>
<div>- make SCHED_RR and/or SCHED_FIFO available to user
processes.</div>
<div>- implement multiple priorities for SCHED_OTHER</div>
<div> Expecting to be able to install a VM as a setuid
program is not realistic.</div>
<div><br>
</div>
<div>I think you'll find that this kind of
architectural issue is present in a number of
multi-media applications, not just dynamic language
virtual machines. The restriction to a single
thread priority is, frankly, pathetic. If you see
Raspbian and Pi as a platform for multi-media apps
then I would urge you to bring any influence you
have to bear on getting the Linux kernel community
to provide multiple thread priorities. The lack
thereof is a significant limitation.</div>
<div><br></div>
<div>best regards,</div>
<div>Eliot Miranda</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Of
course the next step after that<br>
wouldn't be dumping the upstream scheduler and
switching to BFS, but<br>
it would certainly justify taking a closer look.<br>
<br>
I'm not entirely sure why you want to fork BFS - as
far as I can see<br>
Con Kolivas is keeping the BFS and his larger -ck
patchset up to date<br>
with upstream releases.<br>
<br>
In conclusion (from a Raspberry Pi perspective):
please do play with<br>
BFS on the pi, do something useful with it (if it
solves the recently<br>
discussed issues with heartbeat+cogvm then swell),
then let's think<br>
about where to go from there.<br>
<br>
Regards,<br>
<br>
Alex<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
best,
<div>Eliot</div>
</blockquote>
<br>
<br>
<pre cols="72">--
You can follow me on twitter at <a moz-do-not-send="true" href="http://twitter.com/smalltalkhacker" target="_blank">http://twitter.com/smalltalkhacker</a></pre>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
best,
<div>Eliot</div>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
You can follow me on twitter at <a class="moz-txt-link-freetext" href="http://twitter.com/smalltalkhacker">http://twitter.com/smalltalkhacker</a></pre>
</body>
</html>