<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Eliot,<br>
    Happy to help. Glad to hear it works on Red Hat too. I only had
    access to Ubuntu at the time I tried it. The only reason I suggested
    the group-based version was to avoid any potential conflicts with
    other config on the machine. As you say, the global version works
    too; it just applies to all users, which may or may not be what one
    wants.<br>
    <br>
    Cheers, Steve<br>
    <br>
    <div class="moz-cite-prefix">On 25/05/2013 00:13, Eliot Miranda
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAC20JE2GrwrxFftt3-JvqVtwB9=XZXqXUykWi30EKn5ubs_NGg@mail.gmail.com"
      type="cite">
      Steve,
      <div><br>
      </div>
      <div>thank you _so much_! This works like a charm. At
        least on Red Hat I didn't have to add a group and things work fine
        with the first of the <span class="Apple-style-span"
          style="font-family:monospace">/etc/security/limits.d/squeakvm.conf</span><font
          class="Apple-style-span" face="arial, helvetica, sans-serif">
          approaches.</font><br>
        <br>
        <div class="gmail_quote">On Sun, Apr 21, 2013 at 5:10 AM, Steve
          Rees <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:squeak-vm-dev@vimes.worldonline.co.uk"
              target="_blank">squeak-vm-dev@vimes.worldonline.co.uk</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
            <div text="#000000" bgcolor="#FFFFFF"> Hi Eliot,<br>
              <br>
              <div>On 19/04/2013 17:23, Eliot Miranda wrote:<br>
              </div>
              <blockquote type="cite">Hi Alex,<br>
                <br>
                <div class="gmail_quote">On Fri, Apr 19, 2013 at 6:16
                  AM, Alex Bradbury <span dir="ltr">&lt;<a
                      moz-do-not-send="true"
                      href="mailto:asb@asbradbury.org" target="_blank">asb@asbradbury.org</a>&gt;</span>
                  wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
                    On 19 April 2013 08:38, Casey Ransberger &lt;<a
                      moz-do-not-send="true"
                      href="mailto:casey.obrien.r@gmail.com"
                      target="_blank">casey.obrien.r@gmail.com</a>&gt;
                    wrote:<br>
                    &gt;<br>
                    &gt; I had a brief chat with Con Kolivas, who did
                    BFS (which implements kernel stuff that will make
                    Cog happier under Linux on machines with
                    sub-supercomputing quantities of CPUs) tonight.<br>
                    &gt;<br>
                    &gt; It sounds like there are actually two reasons
                    it hasn't made it into the mainline kernel:<br>
                    &gt;<br>
                    &gt; a) he doesn't have time to support it, and<br>
                    &gt; b) the other kernel folks don't want it.<br>
                    &gt;<br>
                    &gt; Oh well. Since right now I'm focused on
                    Raspbian, I sent a message explaining what it was,
                    why I want it, etc on their web board. If I do get
                    it in, support would have to fall to me. Yikes,
                    right? ;)<br>
                    <br>
                    Yes, for political reasons it seems unlikely
                    anything like BFS would<br>
                    get into the upstream kernel. If someone can do
                    work to actually show<br>
                    noticeable performance gains then that would make us
                    (the Raspberry Pi<br>
                    Foundation) interested in exploring further. Real
                    workloads that<br>
                    perform much better with an alternative scheduler
                    would be much more<br>
                    interesting than microbenchmarks. </blockquote>
                  <div><br>
                  </div>
                  <div>This isn't about workload or performance.  It is
                    about basic functionality.  The CFS scheduler does
                    not support multiple thread priorities for user
                    processes (actually, for the non-real-time
                    scheduling policy, and the real-time scheduling
                    policy is available only to superuser processes). 
                    <br>
                  </div>
                </div>
              </blockquote>
              This isn't entirely true. Out of the box, unprivileged
              processes can't change the scheduling policy, but since
              kernel 2.6.12 it has been possible to configure your
              system to allow this without resorting to setuid root.<br>
              <br>
              Quoting from the man page for sched_setscheduler - <a
                moz-do-not-send="true"
                href="http://linux.die.net/man/2/sched_setscheduler"
                target="_blank">http://linux.die.net/man/2/sched_setscheduler</a>
              - (the privilege restrictions are the same as for
              pthread_attr_setschedpolicy), "If an unprivileged process
              has a nonzero RLIMIT_RTPRIO soft limit, then it can change
              its scheduling policy and priority, subject to the
              restriction that the priority cannot be set to a value
              higher than the maximum of its current priority and its
              RLIMIT_RTPRIO soft limit."<br>
              <br>
              Using the pam_limits.so module, one can set the
              RLIMIT_RTPRIO soft limit higher than zero, which then
              allows the use of the SCHED_FIFO and SCHED_RR policies
              with priorities up to the soft limit.<br>
              <br>
              One way to achieve this is to add the following lines to
              the file /etc/security/limits.conf.<br>
              <blockquote><tt>*    hard    rtprio    1</tt><br>
                <tt>*    soft    rtprio    1</tt><br>
              </blockquote>
              or you can add a squeakvm.conf file to
              /etc/security/limits.d with those same lines, e.g.<br>
              <blockquote><tt># /etc/security/limits.d/squeakvm.conf</tt><br>
                <tt>*    hard    rtprio    1</tt><br>
                <tt>*    soft    rtprio    1</tt><br>
              </blockquote>
              This grants the capability to all unprivileged users, but you
              will need to log out and log in again for it to take effect,
              as PAM limits are applied at user login.<br>
              <br>
              The only problem with this approach is that it might
              conflict with other global rtprio settings. An
              alternative is to grant the privilege to a group (e.g.
              squeakvm) and then add users to that group, which lets
              them select the SCHED_FIFO or SCHED_RR policies and
              change the priorities of threads:<br>
              <blockquote><tt># /etc/security/limits.d/squeakvm.conf</tt><br>
                <tt>@squeakvm    hard    rtprio    1</tt><br>
                <tt> <tt>@squeakvm</tt>    soft    rtprio    1</tt></blockquote>
              This will grant the ability only to users in the squeakvm
              group. The 1 in the examples above is the maximum
              real-time priority a process may request. Higher limits
              could be used, but the limit must be at least 1 to enable
              the capability.<br>
              <br>
              Of course the group needs to exist for this to take
              effect.<br>
              <blockquote><tt>sudo groupadd squeakvm</tt><br>
              </blockquote>
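              <br>
              A minimal sketch for checking, after logging back in,
              whether the limit actually reached your session. It assumes
              Python 3 on Linux; the helper names <tt>rtprio_limits</tt>
              and <tt>can_request_realtime</tt> are made up for this
              illustration and are not part of PAM or of the test program
              mentioned below.<br>
              <pre>
```python
import resource

def rtprio_limits():
    """Return the (soft, hard) RLIMIT_RTPRIO values for this process."""
    return resource.getrlimit(resource.RLIMIT_RTPRIO)

def can_request_realtime(soft_limit):
    """True if the soft limit lets an unprivileged process ask for SCHED_FIFO/SCHED_RR."""
    # A soft limit of 0 is the usual default and forbids real-time policies.
    # Any nonzero value (including resource.RLIM_INFINITY) permits them.
    return soft_limit != 0

soft, hard = rtprio_limits()
print("RLIMIT_RTPRIO soft=%s hard=%s" % (soft, hard))
print("real-time policies allowed for unprivileged use:", can_request_realtime(soft))
```
</pre>
              If the soft limit still reads 0 after a fresh login, the
              limits file was probably not picked up, or the user is not
              yet in the group.<br>
              <br>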
              There's a handy test program on the pthread_setschedparam
              man page - <a moz-do-not-send="true"
                href="http://linux.die.net/man/3/pthread_setschedparam"
                target="_blank">http://linux.die.net/man/3/pthread_setschedparam</a>
              - that can be used for experimentation. I've attached the
              source. I tried this out on an up-to-date Ubuntu Server
              12.04 LTS VM running on a MacBook Pro under VMware Fusion.
              YMMV.<br>
              <br>
              pthreads_sched_test is a bit of a verbose name, so I named
              the test program "schedtest" when I compiled it. Here are
              the results of my tests.<br>
              <br>
              First, compile the program<br>
              <blockquote><tt>gcc pthreads_sched_test.c -o schedtest
                  -lpthread</tt><br>
              </blockquote>
              The first set of tests was performed without making any
              changes to the PAM limits.<br>
              <br>
              Running schedtest without arguments gives the following<br>
              <blockquote><tt>./schedtest<br>
                  <br>
                </tt>
                <blockquote><tt>Scheduler settings of main thread</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                  <br>
                  <tt>Scheduler settings in 'attr'</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                  <tt>    inheritsched is INHERIT</tt><br>
                  <br>
                  <tt>Scheduler attributes of new thread</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                </blockquote>
              </blockquote>
              Trying to change the policy and priority of the new thread
              the program creates gives the following<br>
              <blockquote><tt>./schedtest -ar1 -i e<br>
                  <br>
                </tt>
                <blockquote><tt>Scheduler settings of main thread</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                  <br>
                  <tt>Scheduler settings in 'attr'</tt><br>
                  <tt>    policy=SCHED_RR, priority=1</tt><br>
                  <tt>    inheritsched is EXPLICIT</tt><br>
                  <br>
                  <tt>pthread_create: Operation not permitted</tt><br>
                  <tt> </tt></blockquote>
              </blockquote>
              Trying to change the priority of the main thread gives<br>
              <blockquote><tt>./schedtest -mr1<br>
                  <br>
                </tt>
                <blockquote><tt>pthread_setschedparam: Operation not
                    permitted</tt><br>
                </blockquote>
              </blockquote>
              As Eliot described, the default configuration prevents
              unprivileged user processes from changing the priority or
              scheduling policy.<br>
              <br>
              After adding the /etc/security/limits.d/squeakvm.conf file
              described above, adding my user to the squeakvm group, and
              logging out and back in again, the tests are somewhat more
              successful. Note that these are the only additional
              privileges given to the squeakvm group.<br>
              <blockquote><tt>./schedtest -ar1 -i e<br>
                  <br>
                </tt>
                <blockquote><tt>Scheduler settings of main thread</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                  <br>
                  <tt>Scheduler settings in 'attr'</tt><br>
                  <tt>    policy=SCHED_RR, priority=1</tt><br>
                  <tt>    inheritsched is EXPLICIT</tt><br>
                  <br>
                  <tt>Scheduler attributes of new thread</tt><br>
                  <tt>    policy=SCHED_RR, priority=1</tt><br>
                </blockquote>
                <br>
                <tt>./schedtest -mr1 -ao0 -i e<br>
                  <br>
                </tt>
                <blockquote><tt>Scheduler settings of main thread</tt><br>
                  <tt>    policy=SCHED_RR, priority=1</tt><br>
                  <br>
                  <tt>Scheduler settings in 'attr'</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                  <tt>    inheritsched is EXPLICIT</tt><br>
                  <br>
                  <tt>Scheduler attributes of new thread</tt><br>
                  <tt>    policy=SCHED_OTHER, priority=0</tt><br>
                </blockquote>
              </blockquote>
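              <br>
              For anyone without a C compiler to hand, roughly the same
              experiment as <tt>./schedtest -mr1</tt> can be sketched in
              Python 3 on Linux. This is an illustration under those
              assumptions, not the man page program; the helper names
              <tt>describe</tt> and <tt>try_set_policy</tt> are
              hypothetical.<br>
              <pre>
```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}

def describe(pid=0):
    """Return 'policy=..., priority=...' for a process (0 means the caller)."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    return "policy=%s, priority=%d" % (POLICY_NAMES.get(policy, policy), priority)

def try_set_policy(policy, priority):
    """Try to switch the caller to the given policy; return True on success."""
    try:
        os.sched_setscheduler(0, policy, os.sched_param(priority))
        return True
    except PermissionError:
        # EPERM: the RLIMIT_RTPRIO soft limit does not allow this request.
        return False

# Remember the starting settings so they can be restored afterwards.
original_policy = os.sched_getscheduler(0)
original_param = os.sched_getparam(0)

print("before:", describe())
if try_set_policy(os.SCHED_RR, 1):
    print("after: ", describe())
    # Drop back so the rest of the script does not run at real-time priority.
    os.sched_setscheduler(0, original_policy, original_param)
else:
    print("sched_setscheduler: Operation not permitted")
```
</pre>
              With the default limits the request should be refused;
              with a nonzero rtprio soft limit it should succeed.<br>
              <br>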
              Does this give sufficient flexibility without having to
              patch the kernel's scheduler (whatever its name)?<br>
              <br>
              Cheers, <br>
              Steve<br>
              <br>
              <blockquote type="cite">
                <div class="gmail_quote">
                  <div>AFAIA it is the only mainstream pthreads
                    scheduler that doesn't.  AFAIA BFS (what a name?!)
                    does support multiple thread priorities for user
                    processes.</div>
                  <div><br>
                  </div>
                  <div>Within the Squeak Cog VM (and in a number of
                    other VMs, Smalltalk and Java VMs amongst them)
                    there's a heartbeat which is used to cause the VM to
                    periodically break out of normal processing and poll
                    for events.  A heartbeat is both much more
                    efficient, and more regular than e.g. decrementing a
                    counter as part of normal processing (e.g. frame
                    build on entering non-leaf methods).  Ideally the
                    heartbeat is implemented as a thread spinning,
                    blocking in e.g. nanosleep and then forcing the
                    breakout before entering nanosleep again.  But this
                    requires that the heartbeat thread runs at a higher
                    priority than the main VM thread(s).  On Linux
                    under CFS this isn't possible.  The fallback is
                    to use an interval timer (setitimer with
                    ITIMER_REAL) and a signal handler (for SIGALRM). 
                    This is a poor substitute:</div>
                  <div>- system calls are interrupted, which can play
                    havoc with external code</div>
                  <div>- when debugging the heartbeat signal must be
                    disabled because otherwise one is constantly
                    stepping into the signal handler</div>
                  <div> - certain Linux kernels have bugs with signal
                    delivery and threads which can cause the loss of a
                    thread's context, ending up with two threads having
                    the same context, hence the setitimer approach works
                    only with a strictly single-threaded VM (this is a
                    bug I found and worked around late last year in Red
                    Hat Enterprise Linux WS release 4 (Nahant Update 4)
                    vintage kernels, which alas I have customers using)</div>
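                  <div><br>
                  </div>
                  <div>The thread-based heartbeat described above can be
                    sketched as follows. This is Python purely for
                    illustration of the control flow (sleep, force the
                    breakout, sleep again); it is not how the Cog VM is
                    written, the names <tt>Heartbeat</tt> and
                    <tt>interpret</tt> are hypothetical, and Python's
                    threading module gives no way to raise the heartbeat
                    thread's priority, which is exactly the part that
                    matters in a real VM.</div>
                  <pre>
```python
import threading
import time

class Heartbeat:
    """Sets a flag at a fixed interval; the main loop only tests the flag."""

    def __init__(self, interval_s=0.002):
        self.interval_s = interval_s
        self.poll_requested = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Block in sleep, force the breakout, then go back to sleep.
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            self.poll_requested.set()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

def interpret(heartbeat, steps):
    """Stand-in for the VM's main loop; returns how many times it polled."""
    polls = 0
    for _ in range(steps):
        if heartbeat.poll_requested.is_set():
            heartbeat.poll_requested.clear()
            polls += 1  # a real VM would check for I/O and timer events here
    return polls
```
</pre>
                  <div>If the heartbeat thread cannot preempt a busy main
                    thread, the flag is set late or irregularly, which is
                    why the thread needs to run at a higher priority.</div>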
                  <div><br>
                  </div>
                  <div>Either of these solutions would seem
                    straight-forward from the outside:</div>
                  <div>- make SCHED_RR and/or SCHED_FIFO available to user
                    processes.</div>
                  <div>- implement multiple priorities for SCHED_OTHER</div>
                  <div> Expecting to be able to install a VM as a setuid
                    program is not realistic.</div>
                  <div><br>
                  </div>
                  <div>I think you'll find that this kind of
                    architectural issue is present in a number of
                    multi-media applications, not just dynamic language
                    virtual machines.  The restriction to a single
                    thread priority is, frankly, pathetic.  If you see
                    Raspbian and Pi as a platform for multi-media apps
                    then I would urge you to bring any influence you
                    have to bear on getting the Linux kernel community
                    to provide multiple thread priorities.  The lack
                    thereof is a significant limitation.</div>
                  <div> </div>
                  <div>best regards,</div>
                  <div>Eliot Miranda</div>
                  <div><br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">Of
                    course the next step after that<br>
                    wouldn't be dumping the upstream scheduler and
                    switching to BFS, but<br>
                    it would certainly justify taking a closer look.<br>
                    <br>
                    I'm not entirely sure why you want to fork BFS - as
                    far as I can see<br>
                    Con Kolivas is keeping the BFS and his larger -ck
                    patchset up to date<br>
                    with upstream releases.<br>
                    <br>
                    In conclusion (from a Raspberry Pi perspective):
                    please do play with<br>
                    BFS on the pi, do something useful with it (if it
                    solves the recently<br>
                    discussed issues with heartbeat+cogvm then swell),
                    then let's think<br>
                    about where to go from there.<br>
                    <br>
                    Regards,<br>
                    <br>
                    Alex<br>
                  </blockquote>
                </div>
                <br>
                <br clear="all">
                <div><br>
                </div>
                -- <br>
                best,
                <div>Eliot</div>
              </blockquote>
              <br>
              <br>
              <pre cols="72">-- 
You can follow me on twitter at <a moz-do-not-send="true" href="http://twitter.com/smalltalkhacker" target="_blank">http://twitter.com/smalltalkhacker</a></pre>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        best,
        <div>Eliot</div>
      </div>
    </blockquote>
    <br>
    <br>
  </body>
</html>