[Vm-dev] Heartbeat & Linux

Casey Ransberger casey.obrien.r at gmail.com
Wed Apr 24 20:07:43 UTC 2013


Bert, I didn't say I thought the Stack VM would be too slow without messing
with pthreads, or at least I didn't mean to. The idea is to see how far I can
push the little gadget :)


On Mon, Apr 22, 2013 at 10:52 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

>
>
>
> On Mon, Apr 22, 2013 at 10:31 AM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>
>>
>>
>> On 2013-04-22, at 19:16, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>
>>
>> On Mon, Apr 22, 2013 at 5:33 AM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>>
>>>
>>> Still, you make it sound like this inefficiency is actually preventing
>>> the Stack VM from running on the Raspberry Pi? That is hard to believe.
>>>
>>
>> No, we just use the interval timer.  But as I say, in a
>> server/multi-user/FFI context the interval timer is problematic.
>>
>>
>> Okay. So for the typical local single pi user it should be fine :)
>>
>> Btw, what kind of deadlocks have you seen with SA_RESTART? Normally that
>> should avoid the EINTR problems.
>>
>
> Just weird bugs that look like Linux-internal issues.
>
> The most recent one is with PAM, the authentication library, using
> pam_start and pam_authenticate with a callback to fetch the password (the
> pam_conv structure).  On Linux, if one authenticates successfully, all is
> fine.  But if one supplies an invalid password so that authentication
> fails, the ITIMER_REAL interval timer's reload value gets set to zero,
> disabling the heartbeat.  A breakpoint on the setitimer entry point for
> the setitimer system call is never hit (I didn't put a breakpoint on the
> indirect system call).
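
For reference, a minimal sketch of an ITIMER_REAL heartbeat installed with
SA_RESTART, plus a check for the zeroed reload value described above. This is
not the Cog VM's code: heartbeat_start, heartbeat_is_armed, heartbeat_stop and
the tick counter are hypothetical names, and the feature-test macro assumes a
glibc-style toolchain.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical tick counter; a real VM would set a "check events" flag. */
volatile sig_atomic_t heartbeat_ticks = 0;

static void on_heartbeat(int signo)
{
    (void)signo;
    heartbeat_ticks++;
}

/* Install the SIGALRM handler and arm ITIMER_REAL to fire every
   interval_us microseconds. Returns 0 on success, -1 on failure. */
int heartbeat_start(long interval_us)
{
    struct sigaction sa;
    struct itimerval tv;

    assert(interval_us > 0 && interval_us < 1000000);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_heartbeat;
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_us;  /* the reload value */
    tv.it_value = tv.it_interval;          /* time until the first tick */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Returns 1 if the reload value is still nonzero, 0 if it has been
   zeroed (the heartbeat will not fire again), -1 on error. */
int heartbeat_is_armed(void)
{
    struct itimerval tv;

    if (getitimer(ITIMER_REAL, &tv) != 0)
        return -1;
    return tv.it_interval.tv_sec != 0 || tv.it_interval.tv_usec != 0;
}

/* Disarm the timer. Returns 0 on success. */
int heartbeat_stop(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}
```

A VM could call heartbeat_is_armed() after returning from a foreign call such
as pam_authenticate and call heartbeat_start() again if it returns 0. Whether
that would be an adequate workaround for the PAM problem is untested.
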
>
> The previous one with RHES 4 is that if there are two threads in the VM
> then very occasionally in the delivery of the SIGALRM signal the kernel
> confuses the thread control blocks of the two threads and one ends up with
> both the VM and the second thread running the second thread's code,
> freezing the VM.
>
> <rant>Essentially I think Linux has serious quality control issues and the
> itimer tickles them.  These things are a right royal pain in the ass to
> debug and a pain to work around.  I don't see much difference between
> Windows and Linux in terms of quality.  They're both abysmal.  In my
> experience Mac OS X, Solaris and HP-UX are of significantly higher
> quality.</rant>
>
>>
>> - Bert -
>>
>>
>>> - Bert -
>>>
>>> On 2013-04-19, at 05:24, Casey Ransberger <casey.obrien.r at gmail.com>
>>> wrote:
>>>
>>> I had a feeling the actual problem was probably political. The name is
>>> also a bit... well. As many children will prefer to avoid the use of
>>> profanity while raising their parents, I wonder if I shouldn't start by
>>> forking it and calling it the Big Furry Scheduler or something...
>>>
>>> Anyhow this sheds lots of light on things. Thanks, Eliot!
>>>
>>>
>>> On Thu, Apr 18, 2013 at 1:29 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>>
>>>>
>>>> Hi Casey,
>>>>
>>>> On Thu, Apr 18, 2013 at 5:37 AM, Casey Ransberger <
>>>> casey.obrien.r at gmail.com> wrote:
>>>>
>>>>>
>>>>> Totally. I know it isn't going to be a small thing. I'm curious if any
>>>>> of the BSD implementations solve the problem adequately for Cog; I might
>>>>> have something working to study, build tests around, etc. A control group.
>>>>>
>>>>
>>>> The technical problem was solved for Linux in 2009 by Con Kolivas with
>>>> his (excuse the language)
>>>> http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler.  The political
>>>> problem of getting it adopted throughout the linux community so it can be
>>>> relied upon is an entirely different thing.
>>>>
>>>> Note that once BFS is more widely available I could write a VM that
>>>> will test thread priorities on startup and use threads if multiple
>>>> priorities are supported, falling back on the ITIMER if not.
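
One way such a startup probe could look, as a sketch only. It assumes that
"multiple priorities are supported" can be approximated by asking whether the
default policy offers more than one priority level, or whether the calling
thread is allowed to switch to SCHED_RR. The function names are hypothetical
and not taken from any VM source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Number of distinct static priorities a scheduling policy offers,
   or 0 if the policy is not supported. */
int priority_levels(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return 0;
    return hi - lo + 1;
}

/* Returns 1 if this process could run a heartbeat thread at a higher
   priority than the VM thread, 0 if it should fall back on the itimer.
   The caller's scheduling parameters are restored before returning. */
int thread_priorities_usable(void)
{
    int old_policy;
    int rc;
    struct sched_param old_param;
    struct sched_param probe = {0};

    /* Stock Linux reports min == max == 0 for SCHED_OTHER. */
    if (priority_levels(SCHED_OTHER) > 1)
        return 1;

    if (pthread_getschedparam(pthread_self(), &old_policy, &old_param) != 0)
        return 0;

    /* On Linux this typically fails with EPERM unless the process has
       CAP_SYS_NICE or a nonzero RLIMIT_RTPRIO. */
    probe.sched_priority = sched_get_priority_min(SCHED_RR);
    if (pthread_setschedparam(pthread_self(), SCHED_RR, &probe) != 0)
        return 0;

    /* Put things back the way we found them. */
    rc = pthread_setschedparam(pthread_self(), old_policy, &old_param);
    assert(rc == 0);
    (void)rc;
    return 1;
}
```

On older glibc this needs to be built with -pthread. A VM would call
thread_priorities_usable() once at startup and choose between a heartbeat
thread and the ITIMER accordingly.
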
>>>>
>>>>
>>>>>
>>>>> Either way though, it would be a largish win for the community, given
>>>>> the widespread popularity of Linux. I might be able to find some friends
>>>>> who might like to attack it with me. Etcetera.
>>>>>
>>>>> I wonder about why the Linux folks haven't dealt with it already.
>>>>> Maybe it hasn't come up. Hopefully it isn't a security thing, that'd be
>>>>> hard to make a case about.
>>>>>
>>>>> Either way, I haven't figured out where to start looking yet.
>>>>>
>>>>>
>>>>> On Thu, Apr 18, 2013 at 5:23 AM, David T. Lewis <lewis at mail.msen.com> wrote:
>>>>>
>>>>>>
>>>>>> I think that Eliot has described the issue in some detail either on
>>>>>> this list, or on his blog http://www.mirandabanda.org/cogblog/. Sorry
>>>>>> I don't have any links handy, but reading through the blog is time
>>>>>> well spent in any case :)
>>>>>>
>>>>>> I think you are dealing with a fundamental issue in the pthreads
>>>>>> implementation(s) on Linux, so don't be surprised if it's not just
>>>>>> a simple matter of fixing a bug. The pthreads implementation is more
>>>>>> or less grafted onto the original Unix process model, and operating
>>>>>> systems tend to vary as to whether they support pthreads fully, or
>>>>>> for that matter whether they support a thread model at all.
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>> On Thu, Apr 18, 2013 at 05:07:42AM -0700, Casey Ransberger wrote:
>>>>>> >
>>>>>> > Well I guess maybe I left a part of the message out. Oops. I'm
>>>>>> > talking about Cog's heart. It needs user-level thread
>>>>>> > prioritization and apparently (GNU/?)Linux lacks this.
>>>>>> >
>>>>>> > My guess is it's a kernel issue. But that's an uneducated guess.
>>>>>> >
>>>>>> > It became interesting to me because I'm not super happy with the UI
>>>>>> > responsiveness of Squeak trunk under the interpreter and Raspbian
>>>>>> > on a 700MHz Pi (out of box experience isn't overclocked, so I'm
>>>>>> > targeting that.) My thought was to get the stack VM going, but we
>>>>>> > still have this issue with the heartbeat for the stack VM to be
>>>>>> > able to perform optimally. If I can fix the heartbeat (read:
>>>>>> > pthreads) issue, I can benefit GNU/Linux users of the system across
>>>>>> > the board, both in terms of the (Intel) JIT, and in terms of the
>>>>>> > stack oriented virtual machine which has the awesome Ungar-style
>>>>>> > garbage collection, right? I want the efficient GC eventually
>>>>>> > anyway. The advantages of stack orientation are somewhat beyond my
>>>>>> > current understanding of virtual machines, but I gather that's
>>>>>> > desirable as well?
>>>>>> > </noob>
>>>>>> >
>>>>>> > I need to do some tests with Cuis, Spoon, Pharo, and Etoys before
>>>>>> > I'm going to blame any part of the VM about the UI perf I'm seeing,
>>>>>> > but a faster VM is a faster VM anyway. I'd like to make all of the
>>>>>> > images faster on this little gadget, because it's cool, it's
>>>>>> > popular, and it gives us an approach on more than just the third
>>>>>> > world (not that the third world isn't incredibly important, just
>>>>>> > that if we don't start making better adults where people with first
>>>>>> > world problems are too, I'll end up feeling like Rick Moranis in
>>>>>> > Spaceballs or I'm going to have to relocate to someplace where we
>>>>>> > previously made better adults.)
>>>>>> >
>>>>>> > http://www.youtube.com/watch?v=sen8Tn8CBA4
>>>>>> >
>>>>>> > I know this isn't really the place to ask, but I thought maybe
>>>>>> > someone might be able to point me at what I'd need to dig into to
>>>>>> > understand the problem we have with the popular Linux
>>>>>> > implementation of pthreads, because when I GOOG that, I get #1 a
>>>>>> > tutorial on using pthreads, #2 a Wikipedia article which speaks
>>>>>> > abstractly about pthreads, and #3 another tutorial on using
>>>>>> > pthreads. I think I'm trying to figure out what to google for so
>>>>>> > that I can figure out where the problem lives and then look at it
>>>>>> > to see whether or not I can steal its lunch money.
>>>>>> >
>>>>>> > Even if I google "linux pthreads" I still get a tutorial on using
>>>>>> > them as the top hit.
>>>>>> >
>>>>>> > This is the most directly informative thing I can find:
>>>>>> >
>>>>>> > http://man7.org/linux/man-pages/man7/pthreads.7.html
>>>>>> >
>>>>>> > Hopefully this explains anything I might have left out of my first
>>>>>> > message on the subject!
>>>>>> >
>>>>>> > Casey
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Apr 17, 2013 at 5:33 AM, David T. Lewis <lewis at mail.msen.com> wrote:
>>>>>> >
>>>>>> > >
>>>>>> > > On Wed, Apr 17, 2013 at 01:15:45AM -0700, Casey Ransberger wrote:
>>>>>> > > >
>>>>>> > > > I'm assuming the problem is in the kernel, but making
>>>>>> > > > assumptions is usually a bad plan, so I'm asking.
>>>>>> > > >
>>>>>> > > > When I look, I find a rather confusing jumble: pthreads exists
>>>>>> > > > in different implementations under different operating systems.
>>>>>> > > >
>>>>>> > > > I'd kind of like the heartbeat to work right under Raspbian,
>>>>>> > > > because kids are going to use it, and the goal of producing
>>>>>> > > > better adults is pretty close to my heart. See also making
>>>>>> > > > Squeak faster. (Stack VM.)
>>>>>> > > >
>>>>>> > > > I want to arm myself with as much knowledge about this as
>>>>>> > > > possible, because I'm considering just bloody fixing it. If the
>>>>>> > > > (GNU/Linux) community has objections, and I'm able to pull off
>>>>>> > > > the fix, I can make a patch available to people who want it.
>>>>>> > > >
>>>>>> > > > I'm not confident that I can fix the bug with the knowledge I
>>>>>> > > > have now, but I'm confident that I can fix it eventually, if
>>>>>> > > > someone else doesn't beat me to the punch.
>>>>>> > > >
>>>>>> > > > Outside of the wiki page, are there any documents/people/mailing
>>>>>> > > > lists I should look at to be able to get a good handle on where
>>>>>> > > > in the kernel (or elsewhere) I should be looking for the code
>>>>>> > > > which isn't meeting our Cog-nitive requirements?
>>>>>> > > >
>>>>>> > >
>>>>>> > > I do not understand what problem or bug you are asking about. Is
>>>>>> > > there some issue with Raspbian Linux that is causing a problem?
>>>>>> > >
>>>>>> > > Dave
>>>>>> > >
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Casey Ransberger
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> best,
>>>> Eliot
>>>>
>>>>
>>>
>>>
>>> --
>>> Casey Ransberger
>>>
>>>
>>>
>>>
>>
>>
>> --
>> best,
>> Eliot
>>
>>
>> - Bert -
>>
>>
>>
>>
>
>
> --
> best,
> Eliot
>
>


-- 
Casey Ransberger