[Vm-dev] Re: Cog on Linux

Derek O'Connell doconnel at gmail.com
Fri Jul 23 11:54:27 UTC 2010


Hi Eliot,

On 22/07/10 22:44, Eliot Miranda wrote:
>>>>
>>> just looks like the OS/run-time is not letting the program set a
>>> handler for SIGUSR2 and/or not allowing it to be caught.  This is
>>> a deal breaker.  Why it's happening I don't know, but currently
>>> Cog's heartbeat on linux depends on being able to catch SIGUSR2.
>>>
>>
>> From: http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html
>>
>> H.4: With LinuxThreads, I can no longer use the signals SIGUSR1 and
>> SIGUSR2 in my programs! Why?
>>
>> The short answer is: because the Linux kernel you're using does not
>> support realtime signals.
>>
>
> I'd forgotten all that!  I thought that stuff was ancient history.
> So we need two things, one is a pair of alternative signals, the
> other is a reliable #define that we can use to distinguish l'ancien
> regime from the modern day.
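The symptom is that a SIGUSR2 handler is either never installed or never called. One quick way to check this on a given box is to probe for it directly. This is a minimal sketch that assumes a POSIX system. The function name signal_is_catchable is hypothetical. SIGRTMIN appears in the usage note only as one possible candidate for an alternative signal, not as what Cog uses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag written by the probe handler; volatile sig_atomic_t is the
   portable type for an object a signal handler assigns to. */
static volatile sig_atomic_t probe_caught = 0;

static void probe_handler(int signo)
{
    (void)signo;
    probe_caught = 1;
}

/* Returns 1 if a handler for signo can be installed AND the signal is
   actually delivered to it, 0 otherwise. The previous disposition and
   signal mask are restored before returning. Meant to be called early,
   while the process is still single-threaded (it uses sigprocmask). */
int signal_is_catchable(int signo)
{
    struct sigaction sa, old;
    sigset_t unblock, saved_mask;
    int ok = 0;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) == -1)
        return 0;               /* e.g. EINVAL: this signal cannot be caught */

    /* Unblock it, otherwise raise() would only leave it pending. */
    sigemptyset(&unblock);
    sigaddset(&unblock, signo);
    sigprocmask(SIG_UNBLOCK, &unblock, &saved_mask);

    probe_caught = 0;
    if (raise(signo) == 0)
        ok = probe_caught;      /* POSIX: the handler has returned before raise() does */

    sigprocmask(SIG_SETMASK, &saved_mask, NULL);
    sigaction(signo, &old, NULL);
    return ok;
}
```

For example, a VM could call signal_is_catchable(SIGUSR2) at startup and, if that returns 0, try signal_is_catchable(SIGRTMIN) before giving up. That fallback policy is hypothetical.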

Something smells fishy about signals specifically reserved for user
apps then being re-reserved for something else, and since the page I
linked to begins with the warning "This FAQ has not been updated for a
while and may not be 100% up to date", I am trying to clarify the
situation. I'm still at it, but here is what I have dug up so far:

- "LinuxThreads" generally refers to threading on pre-2.6 kernels

- "NPTL", Native POSIX Threads Library for Linux, replaces
"LinuxThreads" on 2.5+ kernels (publicly 2.6+)

- "NGPT", IBM's Next Generation POSIX Threads, for 2.4 kernels and 
earlier, works/worked in conjunction with LinuxThreads

- Red Hat back-ported NPTL to pre-2.6 kernels and made the threading
model selectable between NPTL and LinuxThreads on a per-process basis


To determine the threading library that a system uses (example shown for 
my system: Ubuntu 9.10, 2.6.31-22-generic #60-Ubuntu SMP Thu May 27 
00:22:23 UTC 2010 i686 GNU/Linux):

     > getconf GNU_LIBPTHREAD_VERSION
     NPTL 2.10.1
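The value that getconf reports can also be read from C at run time through confstr(). This sketch assumes glibc, because _CS_GNU_LIBPTHREAD_VERSION is a glibc extension. The helper names libpthread_version and is_nptl are hypothetical. This is a run-time check rather than the compile-time #define asked for. The threading library at run time need not match the build machine, and on the Red Hat back-port it is chosen per process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Run-time equivalent of `getconf GNU_LIBPTHREAD_VERSION`.
   Copies e.g. "NPTL 2.10.1" into buf and returns 0, or returns -1 (with
   buf set to "") if the C library provides no such value. */
int libpthread_version(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns the size needed including the NUL, or 0 if there
       is no value; a too-small buf is truncated but still NUL-terminated. */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len) == 0)
        return -1;
    return 0;
#else
    return -1;                  /* not glibc: no way to ask */
#endif
}

/* 1 if the thread library reports itself as NPTL, 0 otherwise
   (LinuxThreads, or a C library that does not answer the query). */
int is_nptl(void)
{
    char buf[64];
    if (libpthread_version(buf, sizeof buf) != 0)
        return 0;
    return strncmp(buf, "NPTL", 4) == 0;
}
```

A VM could, for instance, log this string at startup so that crash reports show which threading library was in use.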


So that fishy smell might come from a Red Hat-specific red herring ;-)
Most likely, on modern distros with a 2.6+ kernel:

A) NPTL *is* being used
B) SIGUSR1/2 are *not* reserved
C) SIGUSR1/2, if they are indeed the source of any problems, may be
in use elsewhere (in a plugin, perhaps)
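To test point C, a process can ask for the current disposition of a signal without changing it by passing a NULL new action to sigaction(). This is a minimal sketch. The name signal_already_claimed is hypothetical. Calling it just before the VM installs its own heartbeat handler, to warn if a plugin got there first, is an assumption about where it would be useful.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>

/* Returns 1 if something in the process (a plugin or library, say) has
   already changed the disposition of signo away from SIG_DFL, 0 if it is
   still at its default, -1 on error. Does not modify the disposition. */
int signal_already_claimed(int signo)
{
    struct sigaction cur;

    assert(signo > 0);
    if (sigaction(signo, NULL, &cur) == -1)   /* NULL act: query only */
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 1;               /* a three-argument sa_sigaction handler is set */
    return cur.sa_handler != SIG_DFL;         /* a handler or SIG_IGN */
}
```

A check made only once at startup will not notice a plugin that is loaded later and overrides the VM's handler. Repeating the check after plugins load would catch that case.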

More on this below, but first I would throw into the mix something
that, in my limited experience, has sometimes been the source of odd
problems: the handling of EINTR/EAGAIN errors and, AFAICT, the
increased likelihood of these errors occurring the busier a process
and/or the system in general is. I have seen code written pre-2.6 that
worked well even post-2.6 until system load increased and/or it was
used in a multi-threaded context. Some ioctl() calls then fail, but
since the immediate code does not handle EINTR/EAGAIN, the result is
some obscure error at a higher level.
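The usual defensive pattern is to retry a call that failed with EINTR. When a non-blocking descriptor reports EAGAIN, the code should wait rather than spin. Below is a minimal sketch for ioctl() and read(). The names ioctl_retry and read_retry are hypothetical. EAGAIN from ioctl() is left to the caller because its meaning is driver-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Retry an ioctl() that was interrupted by a signal handler (EINTR).
   Any other failure, including EAGAIN, is returned to the caller. */
int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int rc;
    do {
        rc = ioctl(fd, request, arg);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Read up to len bytes. Retries on EINTR; on EAGAIN/EWOULDBLOCK
   (non-blocking fd with no data yet) blocks in poll() instead of
   busy-looping, then tries again. Returns bytes read, 0 at EOF, -1 on error. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLIN };
            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;
    }
}
```

Installing handlers with SA_RESTART makes the kernel restart many (but not all) interrupted system calls automatically. Explicit retries like these are still needed for the calls that are never restarted.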

Like Paul and Rob, I got the VM compiled last weekend, but it would
crash either immediately or after maybe 15-20 s. Then I crashed when I
got to the stage of trying to debug a multi-threaded application :-)

I'm admittedly largely ignorant of how Cog changes the VM, and I
apologise if my post re SIGUSR1/2 does prove to be a red herring, but
I'm still wondering about the need for multi-threading for
non-Teleplace use. From your latest email it seems as if
multi-threading has been introduced to support high-priority "media
processing" for Teleplace. If this functionality will remain private
to Teleplace, and there is no clear benefit to others from a
high-priority thread in the core VM, then, given the difficulty of
debugging, could public Cog be single-threaded?
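A heartbeat does not in itself require a second thread. An interval timer can deliver SIGALRM to the single VM thread, and the handler can simply set a flag that the interpreter polls at safe points. This is a minimal sketch of that idea, not a description of how Cog implements its heartbeat. The function names and the 1 ms period in the usage note are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler, cleared by the interpreter when it polls. */
static volatile sig_atomic_t beat_pending = 0;

static void on_alarm(int signo)
{
    (void)signo;
    beat_pending = 1;           /* only async-signal-safe work in the handler */
}

/* Start a periodic beat every period_usecs microseconds (0 < p < 1 s). */
int heartbeat_start(long period_usecs)
{
    struct sigaction sa;
    struct itimerval it;

    assert(period_usecs > 0 && period_usecs < 1000000);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* ask the kernel to restart many interrupted syscalls */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = period_usecs;
    it.it_value = it.it_interval;           /* first beat after one period */
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Disarm the timer; the handler stays installed, so a late beat is harmless. */
int heartbeat_stop(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}

/* Called by the interpreter at a safe point: returns 1 (and clears the
   flag) if at least one beat has arrived since the last check, else 0. */
int heartbeat_check(void)
{
    if (!beat_pending)
        return 0;
    beat_pending = 0;
    return 1;
}
```

heartbeat_start(1000) would request a beat every 1 ms, and the interpreter would call heartbeat_check() at its safe points. Beats that arrive between checks coalesce into one, which is usually acceptable for a heartbeat. SIGALRM sidesteps the SIGUSR1/2 question. The cost is a clash with any other code in the process that uses SIGALRM or alarm().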

-D