[Vm-dev] Re: Cog on Linux
Derek O'Connell
doconnel at gmail.com
Fri Jul 23 11:54:27 UTC 2010
Hi Eliot,
On 22/07/10 22:44, Eliot Miranda wrote:
>>>>
>>> just looks like the OS/run-time is not letting the program set a
>>> handler for SIGUSR2 and/or not allowing it to be caught. This is
>>> a deal breaker. Why it's happening I don't know, but currently
>>> Cog's heartbeat on linux depends on being able to catch SIGUSR2.
>>>
>>
>> From: http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html
>>
>> H.4: With LinuxThreads, I can no longer use the signals SIGUSR1 and
>> SIGUSR2 in my programs! Why?
>>
>> The short answer is: because the Linux kernel you're using does not
>> support realtime signals.
>>
>
> I'd forgotten all that! I thought that stuff was ancient history.
> So we need two things, one is a pair of alternative signals, the
> other is a reliable #define that we can use to distinguish l'ancien
> regime from the modern day.
Something smells fishy about signals specifically reserved for user
apps then being re-reserved for something else. Since the page I
linked to begins with the warning "This FAQ has not been updated for a
while and may not be 100% up to date", I am trying to clarify the
situation. Still at it, but here is what I have dug up so far:
- "LinuxThreads" generally refers to threading on pre-2.6 kernels
- "NPTL", Native POSIX Threads Library for Linux, replaces
"LinuxThreads" on 2.5+ kernels (publicly 2.6+)
- "NGPT", IBM's Next Generation POSIX Threads, for 2.4 kernels and
earlier, works/worked in conjunction with LinuxThreads
- Red Hat back-ported NPTL to pre-2.6 kernels and made the threading
model selectable between NPTL/LinuxThreads on a per-process basis
To determine the threading library that a system uses (example shown for
my system: Ubuntu 9.10, 2.6.31-22-generic #60-Ubuntu SMP Thu May 27
00:22:23 UTC 2010 i686 GNU/Linux):
> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.10.1
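The same check can be made from C at run time. This is only a sketch, assuming glibc, where confstr() accepts the _CS_GNU_LIBPTHREAD_VERSION extension; the helper names pthread_library_version() and using_nptl() are made up for illustration. It is a run-time test rather than the compile-time #define asked for above, and on a libc without the extension it simply reports "unknown".

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copies the threading library version string
   (e.g. "NPTL 2.10.1") into buf. Returns 0 on success, -1 if the value
   is unavailable or does not fit in buf. */
static int pthread_library_version(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns the size needed including the trailing NUL,
       or 0 if the name is not recognised. */
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return (n == 0 || n > len) ? -1 : 0;
#else
    /* Not glibc (or a very old one): no way to ask via confstr(). */
    (void)buf;
    (void)len;
    return -1;
#endif
}

/* Hypothetical helper: 1 if the library reports NPTL, 0 if it reports
   something else (e.g. LinuxThreads), -1 if it cannot be determined. */
static int using_nptl(void)
{
    char buf[128];

    if (pthread_library_version(buf, sizeof buf) != 0)
        return -1;
    return strncmp(buf, "NPTL", 4) == 0;
}
```

A VM could call using_nptl() once at start-up and only fall back to alternative signals when it returns 0.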
So that fishy smell might come from a Red Hat-specific red herring ;-)
Most likely on modern distros with a 2.6+ kernel:
A) NPTL *is* being used
B) SIGUSR1/2 are *not* reserved
C) SIGUSR1/2, if they are indeed the source of any problems, may be
getting used elsewhere (in a plugin perhaps)
More on this below but first I would throw into the mix what in my
limited experience has sometimes been the source of odd problems. This
is the handling of EINTR/EAGAIN errors and, AFAICT, the increased
likelihood of these errors occurring depending on how busy a process
and/or the system in general is. I have seen code written pre-2.6
which worked well even post-2.6 until system load increased and/or it
was used in a multi-threaded program. Some ioctl() calls then fail, but
since the immediate code does not handle EINTR/EAGAIN the result is some
obscure error at a higher level.
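The usual defence against the EINTR half of this is a retry loop around the call. A minimal sketch, assuming the call is safe to repeat; ioctl_retry() is a hypothetical wrapper name, not something from the VM sources. EAGAIN is deliberately not retried here, since on a non-blocking descriptor it normally means "wait (e.g. with select/poll) and try later" and spinning on it would just burn CPU.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: repeats ioctl() for as long as it fails only
   because a signal interrupted it (EINTR). Any other failure, including
   EAGAIN, is returned to the caller with errno left intact. */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int rc;

    do {
        rc = ioctl(fd, request, arg);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

With a periodic heartbeat signal arriving in the process, interrupted system calls become much more likely, so any call site that ignores EINTR is worth a second look.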
Like Paul and Rob I got the VM compiled last weekend but it would either
crash immediately or after maybe 15/20s. Then I crashed when I got to
the stage of trying to debug a multi-threaded application :-)
I'm admittedly largely ignorant of how Cog changes the VM and apologise
if my post re SIGUSR1/2 does prove to be a red herring, but I'm still
wondering about the need for multi-threading for non-Teleplace use. From
your latest email it seems as if multi-threading has been introduced to
support high-priority for Teleplace "media processing". If this is
functionality that will remain private to Teleplace and there is no
clear benefit to others for a core VM high-priority thread, and given
the difficulties debugging, then could public Cog be single threaded?
-D