Hi Eliot,
On 07/22/2010 05:02 PM, vm-dev-request@lists.squeakfoundation.org wrote:
handle SIGUSR2 nostop noprint noignore
If I include the above line in my .gdbinit then gdb complains:
Cannot find user-level thread for LWP XXXXX
where XXXXX is the process number for the VM. Sometimes the VM window stays open and freezes at that point and sometimes it closes. Gdb then states that the "Target is running" when I type in the commands you listed. If I comment the "handle SIGUSR2 ..." line out then I get this from those commands:
(gdb) where
#0 0xf7fdf430 in __kernel_vsyscall ()
#1 0xf7fabb16 in nanosleep () from /lib32/libpthread.so.0
#2 0x0805fa38 in tickerSleepCycle (ignored=0x0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixHeartbeat.c:375
#3 0xf7fa396e in start_thread () from /lib32/libpthread.so.0
#4 0xf7ed6b5e in clone () from /lib32/libc.so.6
(gdb) info registers
eax 0xfffffdfc -516
ecx 0x0 0
edx 0xb7adb388 -1213353080
ebx 0xb7adb388 -1213353080
esp 0xb7adb358 0xb7adb358
ebp 0xb7adb398 0xb7adb398
esi 0xb7adbb70 -1213351056
edi 0x3d0f00 4001536
eip 0xf7fdf430 0xf7fdf430 <__kernel_vsyscall+16>
eflags 0x296 [ PF AF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) x/5i $eip
=> 0xf7fdf430 <__kernel_vsyscall+16>: pop %ebp
0xf7fdf431 <__kernel_vsyscall+17>: pop %edx
0xf7fdf432 <__kernel_vsyscall+18>: pop %ecx
0xf7fdf433 <__kernel_vsyscall+19>: ret
0xf7fdf434: add %ch,(%esi)
(gdb) info threads
* 2 Thread 0xb7adbb70 (LWP 27239) 0xf7fdf430 in __kernel_vsyscall ()
1 Thread 0xf7e056c0 (LWP 27236) heartbeat_handler (sig=14, sig_info=0x63, context=0x0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixHeartbeat.c:461
(gdb) thread 1
[Switching to thread 1 (Thread 0xf7e056c0 (LWP 27236))]#0 heartbeat_handler (sig=14, sig_info=0x63, context=0x0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixHeartbeat.c:461
461 {
(gdb) bt
#0 heartbeat_handler (sig=14, sig_info=0x63, context=0x0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixHeartbeat.c:461
#1 <signal handler called>
#2 0xf7feefe0 in _dl_debug_state () from /lib/ld-linux.so.2
#3 0xf7ff272c in ?? () from /lib/ld-linux.so.2
#4 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#5 0xf7ff2106 in ?? () from /lib/ld-linux.so.2
#6 0xf7fb7c0b in ?? () from /lib32/libdl.so.2
#7 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#8 0xf7fb809c in ?? () from /lib32/libdl.so.2
#9 0xf7fb7b41 in dlopen () from /lib32/libdl.so.2
#10 0xf7c01b27 in ?? () from /usr/lib32/libX11.so.6
#11 0xf7c01fe7 in _XNoticeCreateBitmap () from /usr/lib32/libX11.so.6
#12 0xf7c0220d in XCreatePixmap () from /usr/lib32/libX11.so.6
#13 0xf7c010e2 in XCreateBitmapFromData () from /usr/lib32/libX11.so.6
#14 0xf7db70bb in display_ioSetCursorWithMask (cursorBitsIndex=-1210764836, cursorMaskIndex=<value optimized out>, offsetX=-1, offsetY=-1) at /home/paul/src/squeakvm/platforms/unix/vm-display-X11/sqUnixX11.c:3855
#15 0x08071422 in primitiveBeCursor () at /home/paul/src/squeakvm/src/vm/gcc3x-cointerp.c:23540
#16 0x0807f443 in interpret () at /home/paul/src/squeakvm/src/vm/gcc3x-cointerp.c:4872
#17 0x0807eeec in enterSmalltalkExecutiveImplementation () at /home/paul/src/squeakvm/src/vm/gcc3x-cointerp.c:14771
#18 0x0807f118 in initStackPagesAndInterpret () at /home/paul/src/squeakvm/src/vm/gcc3x-cointerp.c:18367
#19 0x0805eed3 in main (argc=2, argv=0xffffcda4, envp=0xffffcdb0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixMain.c:1627
(gdb) thread 2
[Switching to thread 2 (Thread 0xb7adbb70 (LWP 27239))]#0 0xf7fdf430 in __kernel_vsyscall ()
(gdb) bt
#0 0xf7fdf430 in __kernel_vsyscall ()
#1 0xf7fabb16 in nanosleep () from /lib32/libpthread.so.0
#2 0x0805fa38 in tickerSleepCycle (ignored=0x0) at /home/paul/src/squeakvm/platforms/unix/vm/sqUnixHeartbeat.c:375
#3 0xf7fa396e in start_thread () from /lib32/libpthread.so.0
#4 0xf7ed6b5e in clone () from /lib32/libc.so.6
Hi Paul,
On Thu, Jul 22, 2010 at 2:14 PM, Paul DeBruicker pdebruic@gmail.com wrote:
Hi Eliot,
On 07/22/2010 05:02 PM, vm-dev-request@lists.squeakfoundation.org wrote:
handle SIGUSR2 nostop noprint noignore
If I include the above line in my .gdbinit then gdb complains:
Cannot find user-level thread for LWP XXXXX
where XXXXX is the process number for the VM. Sometimes the VM window stays open and freezes at that point and sometimes it closes. Gdb then states that the "Target is running" when I type in the commands you listed. If I comment the "handle SIGUSR2 ..." line out then I get this from those commands:
It just looks like the OS/run-time is not letting the program set a handler for SIGUSR2 and/or is not allowing it to be caught. This is a deal breaker. Why it's happening I don't know, but Cog's heartbeat on Linux currently depends on being able to catch SIGUSR2.
HTH Eliot
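The two backtraces suggest the heartbeat's general shape: a ticker thread sleeping in nanosleep() inside tickerSleepCycle(), while heartbeat_handler() runs as a signal handler on the VM's main thread. The following is a minimal sketch of that pattern, showing why a handler that cannot be installed or caught is a deal breaker. It is not Cog's sqUnixHeartbeat.c: it assumes the ticker delivers SIGUSR2 to the VM thread with pthread_kill(), and the names heartbeat_start(), heartbeat_stop(), heartbeat_tick_count() and on_heartbeat() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Assumption taken from the thread: the heartbeat relies on catching SIGUSR2.
   Everything here is a hypothetical sketch, not Cog's sqUnixHeartbeat.c. */
#define HEARTBEAT_SIGNAL SIGUSR2

static volatile sig_atomic_t heartbeat_ticks = 0;
static atomic_int ticker_running = 0;
static pthread_t vm_thread;   /* thread that receives the heartbeat */
static pthread_t ticker;      /* thread that generates it */
static long ticker_period_ns; /* must stay below one second (tv_nsec limit) */

/* Signal handler: only async-signal-safe work, so just count the tick. */
static void on_heartbeat(int sig)
{
    (void)sig;
    heartbeat_ticks++;
}

/* Ticker loop: sleep for one period, then signal the VM thread. */
static void *ticker_loop(void *arg)
{
    (void)arg;
    struct timespec period = { 0, ticker_period_ns };
    while (atomic_load(&ticker_running)) {
        nanosleep(&period, NULL);
        pthread_kill(vm_thread, HEARTBEAT_SIGNAL);
    }
    return NULL;
}

/* Install the handler, then start a ticker aimed at the calling thread.
   Returns 0 on success, -1 if either step fails. */
int heartbeat_start(long period_ns)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_heartbeat;
    sigemptyset(&sa.sa_mask);
    if (sigaction(HEARTBEAT_SIGNAL, &sa, NULL) != 0)
        return -1;   /* no handler means no heartbeat: the "deal breaker" */

    vm_thread = pthread_self();
    ticker_period_ns = period_ns;
    atomic_store(&ticker_running, 1);
    if (pthread_create(&ticker, NULL, ticker_loop, NULL) != 0) {
        atomic_store(&ticker_running, 0);
        return -1;
    }
    return 0;
}

/* Ask the ticker to finish and wait for it. */
void heartbeat_stop(void)
{
    atomic_store(&ticker_running, 0);
    pthread_join(ticker, NULL);
}

/* Number of heartbeats the handler has seen so far. */
int heartbeat_tick_count(void)
{
    return (int)heartbeat_ticks;
}
```

A caller would run heartbeat_start(2000000L) on the VM thread, do its work (resuming any nanosleep() that a heartbeat interrupts with EINTR), then call heartbeat_stop(). A tick count of zero means the signal never reached the handler. Build with -pthread on older toolchains.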
On 22/07/10 22:20, Eliot Miranda wrote:
Hi Paul,
It just looks like the OS/run-time is not letting the program set a handler for SIGUSR2 and/or is not allowing it to be caught. This is a deal breaker. Why it's happening I don't know, but Cog's heartbeat on Linux currently depends on being able to catch SIGUSR2.
From: http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html
H.4: With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?
The short answer is: because the Linux kernel you're using does not support realtime signals.
Hi Derek,
On Thu, Jul 22, 2010 at 2:25 PM, Derek O'Connell doconnel@gmail.com wrote:
From: http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html
H.4: With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?
The short answer is: because the Linux kernel you're using does not support realtime signals.
I'd forgotten all that! I thought that stuff was ancient history. So we need two things: one is a pair of alternative signals, the other is a reliable #define that we can use to distinguish l'ancien regime from the modern day.
thanks Derek!
best Eliot
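One way to get a "pair of alternative signals" is to take them from the POSIX realtime range at run time instead of hard-coding SIGUSR1/2. A minimal sketch, assuming glibc: there SIGRTMIN is a run-time value that the threads library adjusts past the realtime signals it uses internally (signal(7) describes this for both NPTL and LinuxThreads), so SIGRTMIN+n offsets stay clear of them. The helper names pick_heartbeat_signals() and signal_is_catchable() and the offsets +1 and +2 are hypothetical choices, not anything Cog defines; the second helper is a quick check of the property the heartbeat depends on, namely that the signal can actually be caught.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: choose two realtime signals to stand in for
   SIGUSR1/SIGUSR2. On glibc, SIGRTMIN is evaluated at run time and already
   skips the realtime signals the threads library reserves, so refer to these
   signals as SIGRTMIN+n rather than by hard-coded numbers.
   Returns 0 on success, -1 if the realtime range is too small. */
int pick_heartbeat_signals(int *primary, int *secondary)
{
    int first = SIGRTMIN + 1;   /* the offsets are arbitrary */
    int second = SIGRTMIN + 2;
    if (second > SIGRTMAX)
        return -1;
    *primary = first;
    *secondary = second;
    return 0;
}

static volatile sig_atomic_t last_caught = 0;

static void note_signal(int sig)
{
    last_caught = sig;
}

/* Install note_signal for sig (left installed afterwards) and raise sig once
   in the calling thread. Returns 1 if the handler ran, 0 otherwise. */
int signal_is_catchable(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return 0;
    last_caught = 0;
    raise(sig);   /* raise() returns only after the handler has returned */
    return last_caught == sig;
}
```

Choosing the signals at run time sidesteps part of the #define question, since SIGRTMIN already reflects whichever threads library the process ended up with.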
Hi Eliot,
On 22/07/10 22:44, Eliot Miranda wrote:
I'd forgotten all that! I thought that stuff was ancient history. So we need two things: one is a pair of alternative signals, the other is a reliable #define that we can use to distinguish l'ancien regime from the modern day.
Something smells fishy about signals specifically reserved for user apps then being re-reserved for something else, and since the page I linked to begins with the warning "This FAQ has not been updated for a while and may not be 100% up to date", I am trying to clarify the situation. I'm still at it, but here is what I have dug up so far:
- "LinuxThreads" generally refers to threading on pre-2.6 kernels
- "NPTL", Native POSIX Threads Library for Linux, replaces "LinuxThreads" on 2.5+ kernels (publicly 2.6+)
- "NGPT", IBM's Next Generation POSIX Threads, for 2.4 kernels and earlier, works/worked in conjunction with LinuxThreads
- Red Hat back-ported NPTL to pre-2.6 kernels and made the threading model selectable between NPTL and LinuxThreads on a per-process basis
To determine the threading library that a system uses (example shown for my system: Ubuntu 9.10, 2.6.31-22-generic #60-Ubuntu SMP Thu May 27 00:22:23 UTC 2010 i686 GNU/Linux):
> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.10.1
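Because Red Hat made the threading model selectable per process, a compile-time #define may not reliably tell the two apart. A run-time alternative, sketched here assuming glibc, is confstr() with _CS_GNU_LIBPTHREAD_VERSION, the C-level counterpart of the getconf query above. pthread_library_version() and using_nptl() are hypothetical helper names; on a C library that does not report a value, the sketch simply answers "not NPTL / unknown".

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

/* Run-time equivalent of `getconf GNU_LIBPTHREAD_VERSION`: copies a string
   such as "NPTL 2.10.1" into buf. Returns 0 on success, -1 if the C library
   reports nothing or buf is too small. */
int pthread_library_version(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (needed == 0 || needed > len)   /* no value, or truncated */
        return -1;
    return 0;
#else
    (void)buf;
    (void)len;
    return -1;                         /* constant not available here */
#endif
}

/* 1 if the running threads library reports itself as NPTL, else 0. */
int using_nptl(void)
{
    char version[64];
    if (pthread_library_version(version, sizeof version) != 0)
        return 0;
    return strncmp(version, "NPTL", 4) == 0;
}
```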
So that fishy smell might come from a Red Hat-specific red herring ;-) Most likely, on modern distros with a 2.6+ kernel:
A) NPTL *is* being used
B) SIGUSR1/2 are *not* reserved
C) SIGUSR1/2, if they are indeed the source of any problems, may be getting used elsewhere (in a plugin perhaps)
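Point C suggests checking whether something else in the process has already claimed SIGUSR2 before the VM installs its own handler. A hedged sketch using only sigaction(): passing a NULL new action queries the current disposition without changing it. signal_already_claimed(), install_if_unclaimed() and vm_heartbeat_handler() are hypothetical names, and the check-then-install sequence assumes it runs during single-threaded start-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t heartbeat_count = 0;

/* Hypothetical VM-side handler used for illustration. */
void vm_heartbeat_handler(int sig)
{
    (void)sig;
    heartbeat_count++;
}

/* Query (without changing) the disposition of sig.
   Returns 1 if some handler is installed, 0 for SIG_DFL/SIG_IGN, -1 on error. */
int signal_already_claimed(int sig)
{
    struct sigaction current;
    if (sigaction(sig, NULL, &current) != 0)
        return -1;
    if (current.sa_flags & SA_SIGINFO)
        return 1;   /* a three-argument sa_sigaction handler is installed */
    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}

/* Install handler only if nobody else owns sig.
   Returns 0 if installed, 1 if already claimed (left untouched), -1 on error.
   Assumes single-threaded start-up, so nothing races between check and install. */
int install_if_unclaimed(int sig, void (*handler)(int))
{
    int claimed = signal_already_claimed(sig);
    if (claimed != 0)
        return claimed;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0 ? 0 : -1;
}
```

If install_if_unclaimed() returns 1 for SIGUSR2, some plugin or library got there first, which would be consistent with point C.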
More on this below, but first I would throw into the mix something that, in my limited experience, has sometimes been the source of odd problems: the handling of EINTR/EAGAIN errors and, AFAICT, the increased likelihood of these errors occurring the busier a process and/or the system in general is. I have seen code written pre-2.6 that worked well even post-2.6 until system load increased and/or it was used in a multi-threaded program. Some ioctl() calls then fail, but since the immediate code does not handle EINTR/EAGAIN, the result is some obscure error at a higher level.
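Failures like this are usually handled at the call site: retry immediately on EINTR (the call was interrupted by a signal handler, such as a heartbeat), and on EAGAIN from a non-blocking descriptor wait until it is ready rather than spinning. A minimal sketch assuming Linux; ioctl_retry() and read_retry() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Retry ioctl() while it is interrupted by a signal handler (EINTR). */
int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int rc;
    do {
        rc = ioctl(fd, request, arg);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Read up to len bytes. EINTR: retry at once. EAGAIN/EWOULDBLOCK on a
   non-blocking fd: block in poll() until readable instead of busy-looping.
   Returns bytes read, 0 at end of file, or -1 with errno set. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLIN };
            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;   /* a real error: let the caller see it */
    }
}
```

Handling the error where it occurs keeps a transient EINTR from turning into the "obscure error at a higher level" described above.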
Like Paul and Rob, I got the VM compiled last weekend, but it would crash either immediately or after maybe 15 to 20 seconds. Then I crashed when I got to the stage of trying to debug a multi-threaded application :-)
I'm admittedly largely ignorant of how Cog changes the VM, and I apologise if my post re SIGUSR1/2 proves to be a red herring, but I'm still wondering about the need for multi-threading for non-Teleplace use. From your latest email it seems as if multi-threading was introduced to support a high-priority thread for Teleplace "media processing". If this functionality will remain private to Teleplace, and there is no clear benefit to others from a high-priority thread in the core VM, then, given the difficulties of debugging, could public Cog be single-threaded?
-D
vm-dev@lists.squeakfoundation.org