[Vm-dev] VM on Solaris (was: Camera sig fault on 64 bits machines.)

Eliot Miranda eliot.miranda at gmail.com
Thu Apr 24 18:28:30 UTC 2014


Hi Andreas,


On Thu, Apr 24, 2014 at 9:58 AM, Andreas Wacknitz <a.wacknitz at gmx.de> wrote:

>
>
> Am 24.04.2014 um 00:14 schrieb Eliot Miranda <eliot.miranda at gmail.com>:
>
> Hi Andreas,
>
>
> On Wed, Apr 23, 2014 at 9:35 AM, Andreas Wacknitz <a.wacknitz at gmx.de> wrote:
>
>>
>> Thanks again Eliot,
>>
>> First, I solved the pthreads problem under OpenSolaris. While Solaris 10
>> doesn’t need special user privileges for thread control (at least within
>> the same thread policy I guess),
>> users under Solaris 11 (and thus OpenSolaris) need the privilege
>> „proc_priocntl“ to be given by an administrator.
>> (For those who are interested: usermod -K defaultpriv=basic,proc_priocntl
>> andreas)
>>
>
> This is a pain :-).  You could either assume that people can always get
> the necessary permission and go with the threaded heartbeat (my preferred
> suggestion) or provide two VMs (always tedious).
>
> Yes, I consider going with the threaded heartbeat for OpenSolaris (I will
> also try to compile everything under Solaris 11.1 but that’s on lower
> priority for me as I am not really using it.).
> I have not yet decided whether the version without increased priority would
> be enough. At the moment everything seems to run fine with this version; I
> can interrupt "[[true] whileTrue] forkAt: Processor userInterruptPriority"
> by ALT-.
>

That implies it is working.  But I would definitely make sure the heartbeat
runs at a higher priority than the main thread.

One thing to check is that delays expire even when the system is fully
busy, e.g.

| run s |
run := true.
s := Semaphore new.
[| i | i := 0. s wait. [run] whileTrue: [i := i + 1]] forkAt: Processor highestPriority - 1.
[(Delay forSeconds: 1) wait. run := false] forkAt: Processor highestPriority.
s signal

should lock up the system for 1 second.  If the heartbeat is not advancing
the clock used to check for delays then the system will remain locked.

> But the whole thing isn’t finished yet, as FFI (NativeBoost) doesn’t seem to
> work (e.g. "UnixEnvironment environ" fails). I don’t know when I will find
> time to deal with that.
>

Well, the code is still useful for the Squeak VM, so please commit if and
when you have the heartbeat working to your satisfaction.


> More below…
>>
>> Am 22.04.2014 um 22:31 schrieb Eliot Miranda <eliot.miranda at gmail.com>:
>>
>>
>>
>>
>> On Tue, Apr 22, 2014 at 1:10 PM, Andreas Wacknitz <a.wacknitz at gmx.de> wrote:
>>
>>>
>>>
>>> Am 22.04.2014 um 21:36 schrieb Eliot Miranda <eliot.miranda at gmail.com>:
>>>
>>> Hi Andreas,
>>>
>>>
>>> On Tue, Apr 22, 2014 at 12:05 PM, Andreas Wacknitz <a.wacknitz at gmx.de> wrote:
>>>
>>>>
>>>> This evening I further dealt with the problems on OpenSolaris
>>>> (openindiana).
>>>> I finally got a pthread version running without superuser rights. But I
>>>> don’t know whether this will really work (ATM it does for me)
>>>> because I removed the call to pthread_setschedparam in beatStateMachine
>>>> leaving the heartbeat thread with the same
>>>> priority as the VM thread.
>>>
>>>
>>> Alas, that will not work :-(.  As soon as the image enters a hard
>>> loop (e.g. [true] whileTrue) the heartbeat thread will be blocked and the
>>> VM will never break out of the loop.
>>>
>>>
>>> How can I check this blockage? I started the VM with --pollpipe 1 and
>>> then [true] whileTrue in a Workspace. The GUI is blocked but the pipe is
>>> still rotating.
>>>
>>
>> Can you interrupt with ctrl-period?  If not, then I don't understand how
>> the pipe is still rotating :-).  If you can, then you're not blocking the
>> system.  Try e.g. [[true] whileTrue]  forkAt: Processor highestPriority.
>>
>> Yes, I can do that in both (with and without higher priority) BUT not
>> when running this with highestPriority (again in BOTH versions!).
>>
>
> Oops.  That's right.  It should be just "[[true] whileTrue]  forkAt:
> Processor userPriority + 1".  Obviously one can't interrupt something
> running higher than userInterrupt priority.  Sorry, I was asleep.
>
> See above. OpenSolaris seems to run fine with the two threads having the
> same priority.
>

Since I've been here before, I know that examples can be constructed in which
this will not work properly.  My Delay example above should be one of them.
All one needs is to arrange that the system is fully busy, shuts out the
heartbeat thread, but depends on the heartbeat thread to make progress (as
in the Delay example; the heartbeat advances a low-resolution (~2ms) clock
that is used to fire delays).
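
A minimal sketch of the arrangement described here, not the VM's actual
heartbeat code: a thread that advances a coarse tick counter roughly every
2ms and first tries to raise its own priority by one step.  The names
heartbeat_ticks, raise_own_priority and run_heartbeat are hypothetical.
Where the change needs a privilege, pthread_setschedparam is expected to
return EPERM, which the sketch reports instead of treating as fatal.  ERANGE
is used here only as a marker that the current scheduling policy has no
higher level to move to.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static atomic_long heartbeat_ticks; /* coarse clock advanced by the heartbeat */
static atomic_int  heartbeat_stop;  /* set to 1 to ask the heartbeat to exit */

/* Try to move the calling thread one priority step up within its current
 * scheduling policy.  Returns 0 on success, ERANGE if the policy has no
 * higher level, otherwise the pthreads error code (e.g. EPERM). */
static int raise_own_priority(void)
{
    int policy;
    struct sched_param sp;
    int err = pthread_getschedparam(pthread_self(), &policy, &sp);

    if (err != 0)
        return err;
    if (sp.sched_priority >= sched_get_priority_max(policy))
        return ERANGE;
    sp.sched_priority += 1;
    return pthread_setschedparam(pthread_self(), policy, &sp);
}

/* Heartbeat thread body: record the priority outcome, then tick. */
static void *heartbeat_main(void *arg)
{
    int *prio_result = arg;
    struct timespec beat = { .tv_sec = 0, .tv_nsec = 2 * 1000 * 1000 };

    *prio_result = raise_own_priority();
    while (!atomic_load(&heartbeat_stop)) {
        nanosleep(&beat, NULL);
        atomic_fetch_add(&heartbeat_ticks, 1);
    }
    return NULL;
}

/* Run the heartbeat for about run_ms milliseconds.  Returns the number of
 * ticks counted, or -1 if the thread could not be created.  *prio_result
 * receives the outcome of the priority change. */
long run_heartbeat(int run_ms, int *prio_result)
{
    pthread_t thread;
    struct timespec run = { .tv_sec = run_ms / 1000,
                            .tv_nsec = (run_ms % 1000) * 1000000L };

    atomic_store(&heartbeat_ticks, 0);
    atomic_store(&heartbeat_stop, 0);
    if (pthread_create(&thread, NULL, heartbeat_main, prio_result) != 0)
        return -1;
    nanosleep(&run, NULL);
    atomic_store(&heartbeat_stop, 1);
    pthread_join(thread, NULL);
    return atomic_load(&heartbeat_ticks);
}
```

Calling run_heartbeat(100, &result) and inspecting result shows whether the
priority change was permitted for the current user.  A result of 0 means the
heartbeat really ran above the creating thread's priority.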


>  You are running the JIT right?
>>
>> How to tell for sure?
>>
>
> vm -version
>
> If it includes a CoInterpreter line you're running the JIT.  e.g.
> McStalker.macbuild$ oscfvm -version
> /Users/eliot/Cog/oscogvm/macbuild/Fast.app/Contents/MacOS/Squeak
> 4.0 4.0.2894 Mac OS X built on Apr 14 2014 17:02:16 Compiler: 4.2.1 (Apple
> Inc. build 5666) (dot 3) [Production VM]
> CoInterpreter VMMaker.oscog-eem.674 uuid:
> eefd603d-9638-4ad8-99c0-4ee12e87d49d Apr 14 2014
> StackToRegisterMappingCogit VMMaker.oscog-eem.674 uuid:
> eefd603d-9638-4ad8-99c0-4ee12e87d49d Apr 14 2014
> VM: r2894 http://www.squeakvm.org/svn/squeak/branches/Cog Date:
> 2014-04-14 15:32:11 -0700
> Plugins: r2545
> http://squeakvm.org/svn/squeak/trunk/platforms/Cross/plugins
>
> merkur pharo-without-higher-priority $ ./pharo --version
> 3.9-7 #1 22. April 2014 20:30:39 CEST gcc 4.7.3 [Production VM]
> NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid:
> acc98e51-2fba-4841-a965-2975997bba66 Apr 22 2014
> NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid:
> acc98e51-2fba-4841-a965-2975997bba66 Apr 22 2014
> https://github.com/pharo-project/pharo-vm.git Commit:
> 9e648898f53aadb692f2dc95f432daedc449d432 Date: 2014-04-09 16:01:20 +0200
> By: Esteban Lorenzano <estebanlm at gmail.com>
> SunOS merkur 5.11 illumos-b6240e8 i86pc i386 i86pc
> plugin path: /home/andreas/bin/pharo-without-higher-priority/ [default:
> /home/andreas/bin/pharo-without-higher-priority/]
>
>
> What does this tell?
>

That you're using the JIT (NBCogit & NBCoInterpreter).


> I started the VM with --trace. The last log is
>> „IRBytecodeGenerator>>from:goto“.
>> The pipe is still rotating but ALT-. does not break the loop (maybe a
>> problem of my Pharo image?; I will try later with a Squeak image).
>>
>>
>>  I tried to replace the pthread_setschedparam call with a similar
>>>> pthread_setschedprio call but
>>>> with no luck (same problem: failed call with "Not owner"). I don’t know
>>>> whether this is a general problem with the pthreads implementation
>>>> on Solaris or just a problem with the gcc version (4.4.4) coming with
>>>> the openindiana distribution I am using. Maybe this works only
>>>> with the compilers and libraries that are delivered by Oracle (Solaris
>>>> 10 ships with gcc 3.4.3; Solaris Studio has its own compilers).
>>>>
>>>
>>> It's to do with pthreads.  Nothing to do with the compiler.  On some
>>> implementations it requires special permission to create threads with
>>> different priorities.  That used to be the case on linux and it appears to
>>> be the case on OpenSolaris.  Hence one is stuck with the itimer heartbeat.
>>>
>>> Is there any implementation actually using sqUnixITimerHeartbeat.c?
>>>
>>
>> Yes, but unhappily.  We use it at Cadence because we have customers on
>> pre 2.6.12 kernels.  We have to e.g. switch off the heartbeat around
>> certain external calls.
>>
>> I am still wondering about where the necessary sleep call will be
>> generated in this case. I will check your latest VM sources. Maybe PharoVM
>> is different here…
>>
>
> Where is there a necessary sleep?
>
> My understanding is that the interrupt handler for the heartbeat is
> waiting for SIGALRM. Typically this is emitted by an expiring usleep or
> nanosleep call. I cannot see one in the code that is active
> when compiling the pharo-vm with the ITIMER_HEARTBEAT flag set. If the
> VM_TICKER flag is set, there is a nanosleep call in the corresponding code.
>

But SIGALRM is also delivered by setitimer, as in

...
# define THE_ITIMER ITIMER_REAL
# define ITIMER_SIGNAL SIGALRM
...
if (setitimer(THE_ITIMER, &pulse, &pulse)) {

in platforms/unix/vm/sqUnixITimerHeartbeat.c.  I'm surprised this doesn't
work on OpenSolaris.  In fact, I can't believe it doesn't work.  Something
odd must be going on.
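
A minimal standalone sketch of that mechanism, assuming only POSIX
setitimer and sigaction; it is not the code in sqUnixITimerHeartbeat.c, and
the names beats, heartbeat_handler and count_heartbeats are hypothetical.
It arms ITIMER_REAL with a repeating interval and busy-waits, so it can be
used to check that SIGALRM keeps arriving while the process never sleeps.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t beats = 0;

/* SIGALRM handler: only counts deliveries. */
static void heartbeat_handler(int sig)
{
    (void)sig;
    beats++;
}

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Arm ITIMER_REAL with a period of period_us microseconds and spin until
 * `wanted` signals have arrived or timeout_ms has passed.  Returns the
 * number of SIGALRM deliveries seen, or -1 on error. */
int count_heartbeats(long period_us, int wanted, int timeout_ms)
{
    struct sigaction sa, old_sa;
    struct itimerval pulse, off;
    struct timespec start;
    int seen;

    if (period_us <= 0)
        return -1;              /* a zero interval would disarm the timer */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = heartbeat_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* no SA_RESETHAND: handler stays installed */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    pulse.it_interval.tv_sec  = period_us / 1000000L;
    pulse.it_interval.tv_usec = period_us % 1000000L;
    pulse.it_value = pulse.it_interval;
    beats = 0;
    if (setitimer(ITIMER_REAL, &pulse, NULL) != 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    /* Busy loop, standing in for a VM that never yields. */
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (beats < wanted && elapsed_ms(&start) < timeout_ms)
        ;

    /* Disarm the timer before restoring the previous handler. */
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
    seen = beats;
    sigaction(SIGALRM, &old_sa, NULL);
    return seen;
}
```

If count_heartbeats(2000, 5, 2000) comes back as 1 rather than 5 or more,
that would point at the handler being delivered only once.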


> The fact that my ITIMER_HEARTBEAT version runs when external
> SIGALRMs are triggered confirms my view that a source for this signal is
> missing.
>

Well, the source is there (the call to setitimer) but for some reason
something is going wrong.  I suspect the signal handler fires only once.
SysV unixes need a signal handler to be rearmed with a call to signal or
sigaction in the handler.  This is daft, so by default sigaction avoids
having to rearm (saving a costly system call on every signal).  The old
behaviour can be reinstated using SA_RESETHAND.  Perhaps on OpenSolaris one
has to explicitly use a flag that is the converse of SA_RESETHAND that says
"don't reset the handler after delivery".

Here's the relevant excerpt from the Mac OS X manual page for sigaction:
"SA_RESETHAND    If this bit is set, the handler is reset back to SIG_DFL
at the moment the signal is delivered."
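
One way to test this on a given platform, as a hedged sketch using only
POSIX sigaction and raise: install a handler, deliver the signal once, then
ask sigaction what the disposition is now.  The names on_signal and
handler_survives_delivery are hypothetical, and SIGUSR1 is just a
convenient signal for the experiment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

/* Handler that only counts how often it ran. */
static void on_signal(int sig)
{
    (void)sig;
    deliveries++;
}

/* Install on_signal for signo with the given sa_flags, deliver the signal
 * once, and query the disposition afterwards.  Returns 1 if on_signal is
 * still installed, 0 if the disposition changed (e.g. reset to SIG_DFL),
 * and -1 on error. */
int handler_survives_delivery(int signo, int flags)
{
    struct sigaction sa, cur;
    int survived;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    if (raise(signo) != 0)      /* handler runs before raise returns */
        return -1;

    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    survived = (cur.sa_handler == on_signal);

    signal(signo, SIG_DFL);     /* leave no handler behind */
    return survived;
}
```

handler_survives_delivery(SIGUSR1, 0) should report 1 where sigaction
behaves as described above, and passing SA_RESETHAND should report 0.  A 0
for the first call would mean the handler needs rearming on that system.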

HTH,
-- 
best,
Eliot