[Vm-dev] Suspend

John M McIntosh johnmci at smalltalkconsulting.com
Wed Jul 18 18:54:51 UTC 2007


On Jul 18, 2007, at 10:01 AM, Ian Piumarta wrote:

> This started many years ago when something surreptitiously changed  
> in the image and/or Interpreter to cause relinquishProcessor to be  
> called with very small arguments (around the millisecond mark).   
> This is undoubtedly essential for good performance on some  
> platform, somewhere, but on Unix it is a disaster; there is no  
> portable way to sleep for such little time while also responding to  
> changes on descriptors in a timely (read: immediate) fashion.   
> Depending on the make and model of your kernel, a sub-timeslice  
> timeout in select() is either rounded up (maybe implicitly by the  
> process scheduler) to an unpredictable (but almost always large)  
> fraction of a timeslice, or it is quantized down to zero.  The first  
> causes the famous Delay inaccuracies, the second causes the famous  
> 100% CPU usage.  That's the reason for the byzantine checks and  
> adjustments of the timeout argument that someone commented on a few  
> weeks ago.
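
(To make that concrete, here is a little stand-alone test -- mine, nothing
from the VM sources -- that asks select() for a 1 ms nap with no descriptors
and prints what it actually got.  On a kernel that rounds up you'll see a
large fraction of a timeslice come back; on one that quantizes down you'll
see near-zero and the loop just spins.)

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

int main(void)
{
  int i;
  for (i= 0;  i < 10;  ++i)
    {
      struct timeval before, after, timeout= { 0, 1000 };	/* ask for 1 ms */
      gettimeofday(&before, 0);
      select(0, 0, 0, 0, &timeout);		/* no descriptors: pure sleep */
      gettimeofday(&after, 0);
      printf("asked for 1000 us, got %ld us\n",
	     (long)((after.tv_sec - before.tv_sec) * 1000000L
		    + (after.tv_usec - before.tv_usec)));
    }
  return 0;
}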

OK, well, let me give a bit of history here. The call as coded by John  
Maloney in '97 says:

idleProcess
	"A default background process which is invisible."

	[true] whileTrue:
		[self relinquishProcessorForMicroseconds: 1000]

It had occurred to me that we should be able to sleep up to the next  
wakeup tick if you ignored the issue of incoming interrupts. Since  
incoming interrupts terminate the sleep anyway, that is not actually  
an issue.

In the late 90's I changed the logic here to go to the Delay class  
and calculate where the next wakeup tick was, to provide a different  
value than 1000.  This was pushed out into the update stream and  
lasted about an hour, until Scott Wallace found out the hard way, by  
toasting his day's work, that on restarting an image (even if  
everything else was correct) you would enter a deadly embrace in the  
Delay logic, which made the idleProcess unrunnable; the process  
scheduler then quits because no process is runnable.

In re-evaluating this I pushed the logic into the VM, so in the Mac  
VM I have:

     setInterruptCheckCounter(0);
     now = (ioMSecs() & MillisecondClockMask);
     if (getNextWakeupTick() <= now) {
         if (getNextWakeupTick() == 0)
             /* could be higher I guess; likely it's never zero tho --
                actually I doubt getNextWakeupTick() is ever zero */
             realTimeToWait = 16;
         else
             return 0;
     }
     else
         realTimeToWait = getNextWakeupTick() - now;

     aioSleep(realTimeToWait*1000);


At some point in the past the Unix code read:
sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int nwt= getNextWakeupTick();
   int ms=  0;

   if (nwt)
     {
       int now= (ioMSecs() & 0x1fffffff);
       ms= ((nwt <= now) ? (1000/60) : nwt - now);
     }

   if (ms < (1000/60))		/* < 1 timeslice? */
     {
#    if defined(__MACH__)	/* can sleep with 1ms resolution */
       if (!aioPoll(0))
	{
	  struct timespec rqtp= { 0, ms * 1000*1000 };
	  struct timespec rmtp;
	  while ((nanosleep(&rqtp, &rmtp) < 0) && (errno == EINTR))
	    rqtp= rmtp;
	}
#    endif
       ms= 0;			/* poll but don't block */
     }
   dpy->ioRelinquishProcessorForMicroseconds(ms*1000);
   setInterruptCheckCounter(0);
   return 0;
}


But currently it reads as shown below, which takes the bogus 1000  
microsecond value, thus waking up more often and not really sleeping  
much.

I'm not sure why you can't again do the getNextWakeupTick()  
calculation, or was there some other problem being hidden here?
Perhaps the Unix system wouldn't properly service sleep times < 100  
ms? Could it be a startup parameter to turn the logic on or off?
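
Something along these lines is what I'm imagining -- just a sketch, and
the useNextWakeupTickSleep flag is made up -- reusing only pieces that are
already there (getNextWakeupTick(), ioMSecs(), setInterruptCheckCounter(),
the display hook):

/* Hypothetical sketch only: sleep up to the next wakeup tick, gated by a
   made-up startup flag so the current behaviour could be kept on any
   platform where longer sleeps cause trouble. */
sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
  if (useNextWakeupTickSleep)		/* hypothetical startup option */
    {
      int nwt= getNextWakeupTick();
      int now= (ioMSecs() & 0x1fffffff);
      int ms;
      if (nwt == 0)
	ms= 16;			/* no pending Delay: same 16 ms as the Mac VM */
      else if (nwt <= now)
	ms= 0;			/* already due: poll but don't block */
      else
	ms= nwt - now;		/* sleep up to the next wakeup tick */
      us= ms * 1000;
    }
  dpy->ioRelinquishProcessorForMicroseconds(us);
  setInterruptCheckCounter(0);
  return 0;
}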

Current Unix Code:

sqInt ioRelinquishProcessorForMicroseconds(sqInt us)
{
   int now;
   dpy->ioRelinquishProcessorForMicroseconds(us);
   now= ioLowResMSecs();
   if (now - lastInterruptCheck > (1000/25))	/* avoid thrashing intr checks from 1ms loop in idle proc */
     {
       setInterruptCheckCounter(-1000);		/* ensure timely poll for semaphore activity */
       lastInterruptCheck= now;
     }
   return 0;
}

The X11 display code has:

static sqInt display_ioRelinquishProcessorForMicroseconds(sqInt microSeconds)
{
   aioSleep(handleEvents() ? 0 : microSeconds);
   return 0;
}


/* sleep for microSeconds or until i/o becomes possible, avoiding
    sleeping in select() if the timeout is too small */

int aioSleep(int microSeconds)
{
#if defined(HAVE_NANOSLEEP)
   if (microSeconds < (1000000/60))	/* < 1 timeslice? */
     {
       if (!aioPoll(0))
	{
	  struct timespec rqtp= { 0, microSeconds * 1000 };
	  struct timespec rmtp;
	  nanosleep(&rqtp, &rmtp);
	  microSeconds= 0;			/* poll but don't block */
	}
     }
#endif
   return aioPoll(microSeconds);
}
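
One small thing I notice: the old __MACH__ branch retried nanosleep()  
when a signal cut it short, passing back the remaining time, while the  
aioSleep() above doesn't.  If that ever matters, the retry pattern is  
just this (sketch only; the helper name is mine):

#include <errno.h>
#include <time.h>

/* Sketch: an interruption-safe sub-timeslice sleep, retrying with the
   time nanosleep() reports as remaining whenever a signal interrupts it. */
static void sleepMicroseconds(long microSeconds)
{
  struct timespec rqtp= { 0, microSeconds * 1000 };
  struct timespec rmtp;
  while ((nanosleep(&rqtp, &rmtp) < 0) && (errno == EINTR))
    rqtp= rmtp;
}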





>
>> What would it take to change the VM and Squeak to make it truly  
>> event driven?
>
> First try the notimer thing.  If that doesn't work, try multiplying  
> the argument to ioRelinquishProcessor by 100.  If that doesn't  
> work, we have to resort to science and engineering: profile the VM  
> and find out empirically where it spends its time while sitting  
> idle at 2% for an hour or two.
>
> HTH,
> Ian
>
>
>

--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



