[Vm-dev] Suspend

Wed Jul 18 17:01:29 UTC 2007

Mike,

There are two things to investigate.  The first is running with
'-notimer' to see if it's the millisecond clock interrupts that are
keeping you at a few %.

The second may be much more complicated than the following suggests  
but here are some vague clues for you as I understand them.

> IIRC Squeak still does some polling (event tickler?).

While it's 'polling' the support code is happy to go to sleep (in  
select()) for as long as the image tells it to (or until something  
happens on a file descriptor: network, display, etc.).  The CPU  
should be pinned at 0.0% when idle (with no loss of UI or network  
reactivity) but not, I suspect, with the image/Interpreter behaving  
the way it does.
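
As a rough illustration of what "going to sleep in select()" means,
here is a minimal sketch.  It is not the VM's actual support code; the
helper names sleep_until_readable and demo_pipe_wakeup are invented
for this example.  The call blocks until either the timeout expires or
the descriptor becomes readable, whichever happens first.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not from the Squeak VM): sleep for up to
 * `microseconds`, waking early if `fd` becomes readable.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
static int sleep_until_readable(int fd, long microseconds)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec  = microseconds / 1000000;
    tv.tv_usec = microseconds % 1000000;

    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0) return -1;
    return n > 0 ? 1 : 0;
}

/* Demonstration: an empty pipe times out, while a pipe holding data
 * wakes the sleeper without waiting for the timeout.
 * Returns 0 if both behaved that way, -1 otherwise. */
static int demo_pipe_wakeup(void)
{
    int p[2];
    if (pipe(p) != 0) return -1;

    int idle = sleep_until_readable(p[0], 10000);    /* nothing to read */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]); close(p[1]);
        return -1;
    }
    int busy = sleep_until_readable(p[0], 1000000);  /* data pending */

    close(p[0]); close(p[1]);
    return (idle == 0 && busy == 1) ? 0 : -1;
}
```

With a long timeout and nothing happening on the descriptor, a loop
built on a call like this costs essentially no CPU.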

> This has also caused quite some pain for people running Squeak on a  
> server:
> Squeak will always stay in the working set using a few percent CPU  
> (=power!) constantly, even if it effectively has been idle for a  
> long time.

This started many years ago when something surreptitiously changed in  
the image and/or Interpreter to cause relinquishProcessor to be  
called with very small arguments (around the millisecond mark).  This  
is undoubtedly essential for good performance on some platform,  
somewhere, but on Unix it is a disaster; there is no portable way to  
sleep for so short a time while also responding to changes on
descriptors in a timely (read: immediate) fashion.  Depending on the  
make and model of your kernel, a sub-timeslice timeout in select() is  
either rounded up (maybe implicitly by the process scheduler) to an  
unpredictable (but almost always large) fraction of a timeslice, or it
is quantized down to zero.  The first causes the famous Delay  
inaccuracies, the second causes the famous 100% CPU usage.  That's  
the reason for the byzantine checks and adjustments of the timeout  
argument that someone commented on a few weeks ago.
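
One way to see which behaviour a given kernel exhibits is to measure
it.  The sketch below is an assumption-laden illustration, not code
from the VM: measure_select_sleep_us is a made-up helper that asks
select() for a timed sleep with no descriptors and reports how long
the call really took.  It uses gettimeofday(), which is wall-clock
time and only good enough for a rough reading.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: request a sleep of `requested_us` microseconds
 * from select() and return the elapsed wall-clock microseconds. */
static long measure_select_sleep_us(long requested_us)
{
    struct timeval before, after, tv;

    tv.tv_sec  = requested_us / 1000000;
    tv.tv_usec = requested_us % 1000000;

    gettimeofday(&before, NULL);
    select(0, NULL, NULL, NULL, &tv);   /* no descriptors: a pure timed sleep */
    gettimeofday(&after, NULL);

    return (after.tv_sec - before.tv_sec) * 1000000L
         + (after.tv_usec - before.tv_usec);
}
```

Calling it with 1000 (one millisecond) and comparing the result with
the request shows whether the timeout was stretched towards a
timeslice or collapsed towards zero on the machine at hand.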

> What would it take to change the VM and Squeak to make it truly  
> event driven?

First try the notimer thing.  If that doesn't work, try multiplying  
the argument to ioRelinquishProcessor by 100.  If that doesn't work,  
we have to resort to science and engineering: profile the VM and find  
out empirically where it spends its time while sitting idle at 2% for  
an hour or two.
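
To get a feel for why a 100x longer timeout might matter, the
following sketch counts how often an idle loop wakes up for a given
timeout.  It is a stand-alone experiment, not VM code, and
count_idle_wakeups is a name invented here; the real idle loop does
much more than this.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical experiment: sleep repeatedly in select() with
 * `timeout_us` until `duration_us` of wall-clock time has passed,
 * and return how many times the loop woke up. */
static long count_idle_wakeups(long timeout_us, long duration_us)
{
    struct timeval start, now, tv;
    long wakeups = 0;

    gettimeofday(&start, NULL);
    for (;;) {
        /* Reset the timeout each time: select() may modify it. */
        tv.tv_sec  = timeout_us / 1000000;
        tv.tv_usec = timeout_us % 1000000;
        select(0, NULL, NULL, NULL, &tv);
        wakeups++;

        gettimeofday(&now, NULL);
        long elapsed = (now.tv_sec - start.tv_sec) * 1000000L
                     + (now.tv_usec - start.tv_usec);
        if (elapsed >= duration_us) break;
    }
    return wakeups;
}
```

Comparing count_idle_wakeups(1000, 200000) with
count_idle_wakeups(100000, 200000) gives a rough idea of how many
trips through the idle path the shorter timeout costs.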

HTH,
Ian