[Vm-dev] Suspend
Ian Piumarta
piumarta at speakeasy.net
Wed Jul 18 17:01:29 UTC 2007
Mike,
There are two things to investigate. The first is running with
'-notimer' to see if it's the millisecond clock interrupts that are
keeping you at a few %.
The second may be much more complicated than the following suggests
but here are some vague clues for you as I understand them.
> IIRC Squeak still does some polling (event tickler?).
While it's 'polling' the support code is happy to go to sleep (in
select()) for as long as the image tells it to (or until something
happens on a file descriptor: network, display, etc.). The CPU
should be pinned at 0.0% when idle (with no loss of UI or network
reactivity) but not, I suspect, with the image/Interpreter behaving
the way it does.
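The sleeping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the VM's actual support code: the function name sleep_until_readable and its microsecond argument are assumptions made for the example. It blocks in select() until either the descriptor becomes readable or the timeout expires, so an idle process burns no CPU while it waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not taken from the VM sources).
 * Sleep until fd becomes readable or timeout_us microseconds elapse.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int sleep_until_readable(int fd, long timeout_us)
{
    fd_set readable;
    struct timeval tv;
    int n;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    /* select() takes its timeout as seconds plus microseconds. */
    tv.tv_sec  = timeout_us / 1000000;
    tv.tv_usec = timeout_us % 1000000;

    n = select(fd + 1, &readable, NULL, NULL, &tv);
    if (n < 0)
        return -1;                       /* error, e.g. interrupted by a signal */
    return (n > 0 && FD_ISSET(fd, &readable)) ? 1 : 0;
}
```

With a long timeout the call returns as soon as data arrives on the descriptor, which is why a generous sleep need not cost any UI or network reactivity.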
> This has also caused quite some pain for people running Squeak on a
> server:
> Squeak will always stay in the working set using a few percent CPU
> (=power!) constantly, even if it effectively has been idle for a
> long time.
This started many years ago when something surreptitiously changed in
the image and/or Interpreter to cause relinquishProcessor to be
called with very small arguments (around the millisecond mark). This
is undoubtedly essential for good performance on some platform,
somewhere, but on Unix it is a disaster; there is no portable way to
sleep for such little time while also responding to changes on
descriptors in a timely (read: immediate) fashion. Depending on the
make and model of your kernel, a sub-timeslice timeout in select() is
either rounded up (maybe implicitly by the process scheduler) to an
unpredictable (but almost always large) fraction of a timeslice, or it
is quantized down to zero. The first causes the famous Delay
inaccuracies, the second causes the famous 100% CPU usage. That's
the reason for the byzantine checks and adjustments of the timeout
argument that someone commented on a few weeks ago.
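One way to see which of the two behaviours a given kernel exhibits is to time a short select() call. The sketch below is only an illustration under assumptions: measure_select_sleep_us is a made-up name, and the result will vary with kernel, scheduler and load. It requests a sleep of the given length and reports how long the call really took, using the monotonic clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: ask select() to sleep for requested_us
 * microseconds with no descriptors, and return the number of
 * microseconds that actually elapsed. */
long measure_select_sleep_us(long requested_us)
{
    struct timespec before, after;
    struct timeval tv;

    tv.tv_sec  = requested_us / 1000000;
    tv.tv_usec = requested_us % 1000000;

    clock_gettime(CLOCK_MONOTONIC, &before);
    select(0, NULL, NULL, NULL, &tv);    /* pure timed sleep */
    clock_gettime(CLOCK_MONOTONIC, &after);

    return (after.tv_sec - before.tv_sec) * 1000000L
         + (after.tv_nsec - before.tv_nsec) / 1000L;
}
```

Calling it with 1000 (one millisecond) and comparing the result with the request shows the rounding directly: a result many times larger than the request matches the rounded-up case, and a result near zero matches the quantized-to-zero case.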
> What would it take to change the VM and Squeak to make it truly
> event driven?
First try the notimer thing. If that doesn't work, try multiplying
the argument to ioRelinquishProcessor by 100. If that doesn't work,
we have to resort to science and engineering: profile the VM and find
out empirically where it spends its time while sitting idle at 2% for
an hour or two.
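The effect of scaling the sleep argument can be estimated outside the VM with a small stand-in for an idle loop. This is a rough sketch under assumptions, not a model of the Interpreter: count_idle_wakeups is a hypothetical name, and the loop does nothing but sleep in select() repeatedly. It counts how many times the process wakes up during a fixed wall-clock interval for a given timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for an idle loop: sleep in select() for
 * timeout_us microseconds at a time until wall_ms milliseconds have
 * passed, and return how many times the process woke up. */
long count_idle_wakeups(long timeout_us, long wall_ms)
{
    struct timespec start, now;
    long wakeups = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        struct timeval tv;
        long elapsed_ms;

        tv.tv_sec  = timeout_us / 1000000;
        tv.tv_usec = timeout_us % 1000000;
        select(0, NULL, NULL, NULL, &tv);
        wakeups++;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L
                   + (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= wall_ms)
            break;
    }
    return wakeups;
}
```

Comparing count_idle_wakeups(1000, 200) with count_idle_wakeups(100000, 200), i.e. the same interval with the timeout multiplied by 100, gives a feel for how many fewer trips through the scheduler the larger argument causes.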
HTH,
Ian