[Vm-dev] Use less CPU (improve battery life or reduce cost in the cloud)
holger at freyther.de
Tue Aug 29 21:29:04 UTC 2017
I did some early prototyping for the Unix VM at the end of 2015(?), and I have now repeated and improved it for macOS and the thread-based heartbeat (now that it is the universal default). I won't make it to ESUG this year, but this might be something to play with?
The motivation is simple: polling increases CPU usage, which reduces your battery life, takes resources away from other processes (e.g. more Pharo images), or these days increases your cloud computing bill. On top of that, it might increase network latency (the time from a socket becoming readable to the time the semaphore is signaled).
To complete the work, changes are needed inside both the image and the VM. Some of them are on the way and others might need more discussion.
The idle process:
"A default background process which is invisible."
[self relinquishProcessorForMicroseconds: 1000]
Let's please yield the CPU for more than 1ms. Unless I am missing something, an expired Delay or network I/O would wake us up earlier anyway?
The delay scheduler:
The VM supports sleeping indefinitely when the next wake-up time is set to 0. There is a pending patch to sleep "0" in our Delay scheduler. Currently we force a wake-up earlier than that. I think we should trust the VM to wake us up even if that is a second away.
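To make the "0 means sleep indefinitely" convention concrete, here is a minimal C sketch of how a sleep routine could turn the next wake-up time into a select() timeout. The name sleep_timeout and its parameters are made up for illustration and are not taken from the VM sources; the only assumption is that a NULL timeout makes select() block until a descriptor becomes ready or a signal arrives.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper, not part of the actual VM.
 * next_wakeup_usecs == 0 means "no Delay pending": return NULL so that
 * select() blocks until I/O or a signal arrives.
 * Otherwise fill *tv with the time left until the wake-up
 * (zero if the wake-up is already due) and return tv. */
static struct timeval *
sleep_timeout(uint64_t now_usecs, uint64_t next_wakeup_usecs,
              struct timeval *tv)
{
    if (next_wakeup_usecs == 0)
        return NULL;

    uint64_t remaining = next_wakeup_usecs > now_usecs
                             ? next_wakeup_usecs - now_usecs
                             : 0;
    tv->tv_sec = (time_t)(remaining / 1000000);
    tv->tv_usec = (suseconds_t)(remaining % 1000000);
    return tv;
}
```

The result would be passed as the last argument of select(); with NULL the process uses no CPU until something actually happens.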
I don't understand the WorldState>>#interCyclePause but then I never looked at Morphic. Do we really need to poll like that? Under which circumstances does the world update? We get an event (where we have the event semaphore), we get some I/O (where we have a semaphore) or we have a timeout (where we sleep on a semaphore). Did anyone ever look at removing the tick?
Currently we receive a SIGIO, but from what I can see (and I still need to write a benchmark) the processing might be delayed by 20ms? My hack removes the usage of nextPollUsecs and instead checks a variable that is set by the SIGIO handler. Apart from the missing memory barriers, this should work(tm).
The biggest issue seems to be that on macOS/iOS the input is driven by polling. E.g. some wheel events seem to require pumping the event queue. Is this something we could trigger from the image in the future? I had hoped to get an fd for a Mach port that we could get SIGIO for, but that doesn't seem to exist. I have hacked out the honoring of the relinquish delay, added the polling into an iOS-specific routine, and thanks to the Morphic Delay we bump the event loop frequently enough.
VM heartbeat thread:
The heartbeat thread keeps ticking even when the VM isn't running, e.g. when it sleeps and waits for an event. There is a cost in deciding when to halt the thread, so there must be a cut-off for which delays we bother to disable the heartbeat thread. I think the current code already allows the heartbeat to drift, so the new code might just make it a bit worse.
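The cut-off decision could look roughly like the following C sketch. The threshold value and the names are placeholders picked for illustration, not measured numbers or identifiers from the VM; the sketch also assumes that a sleep duration of 0 stands for "sleep indefinitely".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cut-off: below this, stopping and restarting the heartbeat
 * thread is taken to cost more than letting it tick. Placeholder value. */
#define HEARTBEAT_PAUSE_CUTOFF_USECS 20000u

/* Hypothetical helper: decide whether to halt the heartbeat thread
 * before the VM goes to sleep for sleep_usecs microseconds.
 * sleep_usecs == 0 is taken to mean "sleep indefinitely". */
static bool should_pause_heartbeat(uint64_t sleep_usecs)
{
    return sleep_usecs == 0
        || sleep_usecs >= HEARTBEAT_PAUSE_CUTOFF_USECS;
}
```

A real cut-off would have to be tuned against the cost of pausing and resuming the thread on each platform.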
Where are we now?
I have pushed my changes to https://github.com/zecke/opensmalltalk-vm/tree/mac-use-less-cpu and would be happy to have people look at them, look at the memory synchronization, and maybe run the VM to see if they notice extra delays or such?
I started the same image with the plain VM and my hacked one and let each run for about 20min. The output below is from top.
COMMAND  %CPU  TIME
Pharo    4.3   00:48.49
Pharo    0.8   00:10.20
Looking for comments and feedback.