[squeak-dev] Low-space considerations ... :-)
marcel.taeumel at hpi.de
Wed Jul 13 08:30:55 UTC 2022
Hi Eliot --
> 'nuff said?
:-) This will help me better understand the issue. Thanks. The fact is that
the current VM will easily crash when stressed with an endless recursion.
Let me sum up the issues I discovered so far:
1. The "OutOfMemory signal" is incompatible with #lowSpaceWatcher because
its #defaultAction fails to set special object 23.
2. #lowSpaceWatcher cannot deal with cases where special object 23 was
not set. Using "Processor preemptedProcess" might help.
3. #lowSpaceWatcher uses the path for user-interrupt debuggers, which
has a lower bound of process priority 40. So, memory issues in background
processes do not get caught/interrupted. I will fix this now.
4. An endless recursion can crash the VM. An upper memory limit via VM
parameter 67 will let this happen sooner. The #lowSpaceWatcher has no
chance to suspend the erroneous Squeak process in time.
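To make issue 2 concrete, here is a minimal sketch of the fallback I have in mind for the image-level watcher, assuming "Processor preemptedProcess" is an acceptable substitute when the VM left special object 23 unset (the exact placement inside #lowSpaceWatcher is hypothetical, not the final fix):

```smalltalk
"Sketch: in the low-space handler, fall back to the preempted process
 when the VM did not set special object 23."
| suspect |
suspect := Smalltalk specialObjectsArray at: 23.
suspect ifNil: [suspect := Processor preemptedProcess]. "fallback, per issue 2"
Smalltalk specialObjectsArray at: 23 put: nil. "clear for the next signal"
suspect ifNotNil: [suspect suspend] "park the allocating process for debugging"
```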
In VMMaker, I found two places where the LowSpaceSemaphore is signaled:
StackInterpreter >> #checkForEventsMayContextSwitch:
NewObjectMemory >> #allocateChunkAfterGC:
I have no clue why StackInterpreter >> #activeProcess does not seem to
work properly. The interrupted process cannot be accessed via special
object 23 in the #lowSpaceWatcher.
There might be an extra limit missing to avoid a VM crash during endless
recursion. We cannot enforce the upper limit from parameter 67 if we still
need some more frames to let the #lowSpaceWatcher do its work.
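One way to leave the watcher that headroom, sketched under the assumption that #primSignalAtBytesLeft: (primitive 125, discussed below) still takes effect: keep the signal threshold well below the hard cap from parameter 67. The concrete byte values here are illustrative only:

```smalltalk
"Sketch: signal low space well before the hard old-space cap is hit,
 so #lowSpaceWatcher still has room to run."
| capBytes headroomBytes |
capBytes := 512 * 1024 * 1024.     "hard old-space limit: 512 MiB (example)"
headroomBytes := 16 * 1024 * 1024. "reserve 16 MiB for the watcher (example)"
Smalltalk vmParameterAt: 67 put: capBytes.
Smalltalk primSignalAtBytesLeft: headroomBytes
```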
Am 12.07.2022 18:58:19 schrieb Eliot Miranda <eliot.miranda at gmail.com>:
On Mon, Jul 11, 2022 at 11:15 PM Marcel Taeumel <marcel.taeumel at hpi.de [mailto:marcel.taeumel at hpi.de]> wrote:
Aha. Via #handleFailingFailingBasicNew:, OutOfMemory >> #defaultAction signals the LowSpaceSemaphore. And at that point, special object 23 was not yet set. Hmm....
the VM sets Smalltalk specialObjects at: 23 and signals the low space semaphore when it detects that heap space is low. This is complicated, but essentially Smalltalk will have asked for some allocation to be done, and typically only a large allocation will trip the low space limit, but a cumulative series of small allocations *that are held onto and not reclaimable via a scavenge* can also trip the level. A flag in the vm (needGCFlag) will get set, and the VM's "you need to respond to events" flag gets set by the allocation that failed and/or by the allocation that tripped the new space max size threshold. Then, in checking for events, the VM runs the garbage collector (the scavenger). At this point it may see that available space is low and then there is a fork in the road.
If there is no old space size limit in effect, the VM will try and grow old space by adding a new segment. This is a path that leads quickly to the VM growing memory to a level where the system swaps itself into unresponsiveness.
If there is a size limit in effect then the VM won't be able to grow old space, will find that the total amount of free space is below the low space threshold, and will signal the low space semaphore and set Smalltalk specialObjects at: 23 to the process that is active.
Sorry this is so complicated; it "was always like this".
Executive summary: test the system with an old space size limit in place (Smalltalk vmParameterAt: 67; look for "#67 0 the maximum allowed size of old space (if zero there is no limit)" in the About Squeak->VM Parameters tab in the SystemReporter). Then, if that doesn't work, come back and complain. If it does work, then we have some documenting to do to explain how this fragile and complex mechanism can work "if it's Thursday, in the winter months, and there are clouds in the west".
Am 12.07.2022 08:13:15 schrieb Marcel Taeumel <marcel.taeumel at hpi.de [mailto:marcel.taeumel at hpi.de]>:
Hi Eliot --
> Our #lowSpaceWatcher seems to be broken because "Smalltalk specialObjectsArray at: 23" is always "nil".
>> [...] The vm sets it to the active process when it signals the low space semaphore.
And that's not working at the moment. Even after the LowSpaceSemaphore got signaled:
Smalltalk vmParameterAt: 67 put: 1 * 1024 * 1024 * 1024.
Array new: (0.125 * 1 * 1024 * 1024 * 1024) rounded.
One has to press CMD+Dot to then see that the #lowSpaceWatcher complains that it cannot retrieve the preempted process via special object 23.
> Has #primSignalAtBytesLeft: (primitive 125) any effect these days?
Hmm... I can only find places that set it to 0. :-) ... Wait ... Ah! Nevermind. xD
>> [...] So infinite recursion (as it always did) causes heap growth, and the issue of sluggishness due to paging comes to the fore.
Ah! That was my hypothesis. Yet, since the #lowSpaceWatcher seems to be broken, I wondered what was going on.
Am 12.07.2022 08:02:28 schrieb Eliot Miranda <eliot.miranda at gmail.com [mailto:eliot.miranda at gmail.com]>:
On Jul 11, 2022, at 6:15 AM, Marcel Taeumel <marcel.taeumel at hpi.de [mailto:marcel.taeumel at hpi.de]> wrote:
Hi all --
What's the current state of our low-space guards in Squeak? That OutOfMemory error only works with larger allocations, right?
Our #lowSpaceWatcher seems to be broken because "Smalltalk specialObjectsArray at: 23" is always "nil". Should we replace it with "Processor preemptedProcess"?
No, it *should* be nil. The vm sets it to the active process when it signals the low space semaphore. Hence "Smalltalk specialObjectsArray at: 23" reliably informs the image level low space code which process performed the allocation that tripped the low space trigger.
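For reference, the image-level protocol Eliot describes boils down to a fetch-and-clear of that slot; a minimal sketch (assuming the VM behaves as described, with index 23 as given in this thread):

```smalltalk
"After the low-space semaphore is signaled, the image-level handler
 fetches the process that tripped the trigger, then clears the slot
 so the next signal starts from nil again."
| offender |
offender := Smalltalk specialObjectsArray at: 23.
Smalltalk specialObjectsArray at: 23 put: nil.
offender ifNotNil:
	[Transcript showln: 'Low space tripped by: ', offender printString]
```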
Has #lowSpaceThreshold any effect these days?
Yes. But given that Spur attempts to grow the heap dynamically and contemporary OSs are only too happy to comply, the low space trigger may only be seen to work if a limit on old space size has been installed. Otherwise it’ll most likely not trigger before the system starts to page itself into sluggishness.
Has #primSignalAtBytesLeft: (primitive 125) any effect these days?
Can there ever be an image-level protection against stack-page shortage? VM parameter 42 seems to be 50 (num stack pages), which lasts for quite some time in endless recursion ... but CMD+Dot stops working at some point because of some stack-growing issues? And then the VM crashes eventually.
Again, that's not how things work. The stack zone is of fixed size. When a new stack page is needed, the oldest page is evacuated to the heap in the form of Context objects. So infinite recursion (as it always did) causes heap growth, and the issue of sluggishness due to paging comes to the fore.
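To make the fixed-size stack zone concrete, the counts discussed above can be read back with the VM parameter indices given in this thread (42 for stack pages, 67 for the old-space limit):

```smalltalk
"VM parameter 42 reports the number of stack pages in the fixed-size
 stack zone; once recursion exhausts them, the oldest page is flushed
 to heap Context objects, so the heap grows, not the stack zone."
Transcript showln: 'Stack pages: ', (Smalltalk vmParameterAt: 42) printString.
Transcript showln: 'Old space limit: ', (Smalltalk vmParameterAt: 67) printString
```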