[Vm-dev] Maximum value of -stackpages VM parameter?

Phil B pbpublist at gmail.com
Thu Jun 15 20:46:55 UTC 2017


Excellent, I think this gives me the tools I'll need to get to the bottom
of the problem.

Thanks,
Phil

On Jun 15, 2017 4:33 PM, "Eliot Miranda" <eliot.miranda at gmail.com> wrote:

>
> Hi Phil,
>
>     via vmParameterAt: you'll access
>
> 60 number of stack page overflows since startup (read-only; Cog VMs only)
>
> a stack page overflow occurs when a computation sends deeply enough to
> fill a page with activations and must extend onto a fresh page to
> continue.  It is expected that this number will be high.
>
> 61 number of stack page divorces since startup (read-only; Cog VMs only)
>
> a stack page divorce occurs when either a stack overflow or a process
> switch requires a new page but all pages are in use, and so the least
> recently used page is "divorced": its activations are converted into
> context objects on the heap, emptying the page and allowing its reuse.
> This is the number we'd like to keep low by upping the number of stack
> pages, but not by so much that we slow down GC.
>
> 68 the average number of live stack pages when scanned by GC (at
> scavenge/gc/become et al)
> 69 the maximum number of live stack pages when scanned by GC (at
> scavenge/gc/become et al)
>
> These two (sorry, just noticed they're not in the method comment, at least
> in Squeak) can be used to monitor how many stack pages are in use as the
> system runs.
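>
> To spot-check the four parameters above from the image, something like
> the following should work (a sketch assuming Squeak's
> SmalltalkImage>>vmParameterAt:; other dialects may differ):
>
>   | overflows divorces avgLive maxLive |
>   overflows := Smalltalk vmParameterAt: 60.  "stack page overflows"
>   divorces := Smalltalk vmParameterAt: 61.   "stack page divorces"
>   avgLive := Smalltalk vmParameterAt: 68.    "average live pages at GC"
>   maxLive := Smalltalk vmParameterAt: 69.    "maximum live pages at GC"
>   Transcript
>       cr; show: 'overflows: ', overflows printString;
>       cr; show: 'divorces: ', divorces printString;
>       cr; show: 'avg live at GC: ', avgLive printString;
>       cr; show: 'max live at GC: ', maxLive printString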
>
> From these two we can tell whether a large number of pages leads to a
> high load on the scavenger scanning stack pages.  If the average is low
> while the number of stack pages is high, then the application's usage
> pattern is insensitive to the number of stack pages, and one can increase
> them without seeing much GC overhead.  But I expect this is unlikely;
> these two were added to monitor GC performance at Cadence, and indeed we
> see that increasing the number of stack pages in use also increases the
> average number of stack pages in use at GC time.
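>
> As a rough in-image check of that ratio (again just a sketch; the
> one-quarter threshold is an illustrative choice, not a rule):
>
>   | available avgLive |
>   available := Smalltalk vmParameterAt: 42.  "stack pages available"
>   avgLive := Smalltalk vmParameterAt: 68.    "average live pages at GC"
>   avgLive / available < (1 / 4)
>       ifTrue: [Transcript cr; show: 'few pages live at GC; raising -stackpages looks cheap']
>       ifFalse: [Transcript cr; show: 'many pages live at GC; expect overhead if raised']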
>
> That said, in my current VMMaker image, in a session used merely for
> browsing, I see
>
> #42 50 number of stack pages available   (default)
> #43 0 desired number of stack pages (i.e. select default)
>
> #60 89,370 number of stack page overflows since startup
> #61 0 number of stack page divorces since startup
>
> #68 11.35 the average number of live stack pages when scanned by
> scavenge/gc/become
> #69 16 the maximum number of live stack pages when scanned by
> scavenge/gc/become
>
> So in normal development use it looks like stack page use is minimal.
>
> On Thu, Jun 15, 2017 at 1:10 PM, Phil B <pbpublist at gmail.com> wrote:
>
>>
>> Eliot,
>>
>> Thanks for the tip, I'll give that a shot.  Also, is it possible to check
>> the amount of stack usage from the image? (I.e. just to get a rough idea,
>> reasonably quickly, of where things stand.)
>>
>> Phil
>>
>>
>> On Jun 12, 2017 7:24 PM, "Eliot Miranda" <eliot.miranda at gmail.com> wrote:
>>
>>
>> Hi Phil,
>>
>> On Jun 12, 2017, at 2:25 PM, Phil B <pbpublist at gmail.com> wrote:
>>
>> Eliot,
>>
>> Thanks for the info, that's good to know.  I probably should have been
>> explicit that I am only bumping it up this high to troubleshoot a rather
>> annoying startup bug in my code.  When it crashes as a result of the stack
>> overflow, the trace is pretty useless (IIRC, about half a page of INVALID
>> REFERENCE), so I'm mostly flying blind.  Bumping up the limit is allowing
>> me to get a better view of where things are going wrong, and I plan to
>> drop back once I've resolved it.
>>
>>
>> A better way to debug this would be to set a breakpoint in the scavenger
>> and print the call stack on every GC.  Stack overflow in a language like
>> Smalltalk, where activations are objects, means that the heap grows as the
>> stack grows.  (The stack pages in the stack zone can be seen as an
>> allocation cache for the most recent activations, reducing the pressure on
>> the GC.)  So if you run under gdb (lldb on the Mac) and print the stack at
>> each GC, you should be able to see at least where the infinite recursion
>> is coming from before the system runs out of memory:
>>
>> (gdb) b doScavenge
>> breakpoint 1 set at NNNN
>> (gdb) commands 1
>> call printStackCallStackOf(framePointer)
>> end
>> (gdb) run myimage.image
>>
>> You can use
>> (gdb) call pushOutputFile("stack.log")
>> to get the VM to send subsequent output to a file, and
>> (gdb) call popOutputFile()
>> to close the log.
>>
>>
>> Thanks,
>> Phil
>>
>> On Jun 12, 2017 4:43 PM, "Eliot Miranda" <eliot.miranda at gmail.com> wrote:
>>
>>
>> Hi Phil,
>>
>>
>> > On Jun 12, 2017, at 12:50 PM, Phil B <pbpublist at gmail.com> wrote:
>> >
>> > In trying to troubleshoot an issue, I needed to bump up the stackpages
>> parameter.  On 64-bit Linux, a value of 600 worked but 1000 segfaulted so I
>> was just wondering what the limit(s) are for it?
>>
>> There are no explicit limits.  The segfault you're seeing is a result of
>> the stack pages being allocated on the C stack.  When the number is too
>> high, the C stack overflows and boom.
>>
>> A word to the wise: with too high a value, scavenging performance falls
>> (stack pages are implicitly roots into new space), and become performance
>> falls (all activations in the stack zone are scanned after a become to
>> avoid a read barrier on instance variable fetch).
>>
>> The default value was 192, a value chosen to exceed Qwaq server process
>> usage, but both at Cadence and in Spur profiling we found that it was not
>> a good value and pulled it back to 64 (IIRC).
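>>
>> If it helps, the "desired number of stack pages" parameter can also be
>> set from the image rather than on the command line (my assumption here
>> being that parameter 43 is writable via vmParameterAt:put: and takes
>> effect at the next startup, with 0 selecting the default):
>>
>>   Smalltalk vmParameterAt: 43 put: 64.  "request 64 stack pages at next startup"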
>>
>> I'm curious as to why you are exploring such high values.
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>