<div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, 8 Oct 2018 at 02:08, <<a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a>> wrote:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

ClementBera uploaded a new version of VMMaker to project VM Maker:<br><br>

  SpurSelectiveCompactor compacts memory by selecting the memory segments with the most free space and compacting only those, to limit fragmentation while being really quick to perform. The algorithm is fast mostly because it does not update pointers: they are updated lazily during the next marking phase, so there is no need to read the fields of objects in other memory segments that the one compacted.<br>

<br>

  The algorithm works as follow. First, a global sweep pass iterates over the memory linearly, changing unmarked objects to free chunks and concatenating free chunks. During the global sweep phase, the segments of the heap are analysed to determine the percentage of occupation. Second, the least occupied segments are compacted by copying the remaining live objects into an entirely free segment, called regionToFill (we detail later in the paragraph where regionToFill comes from), changing their values to forwarding objects and marking the free chunks as unavailable (removed from free list and marked as data objects). Third, the next marking phase removes all forwarders. Fourth, at the beginning of the next compaction phase the compacted segments from the previous GC can be entirely marked as free space (No need to check anything inside, there were only forwarders and trash data). One of the compacted segment is then selected as the segmentToFill, others are just marked as free chunks.<br>

<br>

<br>

  The compaction is effectively partial, compacting only the most critical segments of the heap to limit fragmentation. Compaction time is crazy low, since a low number of objects are moved and pointer updated is lazily done during the next marking phase, while still preventing memory fragmentation.<br>

<br>

  Now this works well when biasForGC is true, but when performing a snapshot, the compactor is just total crap (we need to figure out a solution).<br>

<br>

  Performance trick for lilliputian chunks:<br>

+ Specific free chunks (called lilliputian, see isLilliputianSize:) are managed using a single linked list instead of a double linked list since there's not enough room in the free chunk for the back pointer. During the sweep phase this is not a problem since we're rebuilding the free chunk structure, but during selective compaction we're detaching free chunks from the free chunk structure and that can be horribly slow (10 seconds sometimes at 20Gb heap due to many iteration over the single linked list). To work around this problem, the sweep phase use lastLilliputianChunk variable to sort the lilliputian free chunk single linked list in ascending address order (See interceptAddFreeChunkWithBytes:at:). During the selective compation phase, the same variable is re-used to iterate at most once over the single linked list while detaching lilliputian chunks (See incrementalUnlinkSmallChunk:). In addition, each segment is annotated during the sweep phase with the last lilliputian chunk it<br>

  holds. Hence, during the compaction phase, the linked list is iterated but the iteration can jump to the last chunk of the previous segment to compact.!<br></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

- Specific free chunks (called lilliputian, see isLilliputianSize:) are managed using a single linked list instead of a double linked list since there's not enough room in the free chunk for the back pointer. During the sweep phase this is not a problem since we're rebuilding the free chunk structure, but during selective compaction we're detaching free chunks from the free chunk structure and that can be horribly slow (10 seconds sometimes at 20Gb heap due to many iteration over the single linked list). To work around this problem, the sweep phase use lastLilliputianChunk variable to sort the lilliputian free chunk single linked list in ascending address order (See interceptAddFreeChunkWithBytes:at:). During the selective compation phase, the same variable is re-used to iterate at most once over the single linked list while detaching lilliputian chunks (See incrementalUnlinkSmallChunk:).<br></blockquote><div><br></div><div>Please excuse me that I only skimmed the implementation and so don't have a good grasp if what I'm saying is practical,</div><div>but anyway reading the above description a random idea floating along the wind found me... </div><div><br></div><div>Have you considered per-segment-Lilliputian-list?</div><div>Reserve the first word of each segment as the pointer to the segment's first free-Lilliputian-chunk.  If it is null there are no free-Lilliputian-chunks in that segment. </div><div>The global sweep easily(?) knows which segment a Lilliputian-free-chunk belongs to and prepends to that per-segment list.</div><div><br></div><div>To combine all those separate per-segment lists into an overall-list...</div><div>split that reserved word into two parts... </div><div>* the LSB part provides an "offset" to the first element of the per-segment-list</div><div>* the MSB part provides a pointer to the next segment with a per-segment-list i.e. "next-per-segment-list". </div><div>Also btw, Lilliputian-free-chunks only require space to hold the "next-offset" within its segment</div><div><br></div><div>To allocate from the overall-list...</div><div>  given a root pointer "X", follow to the first segment,</div><div>  allocate chunk at "offset",<br>  update "offset" with "next-offset",</div><div>  if "offset" is zero, update "X" with "next-per-segment-list"</div><div><br></div><div>Reserve a second word at the start of each segment and you have a back pointer to the previous segment.</div><div>So when a segment is removed, its easy to remove all its Lilliputian-free-chunks out of the overall-list in one go.</div><div>The Lilliputian-free-chunk-overall-list would be like a double-linked-list with offshoots.</div><div><br></div><div>Maybe(?) then the Lilliputian never needs to be sorted, </div><div>and also doesn't need to be traversed when freeing a segment, just remove the segment from the overall-list.<br></div><div><br></div><div>cheers -ben</div><div><br></div><div><br></div><div>P.S. Please feel free to play "idea"-whac-a-mole.   I've not put a lot of critical thinking into it.   It was easier to share than stop it bubbling about.</div><div><a href="https://en.wikipedia.org/wiki/Whac-A-Mole">https://en.wikipedia.org/wiki/Whac-A-Mole</a><br></div></div></div></div></div>