Hi Jens & Tom,

    Ah, SpurImagePreener doesn't clone the mark stack, so nothing needs to be done for it.  Only the remembered set needs to be shrunk.  Empty class table pages could also be discarded.  And if you could figure out how to rehash every dictionary, you could reorder class indices to compact the class table (a class's identityHash is its class index, so renumbering classes invalidates every hashed collection keyed on them).

On Fri, Mar 24, 2023 at 11:03 AM Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi Tom,

On Fri, Mar 24, 2023 at 8:53 AM Tom Beckmann <tomjonabc@gmail.com> wrote:
 
Hi list,

we're on a bit of an adventure to find the minimum size of a Squeak image that can still run a stdio REPL. After narrowing it down to around 6MB, we noticed that SpaceTally reported only ~3MB of objects (as opposed to the 6MB saved on disk).

Cool; a fine effort.  I hope this leads to a small image running on a minheadless VM for command-line scripting.


After a further deep dive (in which SqueakJS and later the VM simulator were of immense help), we found 60 objects with class index 19, which took up 3MB of space in the .image file. After some more digging, we found out that class index 19 is SpurMemoryManager>>sixtyFourBitLongsClassIndexPun. As we understand it, the "pun" objects are internal clones of the built-in classes (such as WeakArray, Array, ...), which keeps them from being found by users.

Almost.  Puns are used to separate the heap objects Spur uses internally from "user" Smalltalk objects. These internal objects are:
- the class table: a sparse table used to map the class index in every Smalltalk object header to the relevant class object
- the remembered set: the objects in old space that reference objects in new space and are hence roots for scavenging
- the mark stack: the stack used when marking old space; it holds the objects that have yet to be scanned for unmarked objects to mark
- the ephemeron stack: the stack used to hold potentially triggerable ephemerons found during marking

Some of these objects look like raw data, some look like arrays.  But all of them should be invisible to Smalltalk, so setting their class index to a pun hides them from allObjects and allInstances.  You'll find that the first few class indices, 1, 2 & 4, are those of the immediate classes (e.g. {SmallInteger. Character. SmallFloat64} collect: #identityHash; in Spur a class's identityHash is its class index).  Then you'll find that, after the immediate classes, the lowest class identityHash is 32, that of LargeNegativeInteger, as this shows:
    ((Smalltalk specialObjectsArray select: [:e| e isBehavior]) collect: [:b| {b identityHash. b}]) sort: [:aa :ab| aa first < ab first]

The class indices from 8 through 31 are used for puns.

We even managed to locate one of the larger class-index-19 objects with the help of Tom (WoC): the hiddenRootsObj holds the RememberedSet in its 4099th slot, and in our image it was just over 1MB in size.

Now we're wondering whether we can get even closer to our goal of the smallest possible on-disk image size (don't ask why; at this point it's more of a challenge...). Does the RememberedSet need to be persisted, or could we (easily?) nil it before saving to disk? Is there other low-hanging fruit among the VM-internal objects that could be freed during snapshot generation?

It must be persisted, but it doesn't need to be that big.  There is a tool in VMMaker for eliminating this wasted space: SpurImagePreener.  It doesn't currently prune the remembered set, though (I just checked).  Should I fix it, or would you like to?  Leaving it to you might be empowering.

So you evaluate e.g. SpurImagePreener new preenImage: 'trunk', and it writes out a (hopefully smaller) trunk-preen.image; see SpurImagePreener's class comment.  If you look at SpurImagePreener>>#cloneObjects you'll see how to reduce the size of the remembered set (currently it only does this for the free lists).  I'll fix the mark stack, since the format of pages on the mark stack is a bit tricky, but I'll leave fixing the remembered set size to you.
    

Best,
Jens (jl) and Tom (tobe)

PS: A not-so-clean version of the minification process can be found here: https://github.com/hpi-swa-lab/cloud-squeak
We're in the process of cleaning it up and might send out a proper announcement once it's pretty.

Super cool!

_,,,^..^,,,_
best, Eliot

