[Vm-dev] image format suggestion (was: blog post)

Fri Sep 13 00:03:53 UTC 2013

Eliot,
>     I've just written a blog post on progress with Spur, which is my nickname
> for a new memory manager for the Cog VM.  I hope it'll be interesting.

It is very interesting! Congratulations on your new design. Instead of
commenting on it, however, I would like to suggest that since this work
will probably force you to change the image file format anyway you might
consider making even more changes.

Given the current speeds of processors and disks, I don't think there is
much to be gained by having the file be as close to a pure memory dump
as possible. I have considered formats that require a little more
processing in my own designs. For Neo Smalltalk, for example, I wanted
to be able to load the same image into a 16 bit, a 32 bit and a 36 bit
processor. The encoding was based on a compact representation for
"infinite" length integers.

Here is a quick and dirty example (not what I actually used - I would
have to dig through my old emails to find the various designs): imagine
that four bits encode:

0, 1, 2, 3,
4, 5, 6, 7,
next 8 bits are the number,
next 16 bits are the number,
next 24 bits are the number,
next 32 bits are the number,
next 8 bits are the negative of the number,
next 4 bits encode -3 to -18
-2, -1

In the case of "next 8 bits are the number" or "next 8 bits are the
negative of the number", the values 0 to 7 already have shorter codes,
so when the 8 bits are 0 to 7 they are interpreted instead as "the next
N+1 bytes are the size of the number". So this can encode a number which
fills all of a 64 bit processor's virtual memory with just 9.5 bytes of
overhead (the 4 bit code, the 8 bits and up to 8 bytes of size).
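
As a rough sketch of the example encoding above (not the actual Neo
Smalltalk format), here is one way to write it down in Python. The
assignment of the codes 8 to 15 in list order, the big-endian byte
order, the size being counted in bytes, and the names encode and decode
are all assumptions made for illustration.

```python
def encode(n):
    """Return the list of 4-bit values (nibbles) that encode the integer n."""
    def to_nibbles(value, nbytes):
        # Big-endian, high nibble first (the byte order is an assumption).
        out = []
        for b in value.to_bytes(nbytes, "big"):
            out += [b >> 4, b & 0xF]
        return out

    def escaped(code, magnitude):
        # code, a byte N in 0..7, N+1 bytes holding the size, then the payload.
        size = (magnitude.bit_length() + 7) // 8      # payload size in bytes
        size_len = max(1, (size.bit_length() + 7) // 8)
        if size_len > 8:
            raise OverflowError("size does not fit in 8 bytes")
        return ([code] + to_nibbles(size_len - 1, 1)
                + to_nibbles(size, size_len) + to_nibbles(magnitude, size))

    if 0 <= n <= 7:
        return [n]                                    # codes 0..7: the value
    if n == -1:
        return [15]
    if n == -2:
        return [14]
    if -18 <= n <= -3:
        return [13, -3 - n]                           # next 4 bits: -3..-18
    if n > 0:
        for code, nbytes in ((8, 1), (9, 2), (10, 3), (11, 4)):
            if n < 1 << (8 * nbytes):
                return [code] + to_nibbles(n, nbytes)
        return escaped(8, n)
    if n >= -255:
        return [12] + to_nibbles(-n, 1)               # next 8 bits: negated
    return escaped(12, -n)


def decode(nibbles, pos=0):
    """Decode one number starting at pos; return (value, next_pos)."""
    def read_bytes(count, pos):
        value = 0
        for nib in nibbles[pos:pos + 2 * count]:
            value = (value << 4) | nib
        return value, pos + 2 * count

    code = nibbles[pos]
    pos += 1
    if code <= 7:
        return code, pos
    if code == 15:
        return -1, pos
    if code == 14:
        return -2, pos
    if code == 13:
        return -3 - nibbles[pos], pos + 1
    if code in (9, 10, 11):
        return read_bytes(code - 7, pos)              # 2, 3 or 4 bytes
    sign = 1 if code == 8 else -1                     # codes 8 and 12
    byte, pos = read_bytes(1, pos)
    if byte > 7:
        return sign * byte, pos
    size, pos = read_bytes(byte + 1, pos)             # escape: N+1 size bytes
    magnitude, pos = read_bytes(size, pos)
    return sign * magnitude, pos
```

With these assumptions len(encode(11)) is 3 nibbles and
len(encode(1000000)) is 7 nibbles, and decode(encode(n)) gives n back
for any integer whose size fits in 8 bytes.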

Given a number like this, different processors might deal with it
differently when reading it in. The number 11 (encoded in 12 bits)
would be a SmallInteger when read by a 16 or 32 bit processor while the
number one million (encoded in 28 bits) would become a
LargePositiveInteger in the 16 bit processor but still a SmallInteger on
the 32 bit one.
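
The choice a loader makes here could look something like the sketch
below. It assumes signed SmallIntegers that give up one bit of the word
to tagging; the function name integer_class and the tag_bits parameter
are made up for the example, and a real VM may use a different range.

```python
def integer_class(value, word_bits, tag_bits=1):
    """Name the class a loader might choose for value on this word size."""
    # Assumption: SmallIntegers are signed and lose tag_bits bits to tagging.
    bits = word_bits - tag_bits
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if lowest <= value <= highest:
        return "SmallInteger"
    if value > 0:
        return "LargePositiveInteger"
    return "LargeNegativeInteger"
```

Under that assumption integer_class(1000000, 16) gives
"LargePositiveInteger" while integer_class(1000000, 32) gives
"SmallInteger", and 11 is a "SmallInteger" in both cases.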

Object pointers were encoded as the number indicating how many objects
back to look in the object table. Most object pointers are backwards and
the scheme described above uses fewer bits for positive numbers. With
direct pointers instead of an object table you could just subtract the
oop from the current object's oop and divide by the number of bytes per
word.
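
A small sketch of the relative pointer idea, under the assumption that
objects are written in table order and that a pointer is just the index
of its target. The function names are hypothetical.

```python
def backward_distances(objects):
    """objects[i] lists the table indices that object i points to.

    Each pointer becomes 'how many objects back', so a pointer to an
    earlier object is positive and a forward pointer is negative.
    """
    return [[i - target for target in refs]
            for i, refs in enumerate(objects)]


def resolve(distances):
    """Inverse of backward_distances: turn distances back into indices."""
    return [[i - d for d in ds] for i, ds in enumerate(distances)]


def oop_distance(current_oop, target_oop, bytes_per_word):
    """Direct-pointer variant: distance in words instead of table entries."""
    return (current_oop - target_oop) // bytes_per_word
```

For example backward_distances([[], [0], [0, 1], [3, 1]]) gives
[[], [1], [2, 1], [0, 2]], and resolve turns that back into the
original indices. The distances are what would then be written with the
variable length number encoding.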

I also used a single very large number to represent the content of a
string or a bitmap.

Given that both oops and immediate values are represented in the same
way, objects that allow both have to indicate which is which. One option
is to add a single long integer with one tag bit per slot to the
object's representation. For Spur you would need 2 (or more?) tag bits
per slot. The 36 bit version of Neo Smalltalk had 4 tag bits per slot,
so that is what the 16 and 32 bit versions used too.
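
Packing the per-slot tags into one long integer might look like the
sketch below. Putting slot 0 in the lowest bits and the meaning of each
tag value (for instance 0 for an immediate and 1 for an oop) are
assumptions, as are the names pack_tags and unpack_tags.

```python
def pack_tags(tags, bits_per_tag=1):
    """Pack one small tag per slot into a single integer, slot 0 lowest."""
    word = 0
    for slot, tag in enumerate(tags):
        if not 0 <= tag < (1 << bits_per_tag):
            raise ValueError("tag does not fit in bits_per_tag bits")
        word |= tag << (slot * bits_per_tag)
    return word


def unpack_tags(word, slot_count, bits_per_tag=1):
    """Recover the list of per-slot tags from the packed integer."""
    mask = (1 << bits_per_tag) - 1
    return [(word >> (slot * bits_per_tag)) & mask
            for slot in range(slot_count)]
```

With one bit per slot pack_tags([1, 0, 1]) is 5; with two bits per slot
pack_tags([3, 0, 2], 2) is 35. The packed value is itself just another
number for the variable length encoding.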

These details don't matter - the important thing is that if it is
possible to have a single image file format for Spur that works in both
32 and 64 bit VMs it would be nice to do it.

-- Jecel