[Vm-dev] About the image

K. K. Subramaniam kksubbu.ml at gmail.com
Thu Aug 19 01:51:33 UTC 2010


On Tuesday 17 Aug 2010 4:57:32 pm stephane ducasse wrote:
> Another question was where can we get a description of the image format.
> I was planning to read the VM C code included the generated one.
I have attached the notes and an image dump program I wrote when I was 
studying the image from its bits.

HTH .. Subbu
-------------- next part --------------
Image format in Words (4-byte on i386)

     +------------------------------+
0000 | magic = 0x00001966 (6502)    |
     +------------------------------+
0004 | headerSize                   |
     +------------------------------+
0008 | dataSize                     |
     +------------------------------+
000C | oldBaseAddr                  |
     +------------------------------+
0010 | specialObjectsOop            |
     +------------------------------+
0014 | lastHash                     |
     +------------------------------+
0018 | savedWindowSize              |
     +------------------------------+
001C | fullScreenFlag               |
     +------------------------------+
0020 | extraVMMemory                |
     +------------------------------+
0024 | reserved                     |
     +------------------------------+
0028 | reserved                     |
     +------------------------------+
002C | reserved                     |
     +------------------------------+
0030 | reserved                     |
     +------------------------------+
0034 | reserved                     |
     +------------------------------+
0038 | reserved                     |
     +------------------------------+
003C | reserved                     |
     +------------------------------+

Squeak Memory
------------
Squeak allocates memory in one go and stores its base address in sqMemoryBase. All memory references
are byte indices into this block; such a byte index is called an oop. An oop of 0 is never
used.

An oop may refer to an integer or to a structure. An integer oop (a SmallInteger) is an odd
integer: the value of the integer is encoded in the oop itself and recovered with an arithmetic
shift (oop >> 1). Structure oops are always even (in fact word-aligned) and act as a byte index
into object memory. A structure oop may point to any of four types of memory structures in
Squeak: namedObject, arrayObject, wordsObject and bytesObject. The type of object is not
encoded in the oop; it is recorded in the format field of the object's header.
A symbol like #red is stored as a bytesObject holding its characters. A symbol table ensures
that each distinct symbol is represented by a unique oop.
An arrayObject is a fixed-size table of oops referring to other objects. A wordsObject contains
a list of machine-aligned words (32 bits long on x86) while a bytesObject contains a list
of 8-bit octets. The number of words or bytes is determined from the size in the object header;
for a bytesObject, the format field also records how many bytes of the last word are unused.
Elements within these lists are accessed by index through the at: and at:put: primitives
(primitiveAt and primitiveAtPut); objectAt: and objectAt:put: (primitiveObjectAt and
primitiveObjectAtPut) are used for the header and literals of a CompiledMethod.

Examples:
	namedObject - Class
	arrayObject - Array
	wordsObject - Bitmap
	bytesObject - String, Symbol (e.g. selectors), ByteArray
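A minimal C sketch of the two encodings just described, assuming 32-bit oops: SmallInteger tagging in the low bit, and the length of a bytes object. The length helper assumes the Squeak object format of this era, where formats 8 to 11 mark bytes objects and the low two bits of the format count the unused bytes in the last word. All names here (sqOop, is_integer_oop, integer_value_of, integer_object_of, byte_object_length) are illustrative; they are loosely modelled on, but are not, the interpreter's isIntegerObject:, integerValueOf: and integerObjectOf:.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sqOop;   /* hypothetical name for a 32-bit oop */

/* Bit 0 set: a SmallInteger. Otherwise an even, word-aligned byte index. */
static bool is_integer_oop(sqOop oop) {
    return (oop & 1) != 0;
}

/* Decode: arithmetic shift right by one. Done in int64_t so the sign is
   restored without relying on implementation-defined signed shifts. */
static int32_t integer_value_of(sqOop oop) {
    assert(is_integer_oop(oop));
    int64_t v = oop >> 1;                /* 0 .. 2^31-1 */
    if (oop & 0x80000000u)
        v -= INT64_C(0x80000000);        /* negative SmallInteger */
    return (int32_t)v;
}

/* Encode: value * 2 + 1; only values that fit in 31 bits are allowed. */
static sqOop integer_object_of(int32_t value) {
    assert(value >= -(1 << 30) && value < (1 << 30));
    return ((uint32_t)value << 1) | 1;
}

/* Indexable bytes in a bytes object, given its size in bytes (Base word
   included) and its format field (8..11; low two bits = unused bytes). */
static uint32_t byte_object_length(uint32_t total_bytes, uint32_t fmt) {
    assert(fmt >= 8 && fmt <= 11 && total_bytes >= 4);
    return total_bytes - 4 - (fmt & 3);
}
```

For example, the integer 3 is stored as the oop 7, and a five-character String occupies a Base word plus two data words (12 bytes) with format 11, giving 12 - 4 - 3 = 5 bytes.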

There is also a special kind of object, the CompiledMethod, whose structure combines object
pointers (a method header and literals) and bytecode instructions.

Physically, each object is stored in the heap as a header of one to three words followed by
the contents of the object. The full three-word form is:

                  +---------------------+-+-+
object_address :  |  Slots              |C|S|
                  +---------------------+-+-+
                  |  Class              |C|S|
                  +---------------------+-+-+
      oop ------> |  Base               |C|S|
                  +---------------------+-+-+
                  | slots                   |
                  +-------------------------+
                  | .....                   |
                  +-------------------------+

Every object begins with a header whose size varies from one to three words depending
on a type code stored in the two least significant bits of the header. The three words
are the Slots, Class and Base words. The Base word encodes basic information about the
object, such as its size, compact class index and hash code. An object's oop always
points to its Base word.

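Headers are of four types. Type 0 contains Slots, Class and Base words. Type 1 contains
only Class and Base. Types 2 and 3 contain only Base. Type 2 is used for free blocks,
and its Base word holds the size of the block in bytes. Type 3 is a compact header used
for small instances of compact classes: the class is identified by a 5-bit index instead
of a Class word. From the most significant bit down, a Base word encodes the garbage
collection status (3 bits), hash code (12 bits), compact class index (5 bits), format
(4 bits), size in words (6 bits) and header type (2 bits); for type 0 the size is kept in
the Slots word instead. The class index refers to an entry in the CompactClasses array,
which is stored in slot 29 (index 28 counting from zero) of the special objects array.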
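The Base word can be unpacked with shifts and masks. This sketch uses the bit positions of the hand decoding of 278934223 later in these notes (three GC bits at the top); the BaseHeader struct and decode_base_header are hypothetical names, not VM identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a Base header word, high to low:
   gc 3 | hash 12 | compact class index 5 | format 4 | size 6 | type 2. */
typedef struct {
    unsigned gc;    /* bits 31..29: garbage collector status */
    unsigned hash;  /* bits 28..17 */
    unsigned cidx;  /* bits 16..12: 1-based CompactClasses entry, 0 = none */
    unsigned fmt;   /* bits 11..8 */
    unsigned size;  /* bits 7..2: size in words, Base word included
                       (not used for type 0, whose size is in the Slots word) */
    unsigned type;  /* bits 1..0: header type 0..3 */
} BaseHeader;

static BaseHeader decode_base_header(uint32_t w) {
    BaseHeader h;
    h.gc   = (w >> 29) & 0x7;
    h.hash = (w >> 17) & 0xFFF;
    h.cidx = (w >> 12) & 0x1F;
    h.fmt  = (w >> 8) & 0xF;
    h.size = (w >> 2) & 0x3F;
    h.type = w & 0x3;
    return h;
}
```

Applied to the special objects header 278934223 this yields gc 0, hash 2128, class index 3, format 2, size 51 and type 3; applied to nil's Base word 503316485 it yields hash 3840, size 1 and type 1.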

If the C bit (bit 1) is 1, the oop is the same as the object's address (header types 2 and 3).
Otherwise, if the S bit (bit 0) is 1, the oop is offset by one word from the start of the
object (type 1), and if S is 0, by two words (type 0).
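The rule above as a small C sketch, assuming 4-byte words. The function names are hypothetical; they show how the C and S bits of the Base word give the distance from the object's first header word to its oop.

```c
#include <assert.h>
#include <stdint.h>

/* Number of header words in front of the Base word, from its C and S bits. */
static unsigned extra_header_words(uint32_t base_word) {
    if (base_word & 2)              /* C = 1: types 2 and 3, Base word only */
        return 0;
    return (base_word & 1) ? 1      /* C = 0, S = 1: type 1, Class + Base */
                           : 2;     /* C = 0, S = 0: type 0, Slots + Class + Base */
}

/* Byte offset of the first header word of the object with this oop. */
static uint32_t object_start(uint32_t oop, uint32_t base_word) {
    uint32_t extra = 4 * extra_header_words(base_word);
    assert(oop >= extra);
    return oop - extra;
}
```

For nil (oop 4, Base word 503316485, type 1) the object starts one word earlier, at 0, which is where the file dump shows its Class word.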

Notice that the choice of bit positions gives us a quick way of computing byte sizes
from the size fields. Masking out all bits of the Base word except the size bits (bits 7..2)
directly gives the size in bytes, including the Base word itself. Masking out the two type
bits gives the size in bytes for free blocks (and for the Slots word of a type 0 header)
and the class pointer for the Class word.
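The masking tricks can be sketched in C as follows (hypothetical helper names, 32-bit words assumed). No shifts are needed because the word count already sits two bits up, which multiplies it by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes, Base word included, for a type 1 or type 3 header.
   The word count sits in bits 7..2, so masking with 0xFC yields
   count * 4 directly. */
static uint32_t short_size_in_bytes(uint32_t base_word) {
    assert(base_word & 1);          /* types 1 and 3 have bit 0 set */
    return base_word & 0xFC;
}

/* Clearing the two type bits of a free block's Base word (or of a type 0
   Slots word) leaves a size in bytes; doing the same to a Class word
   leaves the class pointer. */
static uint32_t clear_type_bits(uint32_t word) {
    return word & ~(uint32_t)3;
}
```

For example, the special objects header 278934223 gives 204 bytes (51 words), and nil's Class word 3039226449 gives the pointer 3039226448, which is class 186960 after subtracting oldBaseAddr.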

An image is a snapshot of object memory with a file header prefixed to it. The prefix header
is a 16-word structure:
	unsigned magic;  /* 6502 */
	unsigned headerSize;
	unsigned dataSize;
	unsigned oldBaseAddr;
	unsigned specialObjectsOop;
	unsigned lastHash;
	unsigned savedWindowSize; /* screen geometry 65536*x+y */
	unsigned fullScreenFlag;
	unsigned extraVMMemory;
	unsigned reserved[7];
On start, the header is read from the image file, a chunk of memory is allocated, and the
dataSize bytes of object memory that follow the header are read into it. The chunk's address
is stored in a platform-specific variable, and all references to memory are byte offsets
from this base address.
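A minimal C sketch of reading the header, assuming a 32-bit image written in the host's byte order (byte-swapped images saved on a machine of the other endianness are simply rejected here). The struct name ImageHeader and the helper functions are hypothetical, not the VM's own identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct mirroring the 16-word header, fields in file order. */
typedef struct {
    uint32_t magic;              /* 6502 */
    uint32_t headerSize;         /* bytes; object memory starts here */
    uint32_t dataSize;           /* bytes of object memory */
    uint32_t oldBaseAddr;        /* address of object memory when saved */
    uint32_t specialObjectsOop;  /* stored as a pointer, like other oops */
    uint32_t lastHash;
    uint32_t savedWindowSize;    /* 65536*x + y */
    uint32_t fullScreenFlag;
    uint32_t extraVMMemory;
    uint32_t reserved[7];
} ImageHeader;

_Static_assert(sizeof(ImageHeader) == 64, "header must be 16 words");

/* Returns 0 on success, -1 on a short read or a magic number other
   than 6502 (which includes byte-swapped images). */
static int read_image_header(FILE *f, ImageHeader *h) {
    assert(f != NULL && h != NULL);
    if (fread(h, sizeof *h, 1, f) != 1)
        return -1;
    return h->magic == 6502 ? 0 : -1;
}

static unsigned window_width(const ImageHeader *h)  { return h->savedWindowSize >> 16; }
static unsigned window_height(const ImageHeader *h) { return h->savedWindowSize & 0xFFFF; }
```

With the example header, savedWindowSize 67109632 decodes to 1024x768. A loader would then allocate at least dataSize bytes, seek to headerSize and read dataSize bytes of object memory.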

Oops are stored in the image file as the memory addresses the objects had when the image
was saved, but you can recover the oop by masking out the header type bits (which Class
words carry) and subtracting oldBaseAddr:

	oop = (pointer & ~3) - oldBaseAddr
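The formula as a C function (the name pointer_to_oop is hypothetical). It must only be applied to words known to hold pointers, that is ordinary pointer slots or Class words: a SmallInteger slot also has bit 0 set but is a value, not a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* oop = (pointer & ~3) - oldBaseAddr, for pointer slots and Class words. */
static uint32_t pointer_to_oop(uint32_t pointer, uint32_t old_base_addr) {
    uint32_t addr = pointer & ~(uint32_t)3;   /* drop header type bits */
    assert(addr >= old_base_addr);
    return addr - old_base_addr;
}
```

With the example header, pointer_to_oop(3047769516, 3039039488) gives 8730028, the oop of the special objects table.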

The offset of an object's Base word within an image file is simply its oop plus headerSize
(64 in this image). The word found at this offset is the object's Base header word. The class,
size and contents of the object can be calculated from this header.
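A short C sketch of fetching a Base word, assuming the whole image file has been loaded into a byte buffer in the host's byte order. The function name base_word_at is hypothetical; memcpy is used so the read works whatever the buffer's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Base header word of the object with this oop: file offset oop + headerSize. */
static uint32_t base_word_at(const unsigned char *file, size_t file_len,
                             uint32_t header_size, uint32_t oop) {
    size_t off = (size_t)header_size + oop;
    assert(off + sizeof(uint32_t) <= file_len);
    uint32_t w;
    memcpy(&w, file + off, sizeof w);
    return w;
}
```

For nil (oop 4) in this image, the word at file offset 68 is 503316485.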

The file header contains a pointer to the special objects table (specialObjectsOop). This
table contains the objects required to resume a suspended image. It also holds objects which
the VM requires but which may not be referenced from any other object. Any live object in the
image is either in the special objects table or reachable from one of its entries.

Here is a decimal dump of the initial part of an image file:

	0000000       6502         64   22570156 3039039488
	0000016 3047769516      20199   67109632          0
	0000032          0          0          0          0
	0000048          0          0          0          0
	0000064 3039226449  503316485 3039226941  413138949
	0000080 3039226777  386662405    3703055 3043869072

The oldBaseAddr is the fourth word and the special objects pointer is the fifth word in the image header,
so the oop of the special objects table is:
	3047769516 - 3039039488 = 8730028

If we dump values at offset 8730028+64, we see:
    --->8730092  278934223 3039039492 3039039500 3039039508
	8730108 3039039512 3039249552 3039235932 3041442948
	8730124 3039248976 3039289880 3039235764 3039270752
	8730140 3039270588 3039267384 3039236016 3039692260
	8730156 3039238564 3039250196 3061421048 3039251180
	8730172 3039236264 3043868976 3043869004 3039039492
	8730188 3043841428 3047769728 3043869028 3039250112
	8730204 3040307748 3039040828 3039039492 3061420400
	8730220 3047770756 3047770768 3047770776 3044145908
	8730236 3039141192 3039039492 3039141348 3061420312
	8730252 3039224540 3039250360 3039289948 3039236180
	8730268 3039039492 3039039492 3039039492 3039039492
	8730284 3039039492 3044366488 3044479696       1028

The header word at this address, 278934223, is:
	0b000-100001010000-00011-0010-110011-11
	  gc  hash         cidx  fmt  size   type
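which decodes into an object with a single-word (type 3) header and a size of 51 words: the
Base word followed by 50 oops. Format 2 marks an indexable pointer object, and the class is
the 3rd entry in the compact class table. The oops are stored in the order nil, false, true,
SchedulerAssociation, and so on. We can take each of these pointers, convert it into an oop
and parse the header word found at that location. The attached program produces the listing
below; note that its fmt and slots columns do not match a hand decoding of the Base word
(nil's Base word 503316485 has a size field of 1 word, not 5, and the SchedulerAssociation's
Base word 3703055 has format 1 and size 3, not 4 and 15):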

magic              : 6502
headerSize         : 64
dataSize           : 22570156
oldBaseAddr        : 0xb5241000
specialObjectsOop  : oop:8730028
lastHash           : 20199
savedWindowSize    : 1024x768
extraVMMemory      : 0
compact classes             oop:    1340 htype:3 cls:  209488 hash: 701 fmt: 1 gc:0 slots:  3
Specials:
  0NilObject                oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
  1FalseObject              oop:      12 htype:1 cls:  187452 hash:3152 fmt: 0 gc:0 slots:  5
  2TrueObject               oop:      20 htype:1 cls:  187288 hash:2950 fmt: 0 gc:0 slots:  5
  3SchedulerAssociation     oop:      24 htype:3 cls:  197104 hash:  28 fmt: 4 gc:0 slots: 15
  4ClassBitmap              oop:  210064 htype:1 cls:  209952 hash:3418 fmt: 0 gc:0 slots: 49
  5ClassInteger             oop:  196444 htype:1 cls:  196412 hash:2390 fmt: 0 gc:0 slots: 49
  6ClassString              oop: 2403460 htype:1 cls: 2403428 hash:1801 fmt: 0 gc:0 slots: 49
  7ClassArray               oop:  209488 htype:1 cls:  209456 hash:2455 fmt: 0 gc:0 slots: 49
  8SystemDictionary         oop:  250392 htype:1 cls:  250332 hash:1224 fmt: 0 gc:0 slots: 17
  9ClassFloat               oop:  196276 htype:1 cls:  195972 hash:3127 fmt: 0 gc:0 slots: 49
 10ClassMethodContext       oop:  231264 htype:1 cls:  231152 hash: 568 fmt: 0 gc:0 slots: 49
 11ClassBlockContext        oop:  231100 htype:1 cls:  230988 hash:3151 fmt: 0 gc:0 slots: 49
 12ClassPoint               oop:  227896 htype:1 cls:  227784 hash:1044 fmt: 0 gc:0 slots: 49
 13ClassLargePositiveIntegeroop:  196528 htype:1 cls:  196496 hash:2741 fmt: 0 gc:0 slots: 49
 14Display                  oop:  652772 htype:1 cls:  652720 hash: 834 fmt: 0 gc:0 slots: 33
 15ClassMessage             oop:  199076 htype:1 cls:  198964 hash: 876 fmt: 0 gc:0 slots: 49
 16ClassCompiledMethod      oop:  210708 htype:1 cls:  210676 hash:3116 fmt: 0 gc:0 slots: 49
 17TheLowSpaceSemaphore     oop:22381560 htype:1 cls:  211692 hash:1036 fmt: 0 gc:0 slots: 17
 18ClassSemaphore           oop:  211692 htype:1 cls:  211580 hash:3737 fmt: 0 gc:0 slots: 49
 19ClassCharacter           oop:  196776 htype:1 cls:  196744 hash:2163 fmt: 0 gc:0 slots: 49
 20SelectorDoesNotUnderstandoop: 4829488 htype:1 cls: 4829392 hash:2385 fmt: 0 gc:0 slots: 25
 21SelectorCannotReturn     oop: 4829516 htype:1 cls: 4829392 hash:3461 fmt: 0 gc:0 slots: 21
 22ProcessSignalingLowSpace oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 23SpecialSelectors         oop: 4801940 htype:0 cls:  209488 hash: 927 fmt: 1 gc:0 slots: 65
 24CharacterTable           oop: 8730240 htype:0 cls:  209488 hash:2554 fmt: 1 gc:0 slots:257
 25SelectorMustBeBoolean    oop: 4829540 htype:1 cls: 4829392 hash:1090 fmt: 0 gc:0 slots: 21
 26ClassByteArray           oop:  210624 htype:1 cls:  210512 hash:2391 fmt: 0 gc:0 slots: 49
 27Process                  oop: 1268260 htype:1 cls: 1267956 hash:2080 fmt: 0 gc:0 slots: 49
 28CompactClasses           oop:    1340 htype:3 cls:  209488 hash: 701 fmt: 1 gc:0 slots:  3
 29TheTimerSemaphore        oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 30TheInterruptSemaphore    oop:22380912 htype:1 cls:  211692 hash:1613 fmt: 0 gc:0 slots: 17
 31Unknown                  oop: 8731268 htype:3 cls:  196276 hash:2827 fmt: 3 gc:0 slots: 15
 32Unknown                  oop: 8731280 htype:3 cls:  196528 hash:1288 fmt: 2 gc:0 slots: 11
 33Unknown                  oop: 8731288 htype:3 cls:  227896 hash:2177 fmt: 4 gc:0 slots: 15
 34SelectorCannotInterpret  oop: 5106420 htype:1 cls: 4829392 hash: 444 fmt: 0 gc:0 slots: 21
 35Unknown                  oop:  101704 htype:3 cls:  231264 hash:3984 fmt: 7 gc:0 slots: 31
 36Unknown                  oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 37Unknown                  oop:  101860 htype:3 cls:  231100 hash:3214 fmt: 6 gc:0 slots: 31
 38ExternalObjectsArray     oop:22380824 htype:3 cls:  209488 hash:1412 fmt: 1 gc:0 slots: 23
 39Unknown                  oop:  185052 htype:1 cls:  184940 hash:2320 fmt: 0 gc:0 slots: 49
 40Unknown                  oop:  210872 htype:1 cls:  210760 hash:2443 fmt: 0 gc:0 slots: 49
 41TheFinalizationSemaphore oop:  250460 htype:1 cls:  211692 hash:4043 fmt: 0 gc:0 slots: 17
 42ClassLargeNegativeIntegeroop:  196692 htype:1 cls:  196580 hash:3323 fmt: 0 gc:0 slots: 49
 43ClassExternalAddress     oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 44ClassExternalStructure   oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 45ClassExternalData        oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 46ClassExternalFunction    oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 47ClassExternalLibrary     oop:       4 htype:1 cls:  186960 hash:3840 fmt: 0 gc:0 slots:  5
 48SelectorAboutToReturn    oop: 5327000 htype:1 cls: 4829392 hash: 358 fmt: 0 gc:0 slots: 29
 49SelectorRunWithIn        oop: 5440208 htype:1 cls: 4829392 hash:1749 fmt: 0 gc:0 slots: 17
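Reading one slot of the special objects array, as the listing above does for each of its 50 slots, can be sketched in C. This assumes `mem` holds object memory as loaded from the file (the bytes after the 64-byte header, host byte order), that the array has a one-word (type 3) header as in this image, and that the slot holds a pointer rather than a SmallInteger; special_object is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Oop stored in slot i (0-based) of the special objects array. */
static uint32_t special_object(const unsigned char *mem, size_t mem_len,
                               uint32_t special_objects_oop,
                               uint32_t old_base_addr, unsigned i) {
    uint32_t base, ptr;
    assert((size_t)special_objects_oop + 4 <= mem_len);
    memcpy(&base, mem + special_objects_oop, sizeof base);
    assert((base & 3) == 3);                    /* type 3 header */
    unsigned words = (base >> 2) & 0x3F;        /* Base word included */
    assert(words >= 1 && i < words - 1);
    size_t off = (size_t)special_objects_oop + 4 * ((size_t)i + 1);
    assert(off + 4 <= mem_len);
    memcpy(&ptr, mem + off, sizeof ptr);
    return (ptr & ~(uint32_t)3) - old_base_addr;   /* pointer -> oop */
}
```

For this image the header 278934223 has a size of 51 words, so valid slot indices run from 0 to 49, and slot 0 converts to nil's oop 4.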

When Squeak resumes an image, it starts from the SchedulerAssociation object stored in
the special objects table. In our case, this object's oop is 24:

	0000088    3703055 3043869072 3039045228        260
                                      ^^^^ scheduler
	0000104 3039248976  448541184 3043888564          3

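The value slot (the second slot after the header) holds the ProcessorScheduler instance, Processor:
	3039045228 - 3039039488 = 5740
If we dump file contents at offset 5740+64, we see

	0005804  282460429 3039718060 3060891548
                           ^^^^^^^^^^ quiescentProcesses
                                      ^^^^^^^^^^ activeProcess
The object has two pointers following the header. The first is quiescentProcesses:
	3039718060 - 3039039488 = 678572
and the second is activeProcess:
	3060891548 - 3039039488 = 21852060

If we dump the object at oop 678572 from the file, we see
	0678636    4862464 3039045244 3039045260 3039045276
	0678652 3039045292 3039045308 3039045324 3039045340
	0678668 3039718388 3039718404 3039718420 3039718436

This object is not a context. Its Base word 4862464 has header type 0 (so Slots and Class
words precede it), class index 3 and format 2 (indexable pointers). It is the Array in which
Processor keeps one list of runnable processes per priority level; its elements (oops 5756,
5772, 5788, ...) all share the same class. The context the VM resumes is one step further
along: it is the suspendedContext slot (the second slot) of activeProcess. A context contains
Sender, InstructionPointer, StackPointer, Method, ReceiverMap and Receiver, followed by its
temporaries and stack.

Here is the attached program's summary dump of this chain. It labels the quiescentProcesses
Array as "context" and its first six elements with context field names; the cls column shows
that the "context" is an Array (209488 is ClassArray in the special objects listing), not a
MethodContext (231264):
scheduler                   oop:      24 htype:3 cls:  197104 hash:  28 fmt: 4 gc:0 slots: 15
processor                   oop:    5740 htype:1 cls:  207640 hash:2155 fmt: 0 gc:0 slots: 13
context                     oop:  678572 htype:0 cls:  209488 hash:  37 fmt: 1 gc:0 slots: 81
 sender                     oop:    5756 htype:1 cls:  211528 hash:2556 fmt: 0 gc:0 slots: 13
 IP                         oop:    5772 htype:1 cls:  211528 hash:2754 fmt: 0 gc:0 slots: 13
 SP                         oop:    5788 htype:1 cls:  211528 hash:1448 fmt: 0 gc:0 slots: 13
 Method                     oop:    5804 htype:1 cls:  211528 hash:3490 fmt: 0 gc:0 slots: 13
 receiverMap                oop:    5820 htype:1 cls:  211528 hash:2498 fmt: 0 gc:0 slots: 13
 Receiver                   oop:    5836 htype:1 cls:  211528 hash: 517 fmt: 0 gc:0 slots: 13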

The Method slot of the active context leads to the CompiledMethod containing the bytecodes
to be interpreted, and the InstructionPointer gives the current position within it.
These are used to set up the initial execution context for the interpreter
and its fetch-execute loop:

	increment IP and fetch next bytecode at IP.
	interpret bytecode

Basic Classes
------------
System ST80 Services
Files Monticello
Kernel Collections Compression Compiler
Exceptions
Network 
Graphics Morphic MorphicExtras
Multilingual 
PackageInfo Protocols 
Tools ToolBuilder
Trait
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sq.c
Type: text/x-csrc
Size: 5874 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20100819/f59a295b/sq-0001.c

