I'm really puzzled why you are working with such a VM; restricting to 16-bit OOPs is really very ancient technology. Is this some old version of LittleSmalltalk or something? Why not use a more modern system, e.g. Squeak?
A while back I was toying with a VM that used 16-bit object pointers with a LOOM-style translation to a much larger object space. The thought was that the speed increase from having the active object working set always in L1/L2 processor cache might offset the overhead of periodically needing to shuffle objects between the small, fast object space and the large, slower one.
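To make the translation idea concrete, here is a toy sketch of the kind of table I had in mind. It is not LOOM itself and not code from any real VM: the class name, the LRU eviction policy, and the dict standing in for the large object space are all assumptions made for illustration. It also skips the hard part, which is rewriting the pointers stored inside objects as they move between the two spaces.

```python
from collections import OrderedDict


class SmallObjectSpace:
    """Toy resident space addressed by 16-bit short pointers (hypothetical design)."""

    def __init__(self, backing_store, capacity=1 << 16):
        if capacity > 1 << 16:
            raise ValueError("a 16-bit pointer can name at most 65536 objects")
        self.backing = backing_store   # long id -> payload: the large, slow space
        self.capacity = capacity
        self.free = list(range(capacity - 1, -1, -1))  # unused short pointers
        self.short_of = OrderedDict()  # long id -> short pointer, kept in LRU order
        self.slots = {}                # short pointer -> (long id, payload)
        self.faults = 0                # how often we had to go to the slow space

    def resolve(self, long_id):
        """Return the short pointer for long_id, faulting the object in if needed."""
        if long_id in self.short_of:
            self.short_of.move_to_end(long_id)  # mark as most recently used
            return self.short_of[long_id]
        self.faults += 1
        if not self.free:
            self._evict()
        short = self.free.pop()
        self.slots[short] = (long_id, self.backing[long_id])
        self.short_of[long_id] = short
        return short

    def fetch(self, short):
        """Fast path: read a resident object through its short pointer.

        Note this does not update the LRU order; only resolve() does.
        """
        return self.slots[short][1]

    def _evict(self):
        """Write the least recently resolved object back and free its slot."""
        victim_id, short = self.short_of.popitem(last=False)
        _, payload = self.slots.pop(short)
        self.backing[victim_id] = payload
        self.free.append(short)


# Usage: a large space of 100,000 objects seen through a tiny resident window.
store = {i: f"object-{i}" for i in range(100_000)}
space = SmallObjectSpace(store, capacity=4)
p = space.resolve(70_000)          # first touch faults the object in
print(p < (1 << 16), space.fetch(p), space.faults)
```

The bet is that resolve() faults are rare enough that the fetch() path, which only ever touches the small space, dominates the running time.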
This thinking came right after doing a high-speed networking project where performance REALLY was hurt by TLB misses, and the reality sank in that common systems can only do 5-10 million random memory accesses per second across a large memory space. This assumes a TLB miss, which requires 4 memory bus clocks of access latency and 3 more memory bus clocks to finish a cache line burst, and then, after loading the TLB entry (which was an L2 cache miss), the same 7 clocks to get the actual memory location, for a total of 14 bus clocks for the processor to read 4 bytes. At a typical memory bus clock speed of 100 MHz, this ends up being a usable memory bandwidth of only about 28 Mbytes/sec. There is a critical ratio between how many TLB entries you have, total memory space, and the randomness of access. As I remember, having something like 256k of L2 cache with 256 MBytes of total memory was beginning to show this problem, basically when ALL the TLB entries don't fit in L2 cache. Now that memory costs $75/Gigabyte, and L2 caches are the same size as before, and processors are running at 2 GHz, this might be worth revisiting.
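For anyone who wants to check the arithmetic, the figures above work out as follows. The constant names are mine; the numbers are the ones quoted in the paragraph, and the model is deliberately crude (every access misses both the TLB and L2, with no overlap between the two fetches).

```python
BUS_CLOCK_HZ = 100e6       # memory bus clock, 100 MHz
FIRST_WORD_LATENCY = 4     # bus clocks of access latency
BURST_REMAINDER = 3        # further bus clocks to finish the cache line burst
BYTES_PER_READ = 4         # useful bytes the processor actually wanted

clocks_per_line = FIRST_WORD_LATENCY + BURST_REMAINDER  # one cache line fill
clocks_per_access = 2 * clocks_per_line  # TLB entry fetch, then the data itself

accesses_per_second = BUS_CLOCK_HZ / clocks_per_access
bandwidth_bytes_per_second = accesses_per_second * BYTES_PER_READ

print(f"{clocks_per_access} bus clocks per random access")
print(f"{accesses_per_second / 1e6:.1f} million accesses/sec")
print(f"{bandwidth_bytes_per_second / 1e6:.1f} Mbytes/sec usable")
```

That comes to 14 clocks per access, about 7.1 million accesses per second (inside the 5-10 million range), and about 28.6 Mbytes/sec.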
This VM never got past the curious-idea stage, as I never got around to profiling a Smalltalk system to measure the randomness of memory accesses and the object working set size. I did have a name for the idea, which was a reduced address space architecture (RASA). It was just another variation of "make the common operations (access to objects in the object working set) go fast, in exchange for slowing down less common operations". It's possible I could now profile things using the Squeak VM simulator!
For a VM that has limited local resources (like a cheap microcontroller) and swaps objects across a network, I could imagine that 16-bit object pointers with a LOOM virtual memory might be very attractive. Even a VM that has limited RAM but lots of slower flash (for most of the object space) might find this an attractive architecture.
In a perfect universe, all object pointers would be unique across the universe (128 bits is enough), and all that stuff we call files (local or remote) would just be referenced by some object pointer. Isn't this what Ted Nelson always wanted?
- Jan