Persistence VM?

Mon Aug 26 04:36:33 UTC 2002

At 2:12 PM -0400 8/19/02, Stephen Pair wrote:
>I'm attempting to build a BerkeleyDB persistence solution for Squeak.  I
>had hoped that I could do this without modifying the VM...however, I'm
>coming to the conclusion that a few key modifications to the VM would
>make life much simpler.

This is a topic I find very interesting. I've never looked at 
BerkeleyDB, but I have done quite a bit of work with object 
persistence, and have been thinking in the past year about what VM 
changes would be useful for persistence.

My comments and questions are interspersed below. I hope you find them helpful.

-Martin

>
>This is as much a brain dump for me to keep track of everything I need to
>think about as it is a solicitation for input.  I'm most interested in
>hearing about the implementation details from people familiar with
>Squeak's interpreter, and about the capabilities this VM will provide from
>people that have implemented persistence frameworks.
>
>So, here are my thoughts:
>
>First, I would like to call this VM the "Persistence VM" or something
>similar.  The idea being that it is for anyone needing incremental
>object persistence in Squeak.  The hope being that a lot of persistence
>implementors could take advantage of what this vm offers.  It will have
>performance and space implications, so it should remain separate from
>the main VM.

I'd really prefer for there to be a single VM that supports 
persistence and also is reasonable for folks that don't use 
persistence. Ideally, I think persistence should be something you 
don't really have to think about, but is just there as part of the 
toolset. I suspect you may be able to get by with fewer changes than 
you've proposed. If so, a single VM may be fine for both.

I suggest starting with the minimum VM changes required, omitting 
changes that are solely for performance. Later, some of the 
performance changes can be experimented with, but at least we'll have 
a baseline against which to measure any performance gain.

I'm assuming that you haven't yet implemented a persistence scheme 
that doesn't use these VM changes and worked with it enough to see 
where the performance bottlenecks are. If your suggested VM changes 
are the result of studying the bottlenecks in an existing 
implementation, then many of my suggestions do not apply.

[...]

>
>What this VM will provide: 
>
>1) the ability to set a flag for any object in the system that will
>prevent that object's state from being directly accessed (read or write)

I see the need to trap writes; you need to be able to tell when an 
object has been dirtied. I don't see any need to trap reads, other 
than LRU tracking. Checking a header bit on every instvar write will 
slow things down a bit, but checking every read will have a larger 
impact, since reads far outnumber writes. (If I recall correctly, 
that is. I haven't measured the read/write ratio myself.)

I'd suggest starting by only trapping writes. A while back I posted a 
proposal for this for the VM4 work. Throwing an exception on 
attempted write to a flagged object should be sufficient, though I'd 
need to review the details of Squeak's exception handling to be sure.
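
A minimal sketch of the write-only trap, written in Python rather than Smalltalk purely for illustration. It assumes a per-object `_managed` attribute standing in for the header bit; `WriteTrapped`, `ManagedObject`, `mark_managed` and `store` are hypothetical names, not Squeak APIs. Reads go through unchecked. Only stores test the flag and raise, and `store` plays the role of an exception handler wrapped around application code that notes the object as dirty and then lets the write proceed.

```python
class WriteTrapped(Exception):
    """Raised on an attempted write to a flagged object (hypothetical)."""

    def __init__(self, obj, name, value):
        super().__init__(f"write to {name!r} on a flagged object")
        self.obj, self.name, self.value = obj, name, value


class ManagedObject:
    """Stands in for an object whose header carries a 'managed' bit."""

    def __setattr__(self, name, value):
        # The one check the VM would add to every instvar store.
        if self.__dict__.get("_managed", False):
            raise WriteTrapped(self, name, value)
        object.__setattr__(self, name, value)


def mark_managed(obj, flag=True):
    # Bypass the trap so the flag itself can always be set or cleared.
    object.__setattr__(obj, "_managed", flag)


def store(obj, name, value, dirty):
    """Handler: record the object as dirty, stop trapping it, retry the write."""
    try:
        setattr(obj, name, value)
    except WriteTrapped as trap:
        dirty.add(id(trap.obj))
        mark_managed(trap.obj, False)  # no more traps until the next commit
        setattr(trap.obj, trap.name, trap.value)
```

Clearing the flag after the first trap means each object costs at most one exception per commit cycle, which keeps the overhead proportional to the number of dirtied objects rather than the number of writes.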

>
>2) the ability to associate a state manager object with any object in
>the system

You need to be able to attach state to any object, but this shouldn't 
require a VM change. A suitably optimized IdentityDictionary will do. 
Adding VM support for a state manager reference in the header might 
have better performance, but I wouldn't start with that. I don't like 
adding VM complexity unless there are commensurate demonstrated gains.
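
A minimal sketch of the side-table approach, again in Python for illustration. `IdentityStateTable` and its methods are hypothetical names. Keys are object identities rather than equality, so two equal but distinct objects get separate entries, which is the behavior an IdentityDictionary gives in Squeak. Each entry holds its object strongly so that the object's `id()` cannot be reused while the entry exists.

```python
class IdentityStateTable:
    """Attach a state-manager object to any object, keyed by identity."""

    def __init__(self):
        self._entries = {}  # id(obj) -> (obj, manager)

    def attach(self, obj, manager):
        # Keeping obj in the entry pins it, so its id() stays unique.
        self._entries[id(obj)] = (obj, manager)

    def manager_for(self, obj, default=None):
        entry = self._entries.get(id(obj))
        return entry[1] if entry is not None else default

    def detach(self, obj):
        self._entries.pop(id(obj), None)
```

The cost is one hash lookup per access versus following a reference stored in the header, which is exactly the difference worth measuring before adding a header slot.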

>
>3) any attempt to read or write a slot in an object that has the
>"managed state" flag will result in a read or write message being sent
>to that object's state manager (or to nil if none has been assigned)

I suspect that an exception on write is all that is needed here.

>
>4) if a state manager has, as its first slot, a SmallInteger, a 4 bit
>portion of that small integer will be reset to 16r0 whenever that state
>manager is sent the read or write message (persistence implementors
>could use this fact to age cached objects and implement an approximate
>LRU cache management scheme)

I'd leave this out of early versions. There are other possible cache 
management schemes, some of which are approximate LRU, that don't 
require VM changes, and I'd want to see how those performed before 
putting this into the VM.
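
For reference, the 4-bit aging in item 4 can also live entirely in the persistence layer, with the layer resetting the age on the accesses it already sees (write traps and faults) instead of the VM resetting it on every read or write. This Python sketch uses hypothetical names. `flags` stands in for the SmallInteger first slot, and `sweep` is assumed to run periodically. Because reads are not observed, the ordering it produces is coarser than the VM-assisted version.

```python
AGE_MASK = 0xF  # the low 4 bits of the flags word hold the age


class CacheEntry:
    def __init__(self, key):
        self.flags = 0  # stands in for the SmallInteger first slot
        self.key = key

    @property
    def age(self):
        return self.flags & AGE_MASK

    def touch(self):
        # Reset the 4-bit age to 0, leaving any other flag bits intact.
        self.flags &= ~AGE_MASK

    def tick(self):
        # Saturating increment: an age never carries past 15.
        if self.age < AGE_MASK:
            self.flags += 1


def sweep(entries):
    """Periodic aging pass over every cached entry."""
    for entry in entries:
        entry.tick()


def eviction_candidates(entries, n):
    """Return the n oldest entries (approximate LRU)."""
    return sorted(entries, key=lambda e: e.age, reverse=True)[:n]
```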

>
>5) the ability to transform any object in the system into a "forwarder"
>which will transparently forward all messages it receives to some other
>designated object...garbage collection will automatically change all
>references to forwarders into references to their target.  A full GC
>will eliminate all references to all forwarder objects in memory.  A
>Smalltalk>>allObjectsDo: will not visit "forwarder" objects and once an
>object is converted to a forwarder, its state will no longer be
>accessible, nor will it be capable of receiving messages (all messages
>sent to it will go to its target)...identity comparisons will result in
>an identity comparison with the target object...the hash will be the
>hash of the target object (thus if the original object is in any hash
>sets, you might need to remove and re-insert it, or rehash the set).

I suggested something like this a while back, and I still suspect 
that it might be a good idea for applications that need to do many 
becomes. However, I'd rather see a persistence framework that doesn't 
heavily use become.
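
For comparison, the message-forwarding half of item 5 can be approximated in Python. This is a hypothetical sketch, not the proposed VM mechanism: `Forwarder` relays attribute reads and stores to its target, and answers `==` and `hash()` on the target's behalf. What it cannot do is make `fwd is target` true or rewrite existing references, which is exactly the part that needs the VM (the GC replacing references to forwarders).

```python
class Forwarder:
    """Relays every message to a target object (hypothetical sketch)."""

    __slots__ = ("_target",)

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def _get_target(self):
        return object.__getattribute__(self, "_target")

    def __getattr__(self, name):
        # Only reached for names the Forwarder lacks: every ordinary message.
        return getattr(self._get_target(), name)

    def __setattr__(self, name, value):
        setattr(self._get_target(), name, value)

    def __eq__(self, other):
        # Equality means identity with the target, as item 5 describes.
        if isinstance(other, Forwarder):
            other = other._get_target()
        return self._get_target() is other

    def __hash__(self):
        # The hash is the target's hash, so set/dict lookups still find it.
        return hash(self._get_target())
```

Note that a hashed collection containing the target will also find the forwarder, since both hash the same and compare equal; Stephen's caveat about rehashing applies when the original object, not the target, was the one inserted.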

If the size of an object can be known at the point in time that a 
stub needs to be created (and a reasonable faulting scheme usually 
can be found that does give this knowledge) then the stub can be 
created to be the correct size. Then all that is needed is the 
ability to change the class of the object, and it can stay in the 
same place in memory, so pointers to it would not need to be 
corrected.

I don't remember if Squeak has the ability to change the class of an 
object. If not, that would need to be added, but that's considerably 
simpler than the forwarders.
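
A toy object memory makes the size point concrete. This Python sketch is hypothetical throughout (`Heap`, `Obj`, `make_stub` and `fill_stub` are illustrative names), and an oop is just a list index. Because the stub is allocated at the real object's size, filling it in only rewrites its slots and its class, so every existing pointer to it stays valid with no become and no pointer correction.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """One heap object: a header (its class) plus fixed-size pointer slots."""
    cls: str
    slots: list = field(default_factory=list)


class Heap:
    """Toy object memory; an 'oop' is an index into self.objects."""

    def __init__(self):
        self.objects = []

    def allocate(self, cls, size):
        self.objects.append(Obj(cls, [None] * size))
        return len(self.objects) - 1

    def make_stub(self, size):
        # Assumes the faulting scheme knows the real object's size,
        # so the stub gets exactly that many slots.
        return self.allocate("Stub", size)

    def fill_stub(self, oop, real_cls, values):
        obj = self.objects[oop]
        if obj.cls != "Stub" or len(values) != len(obj.slots):
            raise ValueError("not a stub, or size mismatch")
        obj.slots[:] = values  # load the state into the existing slots
        obj.cls = real_cls     # in-place class change; the oop is unchanged
```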

>
>6) the ability to transform any object in the system into a "stub"
>object.  A stub object does not understand very much and when it
>receives a message, will ask another object to load the real object into
>memory, transform itself into a "forwarder" object, and then resend the
>message to itself (after the stub object has been transformed into a
>forwarder)

The in-place class change may be the fastest and simplest way to do this.
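
The stub-then-fault cycle from item 6, done with an in-place class change instead of a forwarder, might look like this Python sketch. `load_state` is a hypothetical stand-in for the database read, and Python's `__getattr__` hook plays the part of the stub's handling of messages it does not understand. Because the stub becomes a `Customer` at the same identity, no forwarder is left behind and there is nothing for the GC to patch.

```python
def load_state(oid):
    """Hypothetical loader standing in for a read from the database."""
    return {"name": f"customer-{oid}"}


class Customer:
    def greeting(self):
        return f"Hello, {self.name}"


class Stub:
    """Understands almost nothing; its first real message faults the object in."""

    def __init__(self, oid, real_class):
        self.oid = oid
        self.real_class = real_class

    def __getattr__(self, name):
        # Python calls this only for names a Stub lacks, i.e. any real message.
        real_class, oid = self.real_class, self.oid
        state = load_state(oid)        # ask the store for the real state
        self.__dict__.clear()
        self.__dict__.update(state)
        self.__class__ = real_class    # become the real object in place
        return getattr(self, name)     # resend the original message
```

One constraint to design around: the stub must not itself define any method the real class relies on, or that message would be answered by the stub instead of triggering the fault.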

>
>Required VM changes:
>
>- compact classes will be dropped

This seems reasonable. A simpler header is a good thing, and the 
space costs don't seem all that bad.

>- the ability to set an object's identity hash from Smalltalk is
>needed.

Agreed.

[...]