Persistence VM?

Mon Aug 26 14:03:40 UTC 2002

Martin,

Thanks for taking the time to give me feedback.  I've got a few comments
below.

> At 2:12 PM -0400 8/19/02, Stephen Pair wrote:
> >I'm attempting to build a BerkeleyDB persistence solution for Squeak.
> >I had hoped that I could do this without modifying the VM...however,
> >I'm coming to the conclusion that a few key modifications to the VM
> >would make life much simpler.
> 
> This is a topic I find very interesting. I've never looked at 
> BerkeleyDB, but I have done quite a bit of work with object 
> persistence, and have been thinking in the past year about what VM 
> changes would be useful for persistence.
> 
> My comments and questions are interspersed below. I hope you find
> them helpful.
> 
> -Martin
> 
> >
> >This is as much a brain dump for me to keep track of everything I
> >need to think about as it is a solicitation for input.  I'm most
> >interested in hearing about the implementation details from people
> >familiar with Squeak's interpreter, and about the capabilities this VM
> >will provide from people that have implemented persistence frameworks.
> >
> >So, here are my thoughts:
> >
> >First, I would like to call this VM the "Persistence VM" or something
> >similar.  The idea being that it is for anyone needing incremental
> >object persistence in Squeak.  The hope being that a lot of
> >persistence implementors could take advantage of what this VM offers.
> >It will have performance and space implications, so it should remain
> >separate from the main VM.
> 
> I'd really prefer for there to be a single VM that supports 
> persistence and also is reasonable for folks that don't use 
> persistence. Ideally, I think persistence should be something you 
> don't really have to think about, but is just there as part of the 
> toolset. I suspect you may be able to get by with fewer changes than 
> you've proposed. If so, a single VM may be fine for both.

I agree...there is nothing to prevent this VM from general use.

> I suggest starting with the minimum VM changes required, omitting 
> changes that are solely for performance. Later, some of the 
> performance changes can be experimented with, but at least we'll have 
> a baseline against which to measure any performance gain.
> 
> I'm assuming that you haven't yet implemented a persistence scheme 
> that doesn't use these VM changes and worked with it enough to see 
> where the performance bottlenecks are. If your suggested VM changes 
> are the result of studying the bottlenecks in an existing 
> implementation, then many of my suggestions do not apply.

Actually, I have...several times.  ;)

> [...]
> 
> >
> >What this VM will provide:
> >
> >1) the ability to set a flag for any object in the system that will 
> >prevent that object's state from being directly accessed (read or 
> >write)
> 
> I see the need to trap writes; you need to be able to tell when an 
> object has been dirtied. I don't see any need to trap reads, other 
> than LRU tracking. Checking a header bit on every instvar write will 
> slow things down a bit, but checking every read will have a larger 
> impact, since reads far outnumber writes. (If I recall correctly, 
> that is. I haven't measured the read/write ratio myself.)
> 
> I'd suggest starting by only trapping writes. A while back I posted a 
> proposal for this for the VM4 work. Throwing an exception on 
> attempted write to a flagged object should be sufficient, though I'd 
> need to review the details of Squeak's exception handling to be sure.

The LRU capabilities are important, but not the main reason I want to
trap reads.  The real reason is that I want to support multiple versions
of the same object in the same object memory at the same time.  I don't
know how bad the performance impact of a bit test will be in this case.
There is a possibility that I could swap out the bytecode
implementations (for the faster versions) when I don't need to trap
reads (i.e. when I'm not running in a transaction), but that's an added
complexity that I don't want to introduce until I know it's necessary.

> >2) the ability to associate a state manager object with any object in
> >the system
> 
> You need to be able to attach state to any object, but this shouldn't 
> require a VM change. A suitably optimized IdentityDictionary will do. 
> Adding VM support for a state manager reference in the header might 
> have better performance, but I wouldn't start with that. I don't like 
> adding VM complexity unless there are commensurate demonstrated gains.

Check out http://spair.swiki.net/26 ...it seems to perform pretty well.
But, the state manager is there to provide an object that is sent the
read/write notifications.
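
A rough sketch of how points 1 through 3 fit together, written in Python
rather than Squeak purely for illustration.  Every name here
(ManagedObject, StateManager, get, put) is hypothetical, and a real
implementation would live in the interpreter's slot access bytecodes.
One simplification: where the proposal sends the message to nil when no
manager has been assigned, this sketch just falls back to direct access.

```python
class StateManager:
    """Hypothetical state manager: receives the read/write notifications."""

    def __init__(self):
        self.log = []  # records each trapped access, for illustration only

    def read(self, obj, slot):
        self.log.append(("read", slot))
        return obj._slots[slot]

    def write(self, obj, slot, value):
        self.log.append(("write", slot))
        obj._slots[slot] = value


class ManagedObject:
    """Stand-in for an object whose header carries a 'managed state' flag."""

    def __init__(self, **slots):
        self._slots = dict(slots)
        self.managed = False  # the proposed header flag
        self.manager = None   # the proposed state manager reference

    def get(self, slot):
        # The flag test is the per-access cost being debated in this thread.
        if self.managed and self.manager is not None:
            return self.manager.read(self, slot)
        return self._slots[slot]

    def put(self, slot, value):
        if self.managed and self.manager is not None:
            self.manager.write(self, slot, value)
        else:
            self._slots[slot] = value
```

With the flag clear, get and put touch the slots directly; once the flag
is set, every access is routed through the manager, which is then free to
load state, mark the object dirty, or substitute a different version.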

> >3) any attempt to read or write a slot in an object that has the
> >"managed state" flag will result in a read or write message being sent
> >to that object's state manager (or to nil if none has been assigned)
> 
> I suspect that an exception on write is all that is needed here.

There is some cost to exception handling...I suppose you would need a
general "attempt to read/write" exception and a system wide default
handler for that exception...then you have to scan up the stack looking
for handlers...if there happen to be any, you have to jump back into
Smalltalk and test if that handler is for that exception or not.  Under
certain circumstances (i.e. where you are running under a general
handler of any Error) it would be a lot of overhead.  Alternatively, I
suppose that a single object in the system could be made special and
receive all read/write trap messages.

Eventually, I'll probably want the VM to be able to directly access
(read/write) alternate state for an object based on the active
process...but I want to prove this out in Smalltalk first.
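
To make "alternate state based on the active process" concrete, here is a
minimal Python sketch (not Squeak, and not an actual implementation) of a
manager that keeps one private copy of an object's slots per transaction.
The class name, the explicit txn key standing in for the active process,
and the copy-on-first-write policy are all assumptions made for the
example.  Conflict detection is left out entirely, and a transaction that
has not written anything simply sees the latest committed state.

```python
class VersioningManager:
    """Hypothetical manager holding one private version per transaction."""

    def __init__(self, committed):
        self.committed = dict(committed)  # the shared, committed slot values
        self.versions = {}                # transaction id -> private slots

    def read(self, txn, slot):
        # A transaction with no private version reads the committed state.
        view = self.versions.get(txn, self.committed)
        return view[slot]

    def write(self, txn, slot, value):
        # Copy the committed state on the first write by this transaction.
        view = self.versions.setdefault(txn, dict(self.committed))
        view[slot] = value

    def commit(self, txn):
        # Publish the private version (if any); no conflict checks here.
        self.committed = self.versions.pop(txn, self.committed)
```

This is why reads need to be trapped as well as writes: two processes
reading the same slot of the same object may have to be given different
answers.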

> >4) if a state manager has, as its first slot, a SmallInteger, a 4 bit
> >portion of that small integer will be reset to 16r0 whenever that
> >state manager is sent the read or write message (persistence
> >implementors could use this fact to age cached objects and implement
> >an approximate LRU cache management scheme)
> 
> I'd leave this out of early versions. There are other possible cache 
> management schemes, some of which are approximate LRU, that don't 
> require VM changes, and I'd want to see how those performed before 
> putting this into the VM.

Perhaps you're right...but, I know that a cache management scheme is
necessary and this one should work quite nicely.  I might also want to
set a dirty bit (on write operations) in this same SmallInteger (again
only if present).  Having a dirty bit would allow a simple and periodic
global scan and commit of all changed objects.  This isn't really a
suitable commit strategy for all types of applications, but it's very
handy during development.
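
The aging and dirty bit idea can be sketched like this, again in Python
for illustration only.  The bit layout (age in the low 4 bits, dirty flag
in bit 4) and the function names are assumptions; the proposal only says
that a 4 bit portion of the SmallInteger is reset on access.

```python
AGE_MASK = 0xF     # assumed layout: low 4 bits hold the age
DIRTY_BIT = 0x10   # assumed layout: bit 4 is the dirty flag


def touch(flags, is_write):
    """What the VM would do on a trapped access: reset age, maybe set dirty."""
    flags &= ~AGE_MASK
    if is_write:
        flags |= DIRTY_BIT
    return flags


def age(flags):
    """Increment the 4 bit age, saturating at 15."""
    current = flags & AGE_MASK
    if current < AGE_MASK:
        flags = (flags & ~AGE_MASK) | (current + 1)
    return flags


def sweep(cache):
    """Periodic global scan over a dict of name -> flags.

    Returns (dirty, stale): names to commit, and names whose age has
    saturated and are therefore candidates for eviction.
    """
    dirty = [k for k, f in cache.items() if f & DIRTY_BIT]
    for k in list(cache):
        # Clear the dirty bit (assume the commit succeeds) and age the entry.
        cache[k] = age(cache[k] & ~DIRTY_BIT)
    stale = [k for k, f in cache.items() if (f & AGE_MASK) == AGE_MASK]
    return dirty, stale
```

Anything that keeps getting touched has its age reset to zero and never
reaches 15, so the stale list approximates the least recently used
objects without maintaining an ordered list.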

> >5) the ability to transform any object in the system into a
> >"forwarder" which will transparently forward all messages it receives
> >to some other designated object...garbage collection will
> >automatically change all references to forwarders into references to
> >their target.  A full GC will eliminate all references to all
> >forwarder objects in memory.  A Smalltalk>>allObjectsDo: will not
> >visit "forwarder" objects and once an object is converted to a
> >forwarder, its state will no longer be accessible, nor will it be
> >capable of receiving messages (all messages sent to it will go to its
> >target)...identity comparisons will result in an identity comparison
> >with the target object...the hash will be the hash of the target
> >object (thus if the original object is in any hash sets, you might
> >need to remove and re-insert it, or rehash the set).
> 
> I suggested something like this a while back, and I still suspect 
> that it might be a good idea for applications that need to do many 
> becomes. However, I'd rather see a persistence framework that doesn't 
> heavily use become.

I agree.  A stub will become a forwarder, and a managed object can be
made into a stub if the cache management system needs to.  I can't
really see any way to avoid using "become" (actually, I'll be using a
forwarder and mutating the state and headers respectively) in these two
cases.

> If the size of an object can be known at the point in time that a 
> stub needs to be created (and a reasonable faulting scheme usually 
> can be found that does give this knowledge) then the stub can be 
> created to be the correct size. Then all that is needed is the 
> ability to change the class of the object, and it can stay in the 
> same place in memory, so pointers to it would not need to be 
> corrected.
> 
> I don't remember if Squeak has the ability to change the class of an 
> object. If not, that would need to be added, but that's considerably 
> simpler than the forwarders.

Squeak has a limited ability to mutate objects in this way, but it's not
suitable for altering stubs.  I implemented exactly this scheme a while
back for yet another persistence framework I had and it worked somewhat
well.  I just added a primitive for changing the class (which took care
of the headers as well)...the problem is that you can't assume that
you'll know the exact size of an object ahead of time.  Depending on
where you're pulling your objects, getting the size of a related object
may or may not be a viable option.  In the worst case, you'll be
fetching every object once to get the size for a stub to it, and once
for when you actually materialize it...which means you'll be fetching
every object at least twice.  You might then decide that since you're
fetching it to get the size, you may as well go ahead and materialize
it, but then you have a cascading problem.  I even thought about ways of
storing the size of a related object with the pointer in the db, but
that too has its problems.

Besides, I already implemented the forwarder capability in Squeak when I
was working on a LOOM style Squeak VM (which I temporarily stopped
working on...while a LOOM style VM would be great, it's of little use if
you need to persist your objects in some external system (like an
RDBMS)).  Forwarders are also useful 

> >6) the ability to transform any object in the system into a "stub"
> >object.  A stub object does not understand very much and when it
> >receives a message, will ask another object to load the real object
> >into memory, transform itself into a "forwarder" object, and then
> >resend the message to itself (after the stub object has been
> >transformed into a forwarder)
> 
> The in-place class change may be the fastest and simplest way to do
> this.

But not general enough...see above.
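
For illustration, the stub-then-forwarder behavior from points 5 and 6
looks roughly like this in Python.  This is only an analogy under stated
assumptions: Stub, its loader callback, and the oid argument are
hypothetical names, and Python cannot reproduce the VM-level parts
(identity comparisons resolving to the target, or the garbage collector
rewriting references so that the forwarder disappears).

```python
class Stub:
    """Hypothetical stub: faults in the real object on the first message,
    then behaves as a forwarder to it."""

    def __init__(self, oid, loader):
        self._oid = oid        # identifier of the object in the database
        self._loader = loader  # callable that materializes the real object
        self._target = None    # set once the stub has become a forwarder

    def __getattr__(self, name):
        # Only called for names not found on the stub itself, so the three
        # attributes set in __init__ never come through here.
        if self._target is None:
            self._target = self._loader(self._oid)
        return getattr(self._target, name)
```

Note that the stub does not need to know the size or class of the real
object ahead of time, which is the point of using a forwarder instead of
an in-place class change.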

> >Required VM changes:
> >
> >- compact classes will be dropped
> 
> This seems reasonable. A simpler header is a good thing, and the 
> space costs don't seem all that bad.
> 
> >- the ability to set an object's identity hash from Smalltalk is 
> >needed..
> 
> Agreed.
> 
> [...]

- Stephen