[Vm-dev] A Smalltalk object database idea

Louis LaBrunda Lou at Keystone-Software.com
Tue Dec 29 15:10:11 UTC 2009


Hello Squeak VM Guys,

My name is Louis LaBrunda.  I use Instantiations VA Smalltalk but dabble
with Squeak from time to time.

I have an outside-the-box way of implementing an object database for
Smalltalk, and I would like to see whether anyone here is interested in
implementing it.  I understand the theory behind Smalltalk VMs (at least I
think I do) but would face a steep learning curve to actually modify one.
This idea doesn't require inventing or improving any technology, but it
does require changes to the VM.

For the purpose of describing this idea, I will deal with only one database
and not go into binding to the database and other details like transaction
processing and such.  These things are of course important but I think they
can be handled in very much standard ways that should not be changed by
this means of implementing the object database.

The idea is that the VM would treat the database file much as a CPU treats
RAM, and would treat its own memory much as a CPU treats its internal
(on-chip) cache.  Data in the VM's memory would be linked to data in the
database file in much the same way that a CPU's cache is linked to RAM.

As I said, I'm not very knowledgeable about the internal workings of
Smalltalk VMs, so much of what I am about to say is guesswork, but I think
it is accurate.  Objects represented in the memory of a Smalltalk VM
probably take up about 12 bytes or so on 32-bit systems, more on 64-bit
systems.  Many of these bytes hold bits that define the class.  Some of the
bytes might be the value of the object if it is, say, a small integer, a
byte or a character.  If the data (value) of the object is larger than will
fit in a few bytes, there is a pointer to the data.  If the object has
instance variables, which are of course other objects, there are pointers
to them.

A bit would be needed to indicate a persisted object, and probably another
bit to indicate that the object is dirty (changed, and therefore no longer
matching the copy in the database file).  Objects with the persisted bit
off would look and be treated the same as they are now.  Objects with the
persisted bit on would have all their pointers replaced with offsets from
the beginning of the database file (a single file containing all the
persisted objects).  All objects pointed to by a persisted object must also
be persisted objects.
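
As a rough illustration of the two bits, here is a minimal sketch in
Python.  The flag values and function names are hypothetical, made up for
this example; a real VM's object header layout would differ.

```python
# Hypothetical header flag bits; a real VM header is laid out differently.
PERSISTED = 0x1  # the object has a copy in the database file
DIRTY = 0x2      # the in-memory copy no longer matches the file copy


def is_persisted(header):
    """Return True if the persisted bit is set in the header word."""
    return bool(header & PERSISTED)


def mark_dirty(header):
    """Set the dirty bit, but only on persisted objects.

    A non-persisted object has no file copy to fall out of sync with,
    so its header is returned unchanged.
    """
    return header | DIRTY if header & PERSISTED else header


def clear_dirty(header):
    """Clear the dirty bit, e.g. after the object has been written back."""
    return header & ~DIRTY
```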

When the VM comes across a persisted object, it would use the pointers
(which are now offsets within the database file) as keys into a lookup
table (hash table) to find the real pointer to the data in memory.  If the
offset is found in the lookup table, the stored pointer is used exactly as
it would have been had it been in the object itself, and everything
proceeds as usual.  If the offset is not found, it is used to read the
object from the database file, and the lookup table is then updated to
include the new entry.
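
The lookup described above could be sketched roughly as follows, in Python
rather than VM code.  The class name, the `loads` counter and the record
layout (a 4-byte little-endian slot count followed by that many 4-byte
slots) are all assumptions for illustration, not a proposed file format.

```python
import struct


class ObjectCache:
    """Maps database file offsets to objects already loaded in memory."""

    def __init__(self, db_file):
        self.db_file = db_file  # any seekable binary file object
        self.table = {}         # offset -> in-memory object
        self.loads = 0          # how many times we had to read the file

    def resolve(self, offset):
        """Return the in-memory object for a database file offset."""
        if offset not in self.table:
            # Miss: read the object from the file and remember it.
            self.table[offset] = self._read_object(offset)
        return self.table[offset]

    def _read_object(self, offset):
        # Assumed layout: slot count, then the slots themselves.
        self.db_file.seek(offset)
        (count,) = struct.unpack("<I", self.db_file.read(4))
        slots = struct.unpack("<%dI" % count, self.db_file.read(4 * count))
        self.loads += 1
        return list(slots)
```

Resolving the same offset twice returns the same in-memory object and reads
the file only once, which is the behavior the lookup table is meant to
provide.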

As far as I can tell, the copies of the object in memory and in the
database file can be identical (no object dumper/loader serialization).
There may need to be a little bit of a wrapper in the database file, but I
don't think much.  This should make loading and saving objects very quick.

Probably some objects, like blocks of code, can't or shouldn't be saved to
the database (I'm not sure if this is true for Squeak).  But I don't think
that is any different from systems that use object dumper/loader
serialization.

I think a low-priority forked process could run through the lookup table
looking for objects with the dirty bit set and save them to the database
file.  A #persist method (or some other good name) could be added to
Object to force the saving of an object to the database.  This would
probably be implemented with a primitive, but maybe not.
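
One pass of such a write-back could look something like the Python sketch
below.  It is only an illustration under simplifying assumptions: the
function name is hypothetical, a set of offsets stands in for the dirty
bits, each object is already a run of bytes, and the pass runs in the
foreground rather than as a low-priority process.

```python
def flush_dirty(table, dirty, db_file):
    """Write every dirty object back to its offset in the database file.

    table   -- dict mapping file offset -> object bytes held in memory
    dirty   -- set of offsets whose in-memory copy has changed
    db_file -- seekable binary file object opened for writing
    Returns the number of objects written.
    """
    written = 0
    for offset in sorted(dirty):  # ascending offsets keep seeks moving forward
        db_file.seek(offset)
        db_file.write(table[offset])
        written += 1
    dirty.clear()  # everything in memory now matches the file again
    return written
```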

There may be some changes needed for garbage collection to keep the lookup
table up to date but I don't think that will be a big deal.  Hopefully
garbage collection for the database file could be handled mostly by
Smalltalk code with the help of a few primitives.

Well, that's it for now.  I hope this has been an interesting read and not
a waste of your time.  If you think the idea has merit, let me know and we
can discuss it further.

Thank you very much for your time.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:Lou at Keystone-Software.com http://www.Keystone-Software.com
