[squeak-dev] A few more arguments to instantiating object memory based on another one

Fri Aug 15 08:27:49 UTC 2008

2008/8/15 Joshua Gargus <schwa at fastmail.us>:
> (Wishing myself more success with writing this email before sending it :-) )
>
> Igor Stasenko wrote:
>>
>> Hi folks,
>>
>> soon Hydra will provide support for instantiating a new interpreter
>> instance from the current object memory, i.e. not based on images that
>> reside on the file system.
>>
>> The main focus of this feature is to create tiny images with
>> limited behavior, for off-loading processing from the main interpreter to
>> a separate worker interpreter.
>> Since Hydra already has mechanisms to transfer data between
>> interpreters, the need to initially pack new image(s) with data is
>> minimal.
>> The most important (and interesting) part, IMO, is to define and transfer a
>> behavior (classes & their methods) that is minimal for solving some
>> problem in its domain.
>>
>
> Not that it's immediately relevant, but keep in mind that we'll eventually
> want to be able to share behavior between images.  Possibly the most
> important thing is how much happier this will make the L2 caches on future
> multi-core chips.
>>
>> Now, why do I think it's more convenient than having separate images?
>>
>
> If you had to pick between one or the other, I can see how your argument
> makes sense.  However, I don't see why you can't trivially have both.
>  Furthermore, there are use-cases where the ability to load an image from
> disk (or from the network) might be useful.  More on both of these points
> below.

You can load & run an image from disk. This is the main feature of the
initial release of Hydra.

>>
>> First, it is easier to support and distribute: you have a single
>> 'bloated' main image which carries all necessary code & data, and you
>> don't need to build a bunch of small images and manage them in
>> distribution.
>>
>> Firing up a new interpreter instance by copying data from the base heap
>> to a new heap could be even faster than reading & running an image from
>> a file, because there is no disk I/O and all operations are performed in
>> memory.
>>
>
> This seems like an unfair comparison.  A better comparison would be
> comparing your method to running an image once it has already been loaded
> from a file (since, of course, you can store an image as a ByteArray in
> memory just as easily as you can store your object-graph-array).

Hmm, why is it unfair? The difference lies only in where the VM
gets the info for creating a new interpreter instance:
a) by loading an image from disk
b) by cloning a provided set of objects from the existing heap to a new one.

The rest of the operations for creating and initializing a new interpreter
instance are the same.

Of course, producing a new image from the base image relies on some
heuristics, and this could take much more time. But as I
said, that can be compensated for by keeping precalculated data in the
base image.
The extra memory required for the two arrays (which represent the new
image) is small compared with a full object memory snapshot, which you
need to keep separately on disk.
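
To make the two arrays concrete, here is a minimal Python sketch of the
copy step as I read the primitive's description quoted just below. It is
not Hydra or VM code: the Obj class, the function name and the in-memory
"heap" (a plain list) are all hypothetical stand-ins, and the stub
semantics follow my reading of "oop + index" pairs.

```python
class Obj:
    """Toy heap object: a name plus references to other objects."""

    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)


def clone_closed_graph(objects, stubs):
    """Copy `objects` into a fresh list standing in for the new heap.

    objects: the objects to clone.
    stubs:   (original, index) pairs; any reference to `original` is
             redirected to the clone of objects[index].
    Raises ValueError if a reference leaves the graph and has no stub.
    """
    # Map every object identity we may meet to an index into `objects`.
    target_index = {id(obj): i for i, obj in enumerate(objects)}
    for original, index in stubs:
        target_index[id(original)] = index

    # Closure check first, so nothing is copied if the graph leaks.
    for obj in objects:
        for ref in obj.refs:
            if id(ref) not in target_index:
                raise ValueError(
                    f"{obj.name} references {ref.name} outside the graph")

    # Copy the objects, then rewire references to point at the copies.
    clones = [Obj(obj.name) for obj in objects]
    for obj, clone in zip(objects, clones):
        clone.refs = [clones[target_index[id(ref)]] for ref in obj.refs]
    return clones


# Hypothetical usage: the worker refers to a full class in the base image,
# and the stub pair swaps in a tiny replacement stored at index 1.
full_class = Obj("FullClass")
tiny_class = Obj("TinyClass")
worker = Obj("worker", [full_class])
new_heap = clone_closed_graph([worker, tiny_class], [(full_class, 1)])
print(new_heap[0].refs[0].name)  # prints TinyClass
```

The point of the sketch is only that both arrays are ordinary objects in
the base image, so they cost a few references each rather than a second
snapshot.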

>>
>> The primitive that does the copy & run takes two arguments: an array of
>> object refs to be cloned into the new heap, and an array of stubs in the
>> form of pairs (an oop + the index of the oop in the first array which
>> will replace references to the original oop).
>> Before doing anything, the primitive checks that the given arguments form
>> a closed object memory graph, i.e. there are no references outside of it.
>>
>> These two arrays can be pre-generated and sit in the base image, so you
>> may have different sets of precalculated graphs for different needs
>> and then simply spawn new interpreter(s) at system startup.
>> Also, as long as you control the development & distribution cycle, you
>> can keep such arrays within the image and recalculate them when
>> needed.
>> And you can always include error-handling mechanisms in the
>> mini-images which could tell you if anything goes wrong (like handling
>> unknown messages, catching bugs etc).
>>
>> Also, I'm looking forward to integration with Spoon's main feature -
>> behavior imprinting, where a consumer image asks a provider image to
>> deliver the behavior required to run some code.
>>
>>
>
> The technical details of your approach sound good to me (without having
> thought deeply enough to provide truly constructive criticism).  However...
>
> My main concern is that your argument against separate images is
> disingenuous.  They won't be slower if you store them as ByteArrays within
> the main image.  In fact, I believe that the opposite would be true; don't
> you agree?  From a performance standpoint, it seems like separate images are
> the better option.

Well, if you want to spawn & kill dozens of interpreters in real time,
then I have to disappoint you:
initializing a new interpreter instance involves a lot of things
besides loading a new image into memory, and these could make the process
really slow. First of all, there is the initialization of plugins &
interpreter state.
Of course, this could be improved by postponing plugin initialization
until the point where it is really needed, but I think that will be hard
to do with the current VMMaker design.

>
> Separate images allow (security implementations aside) nifty things like
> mobile code... I can download an image from a server or a P2P network and
> run it in my image.  I don't yet know what I would do with this ability, but
> as we get more experience with the object-capability security model (hello
> Newspeak!) I'm sure that there will be no shortage of good ideas.
>

Surely, one could use a ByteArray to instantiate a new image.
Even now, you can just write the new image to a temp file first and then
instantiate a new interpreter from that image.
And later we can add a primitive which simply takes the new
image from a ByteArray.
This could be useful, but not very valuable to my thinking, since it
doesn't add anything new to the ways new images can be produced.

Also, don't forget about possible future use cases, when we may
come up with a model for supporting cross-heap references by making
images interconnected with each other using far references.
With this hypothetical model, you would not need to form a closed graph
of objects; you would just define a set of objects to be cloned into the
new heap, while the remaining references would be treated by the VM as far
references to the base heap.
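
As a rough illustration of that hypothetical model (nothing like this
exists in Hydra today), the Python sketch below takes a chosen set of
objects and sorts their outgoing references into those that stay inside
the set and those that would have to become far references. The
dictionary representation and the function name are assumptions made for
the example only.

```python
def split_references(graph, selected):
    """Classify the outgoing references of the selected objects.

    graph:    dict mapping an object name to the names it references.
    selected: set of object names chosen for cloning into the new heap.
    Returns (local, far), two lists of (source, target) pairs. `local`
    references stay inside the new heap; `far` ones point back at the
    base heap.
    """
    local, far = [], []
    for source in sorted(selected):
        for target in graph[source]:
            if target in selected:
                local.append((source, target))
            else:
                far.append((source, target))
    return local, far


# Hypothetical base heap: the worker uses a queue and the Transcript.
base_heap = {
    "worker": ["queue", "Transcript"],
    "queue": [],
    "Transcript": [],
}
local, far = split_references(base_heap, {"worker", "queue"})
print(local)  # [('worker', 'queue')]
print(far)    # [('worker', 'Transcript')]
```

With today's closed-graph rule, a non-empty far list would make the
primitive refuse the arguments; under the far-reference model it would
simply be the list of references left pointing at the base heap.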

> Of course, these separate images need to be built somehow, and it seems to
> me that this is where your ideas fit in (for development more than
> deployment).
>

Yes, what I am actually proposing is a way to build an image &
run it without doing any file/stream-based I/O, while also conserving
memory space by reusing/copying objects that already exist in the
original image.

> Cheers,
> Josh
>

-- 
Best regards,
Igor Stasenko AKA sig.