[squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Fri Aug 15 08:53:08 UTC 2008

On Fri, 15 Aug 2008 09:11:21 +0200, Joshua Gargus wrote:

> (Wishing myself more success with writing this email before sending it  
> :-) )
>
> Igor Stasenko wrote:
>> Hi folks,
>>
>> soon Hydra will support instantiating a new interpreter
>> instance from the current object memory, i.e. not based on images
>> residing on the file system.
>>
>> The main focus for this feature is to create tiny images with
>> limited behavior, for off-loading processing from the main
>> interpreter to a separate worker interpreter.
>> Since Hydra already has mechanisms to transfer data between
>> interpreters, the need for initially packing new image(s) with data
>> is minimal.
>> The most important (and interesting) part IMO is to define and
>> transfer the behavior (classes & their methods) which is minimal for
>> solving some problem in its domain.
>>
> Not that it's immediately relevant, but keep in mind that we'll  
> eventually want to be able to share behavior between images.

This is my part of Igor's enterprise ;) we discuss my crazy cross-heap  
pointerage approaches with Igor as sparring partner ;)

GC got the main attention; it has (among others) these aspects:

o distributing object allocation is very promising
  (Guillermo Adrián Molina: distributing alloc is the main
   application of parallel processing [native threads in
   Huemul], email communication)

o 90% of each Process's references (remember, a Process is 1:1
   to a thread) are to the same chunk of process-generated
   objects, 9% of the references are to globals, and 1% to
   objects generated by other processes (Guillermo Adrián
   Molina, native threads in Huemul, email communication)

o local GC often gets in the way when doing things in parallel

o garbage can become cyclic across heaps and so unreclaimable
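
The last point can be sketched with a toy model. This is only an illustration in Python, assuming that a local GC has to treat every object referenced from another heap as a root because it cannot tell whether the remote referrer is still alive; the names Heap, Obj, local_survivors and global_live are made up here and are not HydraVM code.

```python
# Toy model only: Heap, Obj, local_survivors and global_live are made-up
# names for illustration, not HydraVM or Squeak VM code.

class Heap:
    def __init__(self, name):
        self.name = name
        self.objects = set()   # every object allocated in this heap
        self.roots = set()     # objects the owning interpreter reaches directly


class Obj:
    def __init__(self, name, heap):
        self.name = name
        self.heap = heap
        self.refs = []         # outgoing references, possibly into other heaps
        heap.objects.add(self)


def trace(starts, only_heap=None):
    """Return everything reachable from starts, optionally staying in only_heap."""
    seen, stack = set(), list(starts)
    while stack:
        obj = stack.pop()
        if obj in seen or (only_heap is not None and obj.heap is not only_heap):
            continue
        seen.add(obj)
        stack.extend(obj.refs)
    return seen


def local_survivors(heap, all_heaps):
    """Local GC: every object referenced from another heap is conservatively
    treated as a root, whether or not the remote referrer is itself alive."""
    referenced_remotely = {
        target
        for other in all_heaps if other is not heap
        for obj in other.objects
        for target in obj.refs if target.heap is heap
    }
    return trace(heap.roots | referenced_remotely, only_heap=heap)


def global_live(all_heaps):
    """Global view: trace from the real roots of all heaps, crossing borders."""
    roots = set()
    for heap in all_heaps:
        roots |= heap.roots
    return trace(roots)


main, worker = Heap("main"), Heap("worker")
kept = Obj("kept", main)
main.roots.add(kept)

a = Obj("a", main)       # a -> b -> a is a cycle spanning both heaps,
b = Obj("b", worker)     # and no root refers to either object
a.refs.append(b)
b.refs.append(a)

heaps = [main, worker]
print(a in local_survivors(main, heaps))    # True: kept alive by b's reference
print(b in local_survivors(worker, heaps))  # True: kept alive by a's reference
print(a in global_live(heaps), b in global_live(heaps))  # False False
```

Each local collector keeps its half of the cycle alive on behalf of the other heap, although a global trace shows that neither object is reachable. Reclaiming such garbage needs some form of cross-heap coordination, which is exactly what independent per-heap GC tries to avoid.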

> Possibly the most important thing is how much happier this will make the  
> L2 caches on future multi-core chips.

:)

>> Now, why do I think it's more convenient than having separate images?
>>
> If you had to pick between one or the other, I can see how your argument  
> makes sense.  However, I don't see why you can't trivially have both.   
> Furthermore, there are use-cases where the ability to load an image from  
> disk (or from the network) might be useful.  More on both of these  
> points below.
>> First, it is easier to support and distribute: you have a single
>> 'bloated' main image which carries all necessary code & data, and
>> you don't need to build a bunch of small images and manage them in
>> distribution.
>>
>> Firing up a new interpreter instance by copying data from the base
>> heap to a new heap could be even faster than reading & running an
>> image from a file, because there is no disk I/O and all operations
>> are performed in memory.
>>
> This seems like an unfair comparison.  A better comparison would be  
> comparing your method to running an image once it has already been  
> loaded from a file (since, of course, you can store an image as a  
> ByteArray in memory just as easily as you can store your  
> object-graph-array).

The focus here is, what is needed for a new parallel computational task to  
be offloaded *on*the*fly* for running in another thread+heap; if one needs  
a harddisk for that, that part must be purchased and installed and  
formatted and populated (joking ;)

>> The primitive which does the copy & run takes two arguments: an
>> array of object refs to be cloned into the new heap, and an array of
>> stubs in the form of pairs (oop + index of the oop in the first array
>> which will replace references to the original oop).
>> Before doing anything, the primitive checks that the given arguments
>> form a closed object memory graph, i.e. there are no references
>> outside of it.
>>
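As an illustration of the two arguments and the closure check described above, here is a minimal Python sketch. It is not the Hydra primitive: Node and clone_closed_graph are hypothetical names, a Python list stands in for the new heap, and it assumes one reading of the stub pairs, namely that a reference to the stub's oop is replaced by a reference to the clone of the object at the given index of the first array.

```python
# Illustrative model only: Node and clone_closed_graph are hypothetical names,
# not the signature of the Hydra primitive.

class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []      # outgoing object references (oops)


def clone_closed_graph(objects, stubs):
    """Clone `objects` into a fresh list standing in for the new heap.

    stubs is a list of (oop, index) pairs: a reference to `oop` is replaced
    by a reference to the clone of objects[index].
    Raises ValueError before cloning anything if the graph is not closed.
    """
    position = {obj: i for i, obj in enumerate(objects)}
    redirect = {}
    for oop, index in stubs:
        if not 0 <= index < len(objects):
            raise ValueError(f"stub index {index} is out of range")
        redirect[oop] = index

    def resolve(ref):
        if ref in position:
            return position[ref]
        if ref in redirect:
            return redirect[ref]
        raise ValueError(f"reference to {ref.name!r} leaves the graph")

    # Closure check: resolve every reference first, allocate nothing yet.
    plan = [[resolve(ref) for ref in obj.refs] for obj in objects]

    clones = [Node(obj.name) for obj in objects]
    for clone, targets in zip(clones, plan):
        clone.refs = [clones[i] for i in targets]
    return clones


worker = Node("Worker")
full = Node("FullObject")     # stays behind in the base heap
tiny = Node("TinyObject")     # minimal stand-in shipped to the new heap
worker.refs = [full]

new_heap = clone_closed_graph([worker, tiny], [(full, 1)])
print(new_heap[0].refs[0] is new_heap[1])   # True: reference redirected to the stand-in
```

Without the stub pair, the same call fails the closure check and raises ValueError, since the reference to FullObject would point outside the cloned set.
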
>> These two arrays can be pre-generated and sit in the base image, so
>> you may have different sets of precalculated graphs for different
>> needs and then simply spawn new interpreter(s) at system startup.
>> Also, as long as you control the development & distribution cycle,
>> you can keep such arrays within the image and recalculate them when
>> needed.
>> And you can always include any mechanisms for error handling in the
>> mini-images which could tell if anything goes wrong (like handling
>> unknown messages, catching bugs etc).
>>
>> Also, I'm looking forward to integration with Spoon's main feature,
>> behavior imprinting, where a consumer image asks a provider image to
>> deliver the behavior required to run some code.
>>
>>
> The technical details of your approach sound good to me (without having  
> thought deeply enough to provide truly constructive criticism).   
> However...
>
> My main concern is that your argument against separate images is  
> disingenuous.  They won't be slower if you store them as ByteArrays  
> within the main image.

But then they are always in the way when GC comes around :( This would  
invalidate all the pointers of the parallel thread and require global  
synchronization :(

Not a good idea :( we want things to run in parallel independent of each  
other's GC.

> In fact, I believe that the opposite would be true; don't you agree?   
> From a performance standpoint, it seems like separate images are the  
> better option.

If the cost of creating a ByteArray versus creating a separate heap can be  
ignored, there would be no difference in terms of performance (it's all  
oops all the way down, anyway). It's only that ByteArrays are not usable  
for parallel processing.

> Separate images allow (security implementations aside) nifty things like  
> mobile code...

Yes, sure, every thread+heap in Hydra represents an .image that can be  
snapshotted and used however you like. There can even be .images,  
created in the way that Igor describes, which won't need the HydraVM  
power, just the stock Squeak VM power.

> I can download an image from a server or a P2P network and run it in my  
> image.  I don't yet know what I would do with this ability, but as we get  
> more experience with the object-capability security model (hello  
> Newspeak!) I'm sure that there will be no shortage of good ideas.
>
> Of course, these separate images need to be built somehow, and it seems  
> to me that this is where your ideas fit in (for development more than  
> deployment).

No, there's no limit for deployment, it depends on what application the  
separate .image contains. Some will require HydraVM power when deployed,  
others not.

/Klaus

> Cheers,
> Josh
>
>
>