[Vm-dev] Re: [Pharo-project] Integrating Changes in 1.4 that require a new VM

Henrik Sperre Johansen henrik.s.johansen at veloxit.no
Fri Sep 23 12:01:19 UTC 2011


On 22.09.2011 21:57, Eliot Miranda wrote:
>   
>
>
>
>
> On Thu, Sep 22, 2011 at 12:29 PM, Henrik Sperre Johansen 
> <henrik.s.johansen at veloxit.no> wrote:
>
>
>     On 22.09.2011 20:20, Eliot Miranda wrote:
>>       
>>
>>
>>
>>
>>     On Thu, Sep 22, 2011 at 11:06 AM, Eliot Miranda
>>     <eliot.miranda at gmail.com> wrote:
>>
>>         Hi Igor,
>>
>>         On Thu, Sep 22, 2011 at 10:53 AM, Igor Stasenko
>>         <siguctua at gmail.com> wrote:
>>
>>             On 22 September 2011 19:16, Eliot Miranda
>>             <eliot.miranda at gmail.com> wrote:
>>             > (apologies for the duplicate reply; someone needs to sort out their
>>             > threading for the benefit of the community ;) )
>>             >
>>             > On Thu, Sep 22, 2011 at 2:36 AM, Marcus Denker
>>             > <marcus.denker at inria.fr> wrote:
>>             >>
>>             >> Hi,
>>             >>
>>             >> There are two changesets waiting for integration in 1.4 that have serious
>>             >> consequences:
>>             >>
>>             >> - Ephemerons. The VM level changes are in the Cog VMs built on Jenkins,
>>             >> but have not been integrated in the VMMaker codebase.
>>             >>
>>             >> http://code.google.com/p/pharo/issues/detail?id=4265
>>             >
>>             > I would *really* like to back out these changes.  The Ephemeron
>>             > implementation is very much a prototype, requiring a hack to determine
>>             > whether an object is an ephemeron (the presence of a marker class in the
>>             > first inst var) that I'm not at all happy with.  There is a neater
>>             > implementation available via using an unused instSpec which IMO has
>>             > significant advantages (much simpler & faster, instSpec is valid at all
>>             > times, including during compaction, less overhead, doesn't require a marker
>>             > class), and is the route I'm taking with the new GC/object-representation
>>             > I'm working on now.  Note that other than determining whether an object is
>>             > an ephemeron (instSpec/format vs inst var test) the rest of Igor's code
>>             > remains the same.  I'd like to avoid too much VM forking.  Would you all
>>             > consider putting these changes on hold for now?
>>             > If so, I'll make the effort to produce prototype changes (in the area of
>>             > ClassBuilder and class definition; no VM code necessary as yet) to allow
>>             > defining Ephemerons via the instSpec route by next week at the latest.
>>             >
>>
>>             I agree that in my implementation this is a weak point. But it's hard
>>             to do anything without making changes to the object format to identify
>>             these special objects.
>>
>>             The main story behind this is: can we afford to change the internals of
>>             the VM without being beaten hard by the "backwards compatibility" party? :)
>>
>>
>>         I don't think we get stuck in this at all.  The
>>         instSpec/format field has an unused value (5 I believe) and
>>         this can easily be used for Ephemerons. All that is needed is
>>         a little image work on these methods:
>>
>>             Behavior>>typeOfClass
>>                 needs to answer e.g. #ephemeron for ephemeron classes
>>
>>             ClassBuilder>>computeFormat:instSize:forSuper:ccIndex:
>>                 needs to accept e.g. #ephemeron for type and pass
>>                 variable: false and weak: true for ephemerons
>>                 to format:variable:words:pointers:weak:.
>>
>>             ClassBuilder>>format:variable:words:pointers:weak:
>>                 needs to respond to variable: false and weak: true by
>>                 computing the ephemeron instSpec.
>>
>>             Class>>weakSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:
>>             ClassBuilder>>superclass:weakSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:
>>                 need siblings, e.g.
>>                 ephemeronSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:
>>                 superclass:ephemeronSubclass:instanceVariableNames:classVariableNames:poolDictionaries:category:
>>
>>         Right?  This is easy.  Then in the VM there are a few places
>>         where pointer indexability (formats 3 and 4) needs to be
>>         firmed up to exclude 5, but nothing difficult.  We talked
>>         about this in email last week.
>>
>>
>>     Here's the format field (Behavior>>instSpec at the image level) as
>>     currently populated:
>>       0 = 0 sized objects (UndefinedObject True False et al)
>>       1 = non-indexable objects with inst vars (Point et al)
>>       2 = indexable objects with no inst vars (Array et al)
>>       3 = indexable objects with inst vars (MethodContext
>>           AdditionalMethodState et al)
>>       4 = weak indexable objects with inst vars (WeakArray et al)
>>       6 = 32-bit indexable objects (Float, Bitmap et al)
>>       8 = 8-bit indexable objects (ByteString, ByteArray et al)
>>      12 = CompiledMethod
>>
>>     N.B. in the VM the two least significant bits of the format/instSpec
>>     for byte objects (formats 8 and 12) are used to encode the number of
>>     odd bytes in the object, so that a 1 character ByteString has a
>>     format of 11 (= 8 + 3) and a size of 1 word - 3 bytes.
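
To make the odd-bytes encoding concrete, here is a minimal C sketch of how a format value like 11 decodes. The helper names are made up for illustration and are not the VM's; only the numbers (byte formats start at 8, low two bits count the odd bytes, 4-byte words) come from the description above.

```c
#include <assert.h>

/* Hypothetical helper names; only the numeric encoding comes from the thread. */
enum { BYTES_PER_WORD = 4 };   /* 32-bit image */

/* For byte-indexable formats (8..15) the two low bits hold the odd-byte count. */
static unsigned odd_bytes(unsigned format) {
    assert(format < 16);
    return format >= 8 ? (format & 3u) : 0u;
}

/* Format with the odd-byte bits cleared: 8 for byte objects, 12 for CompiledMethod. */
static unsigned base_format(unsigned format) {
    assert(format < 16);
    return format >= 8 ? (format & ~3u) : format;
}

/* Size in bytes of the indexable part: whole words minus the unused trailing bytes. */
static unsigned byte_size(unsigned word_count, unsigned format) {
    return word_count * BYTES_PER_WORD - odd_bytes(format);
}
```

With these, a 1 character ByteString (format 11, one word) gives base_format 8, odd_bytes 3 and byte_size 1.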
>>
>>
>>     For the future (i.e. the new GC/object representation, /not/ for
>>     the first implementation of ephemerons which we can do now, for
>>     Pharo 1.4 or 1.5) we need to extend format/instSpec to support 64
>>     bits.  I think format needs to be made a 5 bit field with room
>>     for 4 bits of odd bytes for 64-bit images.  (For VMers, the
>>     Size4Bit is a horrible hack.)  So then
>>
>>     0 = 0 sized objects (UndefinedObject True False et al)
>>     1 = non-indexable objects with inst vars (Point et al)
>>     2 = indexable objects with no inst vars (Array et al)
>>     3 = indexable objects with inst vars (MethodContext
>>     AdditionalMethodState et al)
>>     4 = weak indexable objects with inst vars (WeakArray et al)
>>     5 = weak non-indexable objects with inst vars (Ephemeron)
>>
>>     and we need 8 CompiledMethod values, 8 byte values, 4 16-bit
>>     values, 2 32-bit values and a 64-bit value, = 23 values, 23 + 6 =
>>     29, so there is room, e.g.
>>
>>     9 (?) 64-bit indexable
>>     10 - 11 32-bit indexable
>>     12 - 15 16-bit indexable
>>     16 - 23 byte indexable
>>     24 - 31 compiled method
>>
>>     In 32-bit images only the least significant 2 bits would be used
>>     for formats 16 & 24, and the least significant bit for format 12.
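
As a rough illustration of the proposed 5-bit layout, the ranges above could be classified as in the C sketch below. The enum and function names are hypothetical, and 9 for 64-bit indexable is marked tentative in the proposal itself; values 6 to 8 are not assigned a meaning there, so the sketch reports them as unassigned.

```c
#include <assert.h>

/* Hypothetical names; the numeric ranges follow the proposal quoted above. */
typedef enum {
    KIND_POINTERS,         /* 0..5, including ephemerons at 5 */
    KIND_UNASSIGNED,       /* 6..8, no meaning given in the proposal */
    KIND_BITS64,           /* 9 (tentative) */
    KIND_BITS32,           /* 10..11 */
    KIND_BITS16,           /* 12..15 */
    KIND_BYTES,            /* 16..23 */
    KIND_COMPILED_METHOD   /* 24..31 */
} FormatKind;

static FormatKind format_kind(unsigned format) {
    assert(format < 32);   /* format is a 5-bit field */
    if (format <= 5)  return KIND_POINTERS;
    if (format <= 8)  return KIND_UNASSIGNED;
    if (format == 9)  return KIND_BITS64;
    if (format <= 11) return KIND_BITS32;
    if (format <= 15) return KIND_BITS16;
    if (format <= 23) return KIND_BYTES;
    return KIND_COMPILED_METHOD;
}

/* Unused trailing elements in the last 64-bit word, taken from the low bits. */
static unsigned odd_elements(unsigned format) {
    switch (format_kind(format)) {
    case KIND_BITS32:          return format & 1u;  /* 0..1 unused 32-bit slots */
    case KIND_BITS16:          return format & 3u;  /* 0..3 unused 16-bit slots */
    case KIND_BYTES:
    case KIND_COMPILED_METHOD: return format & 7u;  /* 0..7 unused bytes */
    default:                   return 0u;
    }
}
```

The low-bit masks work because each range starts on a multiple of its own size (10, 12, 16, 24).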
>     If we are changing the format for 64bit images anyways, why not
>     simplify it/ be more consistent by spending a full byte?
>
>     Bit:   8       7       6       5      4          3      2           1
>          | 64bit | 32bit | 16bit | 8bit | compiled | weak | indexable | instVars |
>     (Odd number encoded in remaining indexable bit fields)
>
>
> I used to prefer this approach but I've realised that the 
> format/instSpec approach (I think Dan came up with) makes better use 
> of bits because so many of the bit combinations are mutually 
> exclusive.  For example, pointers excludes all the 
> byte/short/32-bit/64-bit indexability combinations.  Also, see below...
>
>
>     Could get away with 7 if you put f.ex. the unused indexable weak
>     combination (6) as compiled method/8bit
>
>     Or is the header space in your new 64bit format already quite
>     filled, so this is a bad idea?
>
>
> Yes, ish.  But they're scarce, and very useful for experiments etc. 
>  Right now I have
>
> typedef struct {
>     unsigned short classIndex;
>     unsigned unused0 : 6;
>     unsigned isPinned : 1;
>     unsigned isImmutable : 1;
>     unsigned format : 5;          /* on a byte boundary */
>     unsigned isMarked : 1;
>     unsigned isGrey : 1;
>     unsigned isRemembered : 1;
>     unsigned objHash : 24;        /* on a 32-bit word boundary */
>     unsigned char slotSize;       /* on a byte boundary */
> } CogObjectHeader;
>
> Where classIndex is 16-bits simply for efficiency and will grow to 20 
> or 22 bits as needed.  So one could steal one or two bits from unused0 
> and two bits from objHash, and give these to format, but it would be a 
> waste.  Better keep these back for other uses.
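
For anyone wanting to check the bit budget, a small C sketch follows. Since bit-field layout is compiler-dependent, it uses explicit shifts on a 64-bit word and assumes the fields are packed from the low bits upward in declaration order; that ordering and all the names are my assumption, only the widths come from the struct above.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths copied from the struct quoted above. */
enum {
    CLASS_INDEX_BITS = 16,
    UNUSED_BITS      = 6,
    PINNED_BITS      = 1,
    IMMUTABLE_BITS   = 1,
    FORMAT_BITS      = 5,
    MARKED_BITS      = 1,
    GREY_BITS        = 1,
    REMEMBERED_BITS  = 1,
    HASH_BITS        = 24,
    SLOT_SIZE_BITS   = 8
};

/* Assumed positions: fields packed low-to-high in declaration order. */
enum {
    FORMAT_SHIFT    = CLASS_INDEX_BITS + UNUSED_BITS + PINNED_BITS + IMMUTABLE_BITS,
    HASH_SHIFT      = FORMAT_SHIFT + FORMAT_BITS + MARKED_BITS + GREY_BITS + REMEMBERED_BITS,
    SLOT_SIZE_SHIFT = HASH_SHIFT + HASH_BITS,
    HEADER_BITS     = SLOT_SIZE_SHIFT + SLOT_SIZE_BITS
};

/* Extract the 5-bit format field. */
static unsigned header_format(uint64_t header) {
    return (unsigned)((header >> FORMAT_SHIFT) & ((1u << FORMAT_BITS) - 1u));
}

/* Return a copy of the header with the format field replaced. */
static uint64_t header_with_format(uint64_t header, unsigned format) {
    const uint64_t mask = (uint64_t)((1u << FORMAT_BITS) - 1u) << FORMAT_SHIFT;
    assert(format < (1u << FORMAT_BITS));
    return (header & ~mask) | ((uint64_t)format << FORMAT_SHIFT);
}

/* Extract the 24-bit identity hash. */
static uint32_t header_hash(uint64_t header) {
    return (uint32_t)((header >> HASH_SHIFT) & ((1u << HASH_BITS) - 1u));
}
```

Under that assumption the widths sum to 64 bits, with format starting at bit 24, objHash at bit 32 and slotSize at bit 56, which matches the boundary comments in the struct.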
>
> Also, can I ask the assembled company exactly how many bits you'd 
> spend on the objHash (identityHash)?  Think forward to 64-bits.  Is 24 
> bits about all we can afford or still too generous?  Anybody have any 
> data to contribute?
This is probably a stupid question, but where is the variable size in 
words stored?
In a quadword preceding the header like it is in 32bit format?

The reason I'm asking is that to me, the main application of 
identityHash is for HashedCollections, and the max size of those thus 
impacts what a reasonable answer would be...

Cheers,
Henry